This overview succinctly captures the industry's pivot toward autonomous agency, moving beyond simple chat interfaces to integrated, multi-step execution. It serves as a practical primer for understanding how AI is evolving from a conversational tool into a functional workforce.
Deep Dive
ChatGPT Hermes Agents LEAKED, GPT Images 2.0 Drops + Google's NEW Autonomous Research Agent!
ChatGPT agents just leaked: a full agent studio baked into the app, with templates, Slack, schedules, memory, and always-on workers living inside ChatGPT. OpenAI also dropped ChatGPT Images 2.0, and Google dropped two new autonomous research agents called Deep Research and Deep Research Max. So, let's get into it.
According to some leaks, it looks like OpenAI might be working on an agent studio and a new agent code-named Hermes (or Herms, I don't know how you pronounce it). It's supposed to include an agent builder studio, templates, schedules, and the option to use your agent in Slack. You can also add apps, skills, files, memory, instructions, and more. It looks like they're taking what OpenClaw is and putting it into ChatGPT. One thing to note: the founder of OpenClaw is now tied to ChatGPT and the Codex team, so it makes sense for them to create something like this. What this agent is supposed to do is work for you 24/7.
You can also start with a proven workflow. What that basically means is that you can pick a template and get your agent up and running in a minute. So, if there's a template that OpenAI already has set up, you can use that, customize it, and get your agent ready to go. Or, you can start from scratch and build the whole workflow yourself.
You can also build agents that reply to you in Slack, and you can add your agent to Slack to handle common questions without the back and forth that usually happens when you're working with agents at the moment.
You can also create agents tailored to how you work. You can customize each agent with tools and skills, and then add a schedule for when you want it to run, which I think is important because it opens up a lot of possibilities. Let's say you're a marketing consultant: you can have your own marketing AI agent that's tracking your client's portfolio 24/7 and compresses your work week into two to three days, which I think is amazing.
There are a few other changes that have been referenced as well: things like an audio summary, a public-radio-style recap, a podcast, an executive briefing, a study guide, and so on. I don't know how that's going to tie into the agent builder.
That might be a separate update coming to ChatGPT, but I think this is really cool because it shows us that building agents is going to get easier and easier. The more powerful these agents and models get, the easier real-life workflows are going to be to automate, which is going to have a lot of benefits for a lot of people, especially business owners and even corporations. They might be able to replace a lot of tedious or manual jobs with AI agents, which I think is sad, but I guess that's the reality of things nowadays. But, let's see what this feature actually looks like when it launches, and let's see how strong it is.
Before we continue, we just launched the Universe of AI newsletter. If you want to stay on top of AI news without having to hunt for it, the link is in the description. Don't miss out.
OpenAI dropped a new model today. It is not GPT 5.5 Pro yet, but it's their new image model, ChatGPT Images 2.0. According to them, this new model is much better at following detailed, precise instructions. It can also generate accurate objects and render dense text at a level the previous model couldn't. There's also more precision and control at up to 2K resolution, and it handles the stuff that usually breaks image models, like small text, iconography, UI elements, and dense compositions.
This new model is also not limited to English. It has multilingual capabilities, so it can generate images with text in many languages, and you can generate and translate on-screen text. Let's say you had a poster in English and you wanted to translate it into Chinese, for example: the new model can handle that precisely. There's also a wider range for styles.
They're stronger on photo, cinematic stills, pixel art, manga, and other distinct visual languages, and the model also has better consistency in texture, lighting, composition, and even fine detail. So, game prototyping, storyboarding, and marketing creative all got much easier with ChatGPT Images 2.0. There are also new aspect ratios, anywhere from 3:1 to 1:3, so outputs can come out ready for banners, slides, posters, and social graphics. So, they're really targeting a wide audience with this new image model, and I think this is probably because they closed down Sora. All the data and all the learning they gathered from the Sora initiative was probably used to train the new model, so you're going to have a stronger model. And, remember, Sora was also used for video generation.
When you're generating a video, you need to gather a lot more data compared to images alone, especially when it comes to position and consistency. So, they probably used all that data to beef up their new model, Images 2.0, which I think was a smart move because this model clearly looks way better than anything else out there in the market at the moment.
They're also saying that this is their first image model with thinking. When you turn thinking on, it can search the web for real-time info, generate multiple distinct images from one prompt, double-check its own outputs, and even create functional QR codes, which I think is really cool. It also has real-world intelligence, and the knowledge cutoff is set to December 2025.
So, to summarize, this new model is not only supposed to be smarter, it's supposed to think more thoroughly on purpose. It might be a little bit slower, depending on whether you have thinking mode turned on or off, but that's going to help you do less prompting, meaning you're not going to have to follow up with the model and say, "Hey, fix this," because it's going to be more accurate and precise in the first generation.
This model is available to everyone now, so all ChatGPT and Codex users have access to it today. But Images with Thinking is only available to Plus, Pro, and Business users. The GPT Image 2 API is also live now. So, test it out and let me know what you guys think about the new model.
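To make the aspect-ratio range concrete, here is a minimal Python sketch that turns a ratio between 1:3 and 3:1 into pixel dimensions. It rests on assumptions that do not come from OpenAI: "2K" is read as a 2048-pixel long edge, every ratio inside the range is treated as valid, and the `canvas_size` helper is hypothetical rather than part of any API.

```python
from fractions import Fraction

# Assumptions (not from OpenAI documentation): "2K" is read as a 2048-pixel
# long edge, and every ratio between 1:3 and 3:1 is treated as supported.
MAX_LONG_EDGE = 2048
MIN_RATIO = Fraction(1, 3)
MAX_RATIO = Fraction(3, 1)


def canvas_size(width_units: int, height_units: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a width:height aspect ratio."""
    ratio = Fraction(width_units, height_units)
    if not MIN_RATIO <= ratio <= MAX_RATIO:
        raise ValueError(
            f"aspect ratio {width_units}:{height_units} is outside 1:3 to 3:1"
        )
    if ratio >= 1:
        # Landscape or square: the width is the long edge.
        return MAX_LONG_EDGE, round(MAX_LONG_EDGE / ratio)
    # Portrait: the height is the long edge.
    return round(MAX_LONG_EDGE * ratio), MAX_LONG_EDGE
```

Under these assumptions, a 16:9 slide comes out at 2048 by 1152 pixels, while a 4:1 request is rejected because it falls outside the stated range.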
Google also dropped two new autonomous research agents today, Deep Research and Deep Research Max, and they're both available via the Gemini API. They're powered by the latest Gemini 3.1 Pro model, meaning you can now run any research workflow you had with more control and transparency than before. There's also arbitrary MCP support, which basically means you can plug any MCP server into the agent, and it can pull from whatever tools and data sources you already have connected or have built. There's also native infographic and chart generation. So, the agent does a bunch of research, then creates a report for you with high-quality visuals, through HTML or Nano Banana 2, baked directly into the report. It's not bolted on afterward; it's actually built into the agent. You also get fully cited reports grounded in the open web, if you choose that option, or in your own files and data.
So, this is really huge for a lot of people, especially people working in areas where research is their bread and butter. Think of equity analysts, science workflows, things like that. So, this research feature is huge.
Now, Deep Research, the lower tier, is built for speed, for when you're working with the agent live. Compared to their previous research agent, it's much faster, it costs less, and it's ideal for when you want high-quality answers fast. Deep Research Max, by comparison, is built for deeper dives, and it takes extra time to think and reason. This would be something you set up at night before going to sleep, and hopefully by the time you wake up, you have generated infographics, reports, and things like that. So, you would use them based on your needs. If you want something instantly, use Deep Research. If you want something that takes longer and thinks more thoroughly, use Deep Research Max.
Both of these research agents have also gotten new capabilities. There's collaborative planning, meaning before the agent actually goes and researches anything, you can sit down with it, figure out a plan, say what you want it to do and what you don't want it to do, and have that plan ready before it executes. There's also extended tooling, so you can run Google Search, MCP servers, URL context, code execution, and file search all together. Or, you can turn off web access entirely if you only want it to go through your own custom data. It also understands multimodal inputs much better. It can look at PDFs, CSV files, audio, videos, and images, so it expands the use cases quite significantly. There are also native charts and infographics built in, so it can generate high-quality visuals, like I mentioned before, through HTML or Nano Banana 2. And there's real-time streaming, meaning you can track the agent's reasoning steps and receive text and image outputs as they're generated. So, you're not waiting for it to complete the whole task before seeing anything. You can track its progress throughout the whole workflow.
What I personally found interesting is that when you're using the Deep Research agent, you're using the same autonomous infrastructure that powers Google's most popular products, like the Gemini app, NotebookLM (my favorite), Search, and Google Finance.
So, both of these agents are the same agents that power some of Google's most popular apps. And, if you've ever used NotebookLM, you know how good Deep Research is there. Now, you have access to that as well, along with Deep Research Max. So, try these agents out, especially for anything that requires a lot of research.
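The choice between the two tiers, plus the option to switch off web access, can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the `ResearchJob` class, the tier strings, and the tool names only mirror the description above and are not taken from the Gemini API. Treating Google Search and URL context as the only web-facing tools is also an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical names that mirror the tools listed above; none of these
# identifiers come from the Gemini API.
KNOWN_TOOLS = frozenset(
    {"google_search", "mcp", "url_context", "code_execution", "file_search"}
)
# Assumption: only these two tools reach out to the open web.
WEB_TOOLS = frozenset({"google_search", "url_context"})


@dataclass
class ResearchJob:
    question: str
    interactive: bool  # True when someone is waiting on the answer live
    tools: set = field(default_factory=lambda: set(KNOWN_TOOLS))
    web_access: bool = True

    def tier(self) -> str:
        """Pick the fast tier for live work, the Max tier for overnight runs."""
        return "deep-research" if self.interactive else "deep-research-max"

    def enabled_tools(self) -> set:
        """Return the tools to use, dropping web tools when web access is off."""
        unknown = self.tools - KNOWN_TOOLS
        if unknown:
            raise ValueError(f"unknown tools: {sorted(unknown)}")
        if self.web_access:
            return set(self.tools)
        return self.tools - WEB_TOOLS
```

An overnight job over private files would be written as `ResearchJob("Summarize our filings", interactive=False, web_access=False)`, which selects the Max tier and keeps only the MCP, code execution, and file search tools.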
But, that's it for today's video. Make sure you guys are subscribed to the channel. Follow our new newsletter as well at universeofai.beehive.com.
As well, subscribe to the main channel, World of AI, and support us on X by following the Universe of AIZ as well.
Until then, I'll see you guys in the next video.