Google's latest AI innovations demonstrate a fundamental shift from passive information retrieval to active AI agents that can autonomously perform complex tasks, including continuous real-world monitoring, multi-step booking across online and offline services, and multimodal content creation with character consistency. Gemini Omni represents a breakthrough in video generation AI by enabling precise control over visual elements while maintaining character consistency across clips, supported by Google's massive compute infrastructure scaling. These developments illustrate how AI is evolving from simple query-response systems to proactive assistants capable of understanding context, making decisions, and executing multi-step workflows across diverse domains from search to creative production.
Deep Dive
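[The biggest transformation since its founding] Google CEO: "Just a starting point" / "Gemini Spark" is an AI agent that works 24/7 / "Omni," the strongest in video generation, arrives as OpenAI withdraws [AI QUEST]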
Introducing Gemini Spark.
It's your personal AI agent that helps you navigate your digital life, taking action on your behalf and under your direction. It runs on dedicated virtual machines on Google Cloud and it's 24/7 and yes, you can close your laptop.
We're launching a brand new intelligent search box. Before the search box was a contained space, but now it's totally reimagined with AI. It expands with your curiosity. You can ask across modalities with text, images, files, videos, and search reasons across them all. Now, this is the biggest upgrade to our iconic search box since its debut over 25 years ago, and is starting to roll out today.
All right, welcome to the Search AI sandbox. I have four demo options for you today for how we're bringing the latest AI capabilities to Google Search to make it more helpful. The first option is information agents, which allow you to track updates in the real world.
The second is agentic booking and calling capabilities that allow you to find complex things in the real world. The third is building generative UI to help you visualize and simulate custom or complex topics. And the fourth is building custom mini app experiences with Antigravity.
So let's start with information agents. This one I really like when I want to keep updated on something that I care about.
So, this one is about keeping me updated when there are new restaurant or bakery tasting pop-ups in SF, especially if it's on a weekend. This is going to set up an information agent to monitor for these new restaurant and bakery tasting pop-ups, and it's going to give me a notification when it finds something that's relevant to my interests.
>> So Google automatically searches for new pop-ups periodically?
>> Yeah, it will be continuously monitoring for these new pop-ups. This one's a little bit hard to demo live because it relies on updates in the real world. So we ran a few of these last week, and I'd love to show you some example responses from real queries from last week.
All right. So here's the one for bakery tastings. Basically, it's found several different bakery pop-ups over the weekend. It's also found different food and culture tastings, and it will continue to monitor this every week to find things that I care about.
>> Mhm.
>> All right. So, that's information agents.
>> And next, I'll show you another agentic capability for booking and calling in AI Mode. If you're looking for a restaurant reservation or a hair appointment, you have to make sure it's available and that it meets all your needs. So here we're looking at a pretty complex task: booking a private karaoke room. We want to find one in Chicago's Chinatown on a specific date that can fit nine people, and we need it to serve food, at night around 8:00 p.m. for 2 hours.
Typically, to accomplish this task, you'd go to all these different karaoke websites one by one. You plug in your date, your time, your number of people, and you check out their food options. It might take you quite a bit of research, but now AI Mode can help you check availability.
So now, behind the scenes, AI Mode is going to all these different websites and using its live browsing capabilities to click and scroll around to find karaoke rooms for your specific dates that match your number of people, looking at the food options. This is going to take a few minutes to run, so we've preloaded a response here where it's found karaoke rooms that match your specific date, your time, and 2-hour sessions. It's pulled the food options for you to look at. You can dive deeper and check out the vibe of the venue. And then when you're ready to book, you can click through, and this will take you to the third-party site where you're in control of the booking.
>> Um, sometimes this information >> Yeah, exactly.
>> thing.
>> Yeah, exactly. Okay.
>> So this is great when information is online, but we also know there are a lot of businesses that aren't online. So, if you're moving out of your apartment and you need an appointment for a deep cleaning service in Park Slope, this is one where first it will better understand your request, like how many bedrooms and bathrooms. You'll give the size of your apartment, and then it will offer to say, "I'll look for you online, but I can also call businesses for you if online information isn't enough. And if you're okay with that, I can also send you an email when your results are ready." If you're all good with that, then first it will start by looking online. In this case, it only found one online appointment. Because of that, it's going to go and call a bunch of businesses that aren't online. In this case, it was able to reach four cleaning services by phone.
And it will relay what these businesses said, including the cost of their cleaning and their availability. Some of them have extra costs for things like stove and refrigerator cleaning. And if you want, I can also show you, behind the scenes, a simulation of what this looks like on the business side, what that sounds like.
>> That would be great.
>> Sure. Let's hear it. All right.
>> Hi. I'm calling for a customer who had some questions. This is an automated service from Google, and this call may be monitored and recorded to improve our services. They would like to know your availability and cost for a deep clean.
>> The availability is between May 14th and May 15th, between 1 and 5 p.m.
>> Okay. And what is the cost for a move-out? The customer has a two-bedroom, one-bathroom apartment.
>> Two-bedroom, one-bathroom, we start at $250, but that doesn't include those.
>> Got it. Is it extra for stoves or pet stains?
Yes, tech and carpet is going to run you $50 a game and so clean up for $75.
>> Okay.
>> Wow.
>> Yeah.
>> Wait. So the Google AI is actually calling the businesses to see the information?
>> Exactly. Yeah.
>> Wow.
>> Wait, is this an ongoing thing or just a demo?
>> So this will be in response to a task from a real customer who has these needs, and then, with the customer's permission, it will go call businesses.
>> Was this already launched?
>> Oh yeah. So it has been launched in the US for a bunch of different services like beauty appointments, auto services, home services, pet services.
>> Now we can call these businesses.
>> Yeah.
>> Great.
>> How is all this going to impact people's lives in the US?
>> This year at Google I/O we laid the foundation for agentic transformation across our products. You may have heard concepts of agents in Search and Gemini, in Chrome and Android and so on. But ultimately we are in this moment of technology transformation where, at one point, you would query and get some information back.
From there you move on to having an ongoing dialogue with these products, and then to these products being able to actually take meaningful actions and make your lives easier on that basis. That transformation is what I think people will remember as a foundation if they look back, and I think it will work across our key products. So I would be super excited to look back at that and make sure that it has worked well and empowered billions of users.
These are the first two designs of a bigger collection that's coming this fall. All I have to do now is ask.
>> Hey, can you navigate me to the place I met my friend Gianna at last week?
>> Hey Nisha, I've set your route to the Redwood Grove Nature Preserve from last week's hike. Want to add a stop on the way to grab your afternoon cold brew?
>> Yes, Gemini. I would love that.
>> Okay, I'm starting walking navigation with a stop at Koopa Cafe.
Gemini, can you actually put my usual order in at that coffee shop we just talked about?
>> Sure. I'll order you a nitro cold brew for pickup from Koopa Cafe on DoorDash.
So, is this the one Google announced last year or this year?
>> It is a new prototype, and there are new features you'll be able to...
>> With display?
>> With display.
>> Not just audio glasses, display glasses.
>> Display and audio. The display is in the right eye. I'll give these to you to try on.
>> There you go.
Okay. And I gave your glasses back.
Okay. Perfect.
>> Yes.
>> Okay. Go ahead.
>> Hello.
>> Hello.
>> I'm Lucas.
>> Thank you.
>> Nice to meet you.
>> Nice to meet you.
>> We're going to be going through a go to my shopping list.
>> So, in the in the background, uh it calendar.
Okay.
Take a picture of me and put me on the mountain.
But there you go. You should be seeing yourself on the moon.
>> So if you if you would like Oh yeah. Please make sure you >> can you guys take the questions.
>> Do you mind if you know camera?
So, I heard this is a new prototype, different from the one announced at I/O last year.
>> They are Google prototypes. They're not the ones that you saw on stage earlier, which are coming out...
>> When do the display glasses come out?
>> We're not sharing that date yet, but it will be after the audio glasses.
>> Okay. What's the difference from last year?
>> What we show this year is all the integrated apps. So being able to do things like listening to music with visual cues, as well as the AI camera, and then of course a lot of the integrated search and navigation.
>> All the Workspace integrations like Tasks, Keep, Calendar, all of that stuff is...
>> It's new. I believe we did not share that.
>> Right.
>> I thought the display on the lenses... it's really hard to make, you know...
>> I think I can see what's on the display really clearly in the bright environment.
>> Yeah.
>> So what's the point of making the display work like that?
>> These glasses are really meant for everyday wear, and so obviously it's really important for it to work in regular, normal conditions. So I think that's the optimization that you're seeing, and I'm glad to see that it was...
>> Yeah, yeah.
>> ...a brightly lit room. We expect people to be wearing their glasses inside and outside, getting that everyday assistance on the go. You definitely need that in the daylight outside for navigation help, all sorts of reminders, tasks, all of that stuff. We expect people to be using this stuff when they're out and about.
>> So there's been improvement on both the hardware side and the software side, correct?
>> Yeah.
>> Okay. We've been making progress on all fronts past integrate Gemini into the card.
Deep mind.
Many of you may have heard of Nano Banana last year, and we're generally on a journey to bring Gemini intelligence to our media models. Gemini Omni is really the next step towards being able to do that. We're also really excited about fulfilling our promise of making Gemini truly multimodal, and again this is the next step towards that: Gemini Omni can take in any input (images, videos, text, audio) and right now generate video, but we're excited to bring in more output modalities in the future as well, including image generation and audio generation. As you saw in this clip, the model is much better than our previous models like Veo at really modeling the world accurately. We think this is an important step towards artificial general intelligence and being able to really simulate the real world, and also to teach Waymos and robotics how to navigate the world, including in situations that they may not frequently encounter.
If you see a person walking and you need to predict what's going to happen next, then in a way the model has to understand the basics of walking. We want to take a person, or an animal, in the case that you've seen, a monkey, and put it on someone's head. But the model has to understand how that monkey interacts with the person, maybe how it jumps. So all of that is what we call modeling or understanding.
Hi. Hi.
So many Yeah.
Yeah.
You are in charge of Google's AI production tools like Flow and Flow Music. So what does it do, basically?
>> You can think of Flow as a creative suite for visual creation.
>> Could you give uh a quick demo?
>> Yeah, definitely.
>> How Flow works.
>> Welcome to Google Flow. So there's a couple of parts of it that I'll show you.
You can see I started talking to the Flow agent and I just said what I wanted to do: I'd like to set a scene in 1980s-era New York City, specifically Times Square. Can you give me five photos that are representative of a typical Saturday afternoon in the summer in the 80s there?
>> And you can see it made five different images that are all using Gemini's knowledge of the world. So it's both creating images, but also creating images that are grounded in real information. So for example, in this one you can see Les Mis.
>> Oh, this is so 80s.
>> All right, I'm able to say, "Okay, great. I'd like to show off the era and highlight some of the key differences with the Times Square of today. Give me some storyline options."
>> Another one is a tourist who arrives and they only have a paper map. They don't have Google Maps. They don't have a cell phone or a GPS. I loved that third one.
And then I asked it to create a storyboard, and so it frames out the different beats of the story for me: the arrival, the broken link, the human sea, the neon shelter, the rendezvous.
>> And so you can get how I go from this high-level concept to an increasingly specific idea that I can bring to life.
>> Second thing I wanted to show is around character consistency and video-to-video edits. So I'm going to go into a project I've been working on.
So in Flow, you can now create a character.
>> So I had two. One is a lovely person named Jazz, and the other is kind of my evil alter ego called Ominous Elias. And so for every character, they have a visual identity.
>> There we're introducing a new feature.
>> You can customize like everything about the personality and every like Yeah, >> exactly. And they don't have to be photorealistic. They could be animated.
They could be animals. They could be whatever you want to be your character.
And so here I started by making a video of Jazz riding a bicycle. And all you have to do when you want to call a character, I'll show you an example in the prompt box: if I type @Jazz, it just adds my character and I can ask for whatever I want.
>> So in this case, I made a video of Jazz riding a bicycle in a suburban neighborhood.
>> And then I decided, actually, I want to swap it. I want the exact same video, but I want it to be the Ominous Elias character.
And so you can see that video. It's identical.
>> Yes.
>> And then I realized, actually, I think it'd be a funny mix if I brought back that miniature pinscher.
>> So I said, keep everything the same, but add a basket on the bicycle and put a miniature pinscher in it, but put it in different locations around the world in famous spots. So obviously that's in front of the Statue of Liberty, in front of the pyramids, in front of the Great Wall, in front of the Taj Mahal.
>> And you'll notice in every one, only the thing I've asked for has changed. Everything else has stayed exactly the same. So all of those are just examples of how much you can control the same character showing up the same way and making these precision edits from video to video.
>> Google announced Gemini Omni at this year's I/O. So how does Gemini Omni change how you make videos, edit videos, or create something?
>> If we go back to Flow, two of the most important things I showed: one, how the same character shows up the same way from clip to clip to clip. It's Gemini Omni Flash that allows that. The multimodal references of an image reference, a voice reference, a text reference to describe their mannerisms: that's part of the "multi" in multimodal. Those are each different media types that it's able to understand and use to create the output video.
>> And then the second part is bringing the real-world knowledge of Gemini to media generation, just like we saw with Nano Banana. One of the incredible things it unlocks is precision.
>> So the same character with just a different background, or just a bicycle with a basket instead of a bicycle without one. That level of control would have been very difficult to achieve before. So those are the two big things that Gemini Omni Flash brings to Flow and to filmmaking.
>> Mhm. Why does Google make those production tools, like a music video production tool or a music production tool?
>> Maybe part of the reason you were surprised is this is a new product line for Google. Like a product line totally dedicated to creativity and creative work.
>> It's new. Flow was the first product.
Flow Music is the second.
>> At Labs, we organize our work around a set of futures that we want to chase.
>> Google Labs.
>> Google Labs. And one of those futures is the future of creativity. And we think the future of creativity will be fundamentally changed by AI. A whole generation of creators that couldn't have created before can now. And the best creators can make even more incredible stories than they could before.
>> That feels like such an important thing to help be a part of. And so that's why we're doing an entire product line under Flow around creativity.
>> Great. It's very interesting to hear all of that.
>> Thank you very much. Thank you so much.
>> Thank you so much.
This is an extraordinary moment in computing. The technology world, the scientists, the researchers, and the engineers always respond well to challenges. We are asking more of our compute infrastructure than ever before. One good way to say it: in the past two years, Google has built as much compute capacity as in the prior 20 years of our existence. You don't often see moments like that, and it is challenging every layer of the stack. I'm excited if you look at the amount of investment we are making in new sources of energy: we are signing purchase agreements for nuclear fusion, for geothermal, for small modular reactors. We've increased the pace at which we build our TPUs. And I spoke about how, rather than only having to train in large single data centers, we are building a virtual cluster across the globe. These are all innovations which are happening thanks to this moment. Obviously we rely a lot on the supply chain from many countries in the world, but particularly in Asia, and I think they are responding very well to this dynamic moment. I think it's going to be an extraordinary moment of innovation and economic value creation across the ecosystem, and I'm excited to be a part of it.