VP Land masterfully clarifies the shift from chaotic generation to physics-aware modification, framing Google Omni as a "world model" rather than a mere pixel synthesizer. This distinction marks a crucial evolution from AI as a digital artist to AI as a spatially-conscious editor.
Deep Dive
“Nano Banana for Video”: The Simplest Way to Understand Gemini Omni
I think the best way to think about it, and the way that they were explaining it at the event, was like: this is not Veo 4, it's a completely different model. This is the video version of Nano Banana Pro.
All right, welcome back to Denoised.
Addy, good to see you.
>> Nice to see you, Joey. Welcome back to town.
>> I am back in town. I just spent the last few days at Google I/O, which was a lot of fun. I am decked out in swag. I got a Gemini hoodie and a vibe code hat.
>> Maybe I'm just getting old, but when did tech gear free stuff start to look so good?
[laughter] >> Cuz remember, we used to go to CES like 15 years ago. You get these free t-shirts. You're like, "Oh, I can't wear that."
>> Yeah. The husky shirts or whatever the the cheapest ones you could mass print.
>> Exactly.
>> Out. Uh, no. This stuff's very nice.
Okay. So yeah, there's a lot of Google updates obviously, a lot of updates from I/O. So let's talk about it. Biggest one, Google Omni, their new video world model.
>> Yeah, big update. Um, I know we get video models that are coming out every couple of weeks and Google also dropped, you know, they're also in the fall. But what I noticed, I mean, please go through the Omni specs with me and then I'll tell you some of the things that I thought were really amazing.
>> I've been sitting on this for about a week, thinking about where does this fit in or what is this good at. When I first had access and messed around with it, obviously my first jump was, okay, how does this compare to Kling and Seedance? On the surface I would say, if I'm looking to do cinematic visuals, it still is not at that level. But this is not Veo 4, it's a completely different model. Um, this is the video version of Nano Banana Pro. It's a world model. It has world understanding. Uh, it understands physics. Veo is more of a diffusion model that was just really good, and so this is completely different. This is like V1 of this new Omni model. And the things that it is really good at are editing videos and modifying videos. I feel like we're going to get stuck with calling things editing videos. I still think editing videos is like snip snip, cutting videos together. But that lingo has
>> Yeah, I call them video to video mod.
>> Video to video or just video modification. You give it an input video and you change something >> inpainting. Yeah, >> inpainting video.
>> Um really good at that. Really good at just giving it world understanding prompts and really good or pretty good um avatar feature which we can talk about in a second. I thought the the like the biggest sort of um you know the biggest gain in quality was definitely building a human from the likeness of like I don't know like a one minute or 90 second calibration very similar to what you did with the Sora app.
>> Yeah. So the Okay. Okay. So, yeah, the avatar feature, let me say if you give it an input image, and I tested it with like a couple of like photos of me or my wife and like image to video, uh, and it has, and if you go in the Gemini app, there are a lot of like template setup of just like fun things you could do with your stuff.
>> It really the likeness fell apart. And that was my first impression when I kind of gave it some images of me from like a trip and it tried to make a video. It completely changed our faces. Um and my initial impression was like what the hell is this model?
>> What's the fuss?
>> Yeah. And then um actually at the event itself uh I was speaking with Justine Moore from a16z and she was like, "Oh, you got to try the avatar feature." And I was like, "What avatar feature?" And it's like buried in the UI is this avatar feature that's assigned to your account. You basically use your phone. It's very similar to Sora. It has you like turn your head left, turn your head right and then you say a bunch of numbers >> and that is your avatar. It's tied to your account and then you can make videos of yourself with your avatar and the quality there is way better.
>> Oh, so good. I thought this is way better than Sora's avatar was.
>> It's way better than Sora's avatar.
>> And even like I think you used Avatar for the champagne shot here as well as the the Miami Vice gray t-shirt shot.
Like to me that's 98% Joey. Like I I've seen you in person enough to know like like maybe you're not.
>> It's like >> you're not this much of a stud. No offense, [laughter] but like it's pretty close.
>> No, I like Omni that it definitely, you know, kept giving making me more jacked than I am.
>> Yeah. And it just gave you a sharper jawline and just fuller hair. like it just like amplifies like the common denominator is is like a really handsome man or a woman and it just kind of pushes your avatar into that direction.
It doesn't retain 100% of your likeness, but I think that's a creative choice on Google's part. I think if they really wanted to, they can dial that aesthetic thing down and just keep it true >> there's probably a big balance of just like what is it actually capable of and how much are they letting people do?
There were times where I was trying to test out prompts and stuff and it was just like >> either it was overloaded or it couldn't do it or wouldn't do it. Um, and sometimes it was like what's the reason behind that?
>> The other the other really impressive thing about the avatar feature, there you go. Yeah, you're pulling it up right now. is the is the vocals like the the AI voices miss a lot of the inflections and some of the like the hey you know and then the like the range is very limited but here I feel like they're they're getting a little bit better with that range it's still not 100% as natural as how we sound every day but it's getting there and it's I think Omni really made it >> kind of like the issue with the avatar and so you can see like this was the source image like I train this at night in my hotel room on my phone. So, from the visual quality, uh, really good of what I was able to pull out from my phone. The audio of my voice, let me try to phone microphone because it was and there's no like enhance like no Adobe podcast audio. There's no like audio enhancement that this model is doing.
>> Google just dropped Gemini Omni. It's nano banana for video.
>> Yeah, that doesn't sound like you at all. It's funny because some of the other examples that I've heard is pretty spot-on to the people that >> maybe like uh you know if I plugged in a DJI mic it and trained it with a better microphone then maybe it would give me better outputs.
>> I don't know if it's the mic. Does it make you just say the same thing for >> No, it's in the prompt. It's whatever.
Oh. Oh. Oh, you mean >> like the calibration process?
>> The calibration is literally you're saying numbers. That's it.
>> Yeah. But I'm just, you know, I'm just holding up like >> So you don't say like Mary Se, it's whatever you know your mic is here.
So yeah, if I had a better mic plugged in, then maybe I would.
>> Yeah, but I was quite impressed by the avatar feature more than anything else on Omni. Although Omni does have really good reasoning in the same way that Sora does. Uh whereas like if you give it just an idea, it'll expand on that idea and really go to town on creating the entire editorial or the cut.
>> Yeah. Let me show you some of the outputs. First up, an honor view.
>> Oh yes.
>> 1970s New York City walking down the street.
>> Although it is technically I mean these cuts are incredible, right?
>> Yeah. And this is a very basic prompt.
Like I didn't give it any info. It just kind of reminds me of um Saturday Night Fever, um like the Travolta scene where he gets the pizza, two slices, and then he does the walk.
World understanding stuff. This was a video to video test. And so the original video was just like my wife walking the dog >> Mhm.
>> uh on Venice. So it's like okay dog standing ping around. And then I said okay turn the dog into a robot.
And like the dog turned into the robot.
The movement looks good. Everything else looks >> no distor. The only thing I saw was her left hand and the leash was a little funky when she grabbed it again right there.
But I'm being super picky. Obviously the the sheen on the and the finish on the metal seems very uh plasticky and not as reflective and you know responsive to the environment. Yeah, I think those robots are matte plasticky. It even has the same kind of movement of it, but it also kept his arm.
>> I'm being super picky. This is incredible.
>> Like robot dogs here.
>> Yeah. Yeah. Wait a [laughter] minute.
This is the best. Well, cuz also I will once we get to our later updates of Olive 2. I will process the same shot.
>> Looking forward to that. [laughter] >> I haven't tested I assume you could give it I could give it an input image like a restyled image to give it more direction and be like change the dog to this. This is my prompt is literally change the dog and picks the robot that it wants >> most basic prompts possible. Yeah, pick the robot but also like, you know, literally understood the dog, swap the dog out, everything else like the transparency of the background.
>> But the fact that it changed it to a robot dog and not like a robot the size of a dog that's a biped, >> that's reasoning like it's going through some sort of intelligence.
>> Yeah, it kept the same, >> right?
>> It kept the same dog thing. Yeah. I mean, I also did this one with same same source video. I said change them to a chimpanzee.
>> No, no, that looks better. See, like when you don't do metals and reflections to me, that feels more real.
>> These look really good out of the box.
It just kind of these video to video once it was like, okay, don't look at it as like a video generation model, but video modification model.
>> This one, this was a shot from one of my favorite installations at LACMA. I forgot the name of it. The um Metropolis city with the >> Excellent.
Um, >> it's like a permanent installation. It's like a massive permanent full.
>> Yeah, it's like a bunch of like Matchbox cars in this like crazy futuristic city.
>> And I said restyle it to a futuristic city, but like keep the same geography.
>> Yeah, they kept that shot, right?
>> Yeah.
>> And kept the cars, kept the stuff. A little bit of warping there, but like >> this was one of the best outputs where it still kept the >> the shape of the track, >> the camera movement, and it just changed everything else. One more. This one was actually fun. And this one I just gave it an image of one of the Google buildings cuz they had a bunch of these and they just looked really awesome. And then I said make the building take off like a spaceship.
>> Yes. [laughter] >> Yeah, I did it.
>> Like I kept the building [laughter] it added this extra shot I wouldn't have done but like it kept the building and had it shake and bust out of the ground and have rockets underneath. Crazy. Um, okay. The world understanding this. I said create single shots of vintage objects. The first letter of each object spells out the word denoised.
The letter should also be on the object in a natural way.
>> So basically I wanted the first letter if it was like you know whatever like I said vintage object but if it's like dog like the first >> it's a good uh reasoning test. Yeah.
>> Uh >> E N.
>> Yeah. So does it understand objects?
Newspaper oils.
E >> uh D >> didn't get the last one. Don't know what the last one was, but I got basically uh dial phone cuz I asked him like explain.
>> Love that. Yeah.
>> Yeah. I said explain what you what the list was. Uh and it also the list objects here are slightly different than what I put in the video. So the understanding between the two is a little mismatched. But and newspaper Oil can >> the ink thing with Old timey pens.
>> Ink ink holder. Eye. Yeah. Um. S.
Stopwatch. E. I think that was Edison fan.
>> Eyeglasses. Okay.
>> All right. Well, I don't know. And then the key. I don't know.
>> Did you generate this in one shot?
>> Off the rails, but >> that's crazy.
>> This is literally this one.
>> The fact that it is able to generate so many different things and then cut it together into one clip.
>> Yeah, that's impressive. That's where the world model part comes in of like it knows objects, letters, uh you know much sharper at text rendering.
>> It just it just trained on the internet, man. Like [laughter] it just knows every single object known to man >> through the through the years through the decades. Yeah. Yeah. Like it'll know all types of bulbs, not just this bulb, right? It'll know fluorescent bulb, LED bulb, and so on.
>> So Omni probably biggest announcement.
Super cool. They also had Gemini Flash 3.5 and 3.5. Excuse me. Tell me Nano Banana. Yeah, [laughter] >> names are confusing and it's still basically they're like it's not the pro model, but it's beaten benchmarks for them that their 3.1 Pro has and that they're like kind of pushing it as the new model for now until there's like a pro model.
>> No, no, no. This is just Gemini like their text model for >> Okay. Yeah. the >> correct me if I'm wrong, but the official name for Nano Banana is also like Gemini something, right?
>> It's like Gemini image 3 something. Yes. Yes.
>> Yeah. No, you said you saw Sundar, the CEO, as well as Demis uh speak on stage. What did they say? What happened?
>> Oh, yeah. I mean, so yeah, they had the keynote. They they they they did that and then there was like um kind of fireside chat side things uh that you could go watch.
And so yeah, I know for Demis it was AGI coming in >> Oh wow, three years. They pushed that timeline way back. Yeah, >> that was his time frame.
>> Well, okay.
>> Timeline.
>> Yeah.
>> And there was definitely a bigger push.
you know, his was interesting because of like it was a, you know, debate of like the AI doom doomers versus like optimism, realistically optimistic or something. Um, but basically it kind of came down to like how you talk and frame uh when around AI and the risks it brings but also the benefits that it could bring which is also why they kind of did do a big push on like AI for science and I think it's a whole separate um division now or focus on Google. Yeah, that I'm glad they're doing that cuz I don't think um OpenAI and some of the other competitors are probably not investing as heavily in science.
>> No, they're not talking about it enough.
And that was like his original uh you know a lot of the original DeepMind stuff with um AlphaFold uh you know mapping um proteins and stuff. So like >> you know that stuff it's like if you can get rid of diseases or cure diseases like no one >> Yeah. And if you can cure cancer, then you can go make a little bit more slop and it kind of just evens out. So they understand it better than we do. Um, and yeah, I think there like initially, if you remember a couple years back when like the first ChatGPTs were coming out, they're like, "Yeah, at this point we're going to be able to like figure out uh nuclear fusion and cancer research and all the stuff that needs heavy heavy supercomputers." They're like, "Yeah, we can do it now." And then it kind of just went away. And then we had a bunch of meme generation capability and uh you know a lot of job displacement unfortunately like uh a lot of companies bet on AI and had to uh fund the capital expenditure so they had to fire people to fund that. So it's just all been negative since then. So, I'm hoping like with Demis' sort of push back into goodness for humanity, maybe maybe, just maybe, we can solve something huge here that will really benefit mankind.
>> Yeah. I mean, yeah, I think I think it's there's like a big just what is the benefit to me thing with AI and all this stuff and it's like kind of been hard to see, especially with like a lot of displacement, job loss, all that stuff.
And yeah, I think if you could just be like, oh well, you know, we can improve human health and well-being and cure terrible diseases. That is a clear benefit to society that is hard to dispute.
>> So about 12 months ago when we started the podcast, they were saying that, hey, AGI is going to be here very soon, but ASI will take a while. So general intelligence is what I think is defined by an AI system that is as good as one human being that can learn as well as we can. new skill sets and so on. ASI artificial super intelligence system is some it's like a bunch of human intelligence put together into one system. So it is super intelligent than us. And if they're saying AGI is now 3 years out, I'm guessing it's more like 5 to 6 years out and ASI is probably decades away. It's probably a good thing. It's probably a good thing. We don't need this stuff right away. There is enough disruption as it is. As you saw just a couple days ago, Meta had I think the biggest layoff yet, right? 10% of their workforce >> something like that. And they were also like everyone that was staying like >> there's leaked audio from the Zuck that yeah, we're we're training on everybody cuz you're you're you guys are really smart yet we don't need you here. I haven't dug into the full post, but also the CEO of ClickUp, they did a layoff and I the g I didn't read the whole thing yet, but like the gist was like, you know, people that stay, you know, AI should make them 10 times 100 times more effective, but like I'm also looking at compensation tiers that are like a million dollars. you know, if you are the person who like leverages AI and then could do like 10x 100x more and then get compensated appropriately for that.
>> Yeah. There's a really interesting divide in the talent pool, like in the job market now, and I'm kind of starting to notice it a little bit more and it's becoming more and more pronounced. There's a hard fence in the industry between AI jobs and non-AI jobs. On the side of non-AI jobs, there's an oversupply of talent, and so you're fighting and wading through like thousands of applicants and so on. On the AI side, there's not enough people.
Like they literally can't find people, so they have to overpay and get those million-dollar contracts out to hire whoever.
>> And the other thing about AGI, I think he was asked, and I'm going to obviously paraphrase and try to remember the best I can, but he was asked like, "How would you even know like what you know if it does achieve that?" And one of Demis' tests was like if you had the AI model and you just gave it world knowledge up to like early 1900s like 1910 or something would it be able to figure out you know >> relativity like things that Einstein discovered and other things that humans discovered in that time for uh you know after in that time period. Uh that was like one of his benchmarks for like
>> yeah it all goes back to what Yann LeCun is doing now right which is that when we train a model during the training it absorbs and learns everything but then after training it's the clay has hardened right so you can't retrain it unless you train a new ChatGPT version or so on like it has to be an entirely different model so how do you build in mechanisms for a model to keep learning obsessively over and over the same way you and I like hey do you remember a time Joey before cameras and before editing and before color science like you learned all this stuff man so like how do you make an AI do that and that's what Yann LeCun is working on is um he's abstracting away the notion of tokens essentially so instead of like data captioning and images and videos being the tokens how do you have a more abstract form of neural network that just relies on vectors and like there is I'm forgetting here but essentially he is making a more abstract version of a neural network where it's multimodal by default and it's much more than that it is also able to absorb inputs much more easily into the network. Yeah, what I could do is I'll do a little bit more research and then you can ask me about it on one of the episodes we shoot later.
Yeah. How do you >> I don't use Notebook LM. It It just hasn't served me um high utility yet.
>> You will when you're researching this.
>> Just watch Yann LeCun's... Okay. The best thing to do is to watch some of his YouTube stuff and go to the Gemini summary.
>> Yeah, that's also helpful. Yeah.
[snorts] Last one about the Demis thing that I thought was interesting and then also has turned out to be really um ironic in the last 24 hours is he talked about how um they were using uh Genie, the generative world model uh you know that we talked about where you can spin up a world and navigate through it. Um they're using that to create worlds to test the Waymos in, with like super fringe cases to see how they would react. Um one example was, you know, if the Waymo's driving in a forest and like a forest fire breaks out and it's surrounded by flames, like what would it do? How would it behave?
and he was talking about these like one in a billion like uh fringe scenarios that just like they're not going to like um you know h that's not going to usually happen on regular training data with in the real world. Uh, I was like, "Huh, okay. Interesting." Fast forward to no >> yesterday. Have you seen this? The Waymos in Atlanta have been going full send into flooded streets where they have now killed or they have they have now stopped.
>> Okay, for a second I thought that was an AI generated image. No, that's terrifying if it's real.
>> No, this is real. The Waymos have been driving into flooded streets in Atlanta, so they need to spin up some more Genie models to test out the fringe cases cuz it happened and they drove through it.
>> Sometimes I feel bad for the Waymos, but then they're machines. Who cares?
>> I did take my first uh highway Waymo at the event.
>> I texted you about my first Waymo fight, too. So, I was in Santa Monica, Joey's neighborhood, and uh if you don't know, Santa Monica is like the unofficial capital of Waymo in LA. And um there was a Waymo next to me, and I was like, "What happens if I mess with it?" And just so you know, this was for educational purposes only. I don't recommend that you actually mess with Waymos. Disclaimer. Okay, so I took my car and I just lightly started to go into its lane. And at first it kind of just slowed down and I was just like, "Huh, it's pretty polite," and then it caught up. And then this time I was a little bit more aggressive and I just did a real quick jerky move and it honked at me and I was offended. I was like, "Hey, don't honk at me."
>> Never [laughter] seen a Waymo honk.
>> But I totally deserved that honk. But I did want to check how responsive their uh driving system was and it was quite responsive. Someone brought this up and I've felt the same way where like if I'm crossing a street I feel way safer, or don't think as much, if I'm jumping in front of a Waymo.
>> Absolutely.
Most likely stop whereas a human driver might be on the phone or whatever.
>> Yeah.
>> Yeah. Distracted. Yes. The Waymo's got a bunch of sensors. I was at a group... I guess I should also probably disclose, like, the trip was paid for by Google. Uh and it was you know a lot of it was like the builders groups. So there's also a lot of kind of content creators and people that are messing around with the Google products, but I will say they were genuinely interested in [clears throat] like getting user feedback and like making these products better. Yeah. And they are aware of what happens online and what people talk about. So they're always looking, you know, even if it's just bug fixes or things, but also just like what use cases there are and how can the models get better and also explaining why they throw a bunch of stuff under Google Labs and then kill products sometimes because they're like, "Yeah, sometimes the products suck or they just don't take off and you know, just want to see what works, which sometimes is good and sometimes makes figuring out which Google product to use for your problem."
>> Yeah. I mean, the end of the day, they have to run a business and the product has to have some type of revenue promise, right? So they can't just keep making R&D things happen all the time.
>> Yeah. And then also trying to figure out new things and new tools and uses. Um so yeah other two quick things I want to touch on that were interesting and relevant to our audience. Uh one is Flow. So Google Flow which is their web app that you use to >> Yeah. for video creation. Yeah. It's like kind of their narrow version of like Freepik uh for like video creation.
>> A couple updates there. One was uh obviously the new Omni models built into it. They added a character system where you can create characters and then call them up. I was pegging them with questions of like is there something in the model or when they eventually roll out the API, because there isn't an API yet, um will there be something that like treats characters differently from other reference images?
And basically the answer I got was like no. Well, basically what Flow is doing is just sort of structuring the data a little bit differently under the hood, but there's no special omni model that it's using. Um, so it's basically like anyone could kind of they just built a clever interface and have some stuff helping with character consistency under the hood.
>> Yeah, there is no uh grand master plan to tie everything underneath with like a master workflow. Basically there's no special model that Flow is using that anyone else wouldn't have access to via the API. It's just a regular model just with like uh some extra stuff built on top to direct it.
>> The master plan thing though that you mentioned reminded me we never finished our avatar talk. The thing with Avatar and comparing it to Sora and Sora characters, >> you can only make one avatar of yourself. You can't make avatars of other people. And I asked, I'm like, "Well, is there a plan where you could, you know, if you give me permission, like with Addy, can I pull up Addy's avatar if he gives me permission and like make videos like we did with Sora where it was a permission process and if you're okay that people could use your avatar in their own creations?" Um, and they said, you know, maybe, but it wasn't on the road map right now. So, I don't really know. I'm like, if that wasn't on the road map, then what are you going to do with avatars if it's just yourself and I can't share with anyone?
>> I don't know. I just think it has huge YouTube implications.
>> Like um a lot of the faceless YouTube channels that make a ton of money, >> if they added a synthetic face, they would make more money.
>> Stuff like that, you know.
>> Yeah. You know, actually around here, that's that's actually not bad. That's probably it.
>> And with the quality level where it's at, like I don't think they really care that it's not cinematic and it's not meant for our world. I think if it's good enough for like a talking head YouTube video, it's more than good enough for them.
>> Yeah, I mean, I'm sure they'll improve the quality on it. Um, but yeah, I mean, the use cases of of us of like >> we need we need aging, deaging, we need costume, hair, makeup. Yeah, like our our needs are up here, man. Like nobody nobody's doing that anytime soon. Yeah.
>> Okay, so back to Flow. Uh cool thing they added which you know obviously seems to be a trend with a lot of these tools is an agentic workflow. So there's sort of like a Gemini sidebar and so you can just chat with it and be like hey generate 10 shots of like a man walking in the forest or if you have a bunch of shots the example they gave here of like a daytime scene and you could say okay change all these shots to nighttime and it'll just batch process and generate these new shots just through this basic chat interaction.
>> I love stuff like that. Um gosh, I'd hate to call it like uh agentic. It's probably not, but it's more like assisting you in the creativity process.
You know, >> it's agentic in the sense that like it is trying to figure out like if I just ask something vague like I want a um I went I I think I tested it. I said I want like 10 different shots of the same person walking in the same field. And I just gave it the text. it spun up and generated an image of >> the guy, an image of the the the field >> and then started making the shots using those images as inputs for consistency.
And I just told it the one thing. So it had that agentic understanding of like, okay, to make something consistent for videos, I need to have consistent inputs. I got to make the inputs first because they don't exist.
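[Editor's sketch] The planning behavior described above (make the shared reference images first, then generate every shot from those same inputs) can be illustrated roughly as follows. This is only a minimal sketch of the idea, not how Flow works under the hood: `Step`, `plan_consistent_shots`, and the step names are all hypothetical, and nothing here calls a real Google API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One unit of work in a hypothetical generation plan."""
    name: str
    kind: str  # "image" for a reference asset, "video" for a shot
    prompt: str
    references: list[str] = field(default_factory=list)  # names of earlier steps used as inputs


def plan_consistent_shots(subject: str, setting: str, shot_count: int) -> list[Step]:
    """Hypothetical planner: create shared references first, then shots that reuse them."""
    # The references do not exist yet, so they are planned before any shot.
    steps = [
        Step(name="subject_ref", kind="image", prompt=f"Reference image of {subject}"),
        Step(name="setting_ref", kind="image", prompt=f"Reference image of {setting}"),
    ]
    # Every shot points at the same two references, which is what keeps them consistent.
    for i in range(1, shot_count + 1):
        steps.append(
            Step(
                name=f"shot_{i}",
                kind="video",
                prompt=f"Shot {i}: {subject} walking in {setting}",
                references=["subject_ref", "setting_ref"],
            )
        )
    return steps


plan = plan_consistent_shots("a man", "a field", 10)
for step in plan:
    print(step.name, step.kind, step.references)
```

The point of the sketch is the ordering: a single vague request expands into two image steps followed by ten video steps that all share the same inputs.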
>> Also like like a real creative would never just go make that storyboard, right? all those old shots and then just go right into production. Like that's their iterative cycle, right? They'll start there, they'll modify shot three, modify shot seven, keep going. And I think that's still so much faster and gives them so much more options than like every little story board from scratch.
>> Oh yeah, exactly. Yeah. If you're just like, copy paste this prompt and redo it, copy paste this prompt, redo it.
It's like, oh no, well, if you just want to batch change something in a >> shots you already have, just tell it and and then have it do it.
>> Obviously, it's still charging you credits for every single thing you do.
So, you will eat through your credits faster. But, um, you know, I think this sort of agentic workflow where you're talking to stuff and it's doing >> and setting up things for you is where a lot of these tools keep going. And then the other thing in Flow, and I'm curious to see how people use this, and this is sort of a trend with Google I/O in general, was building more uh vibe coding and tool building capabilities into more tools. So, Flow has this creative tool builder inside Flow that is basically like a very light version of a vibe coding tool set where you just kind of describe what tool you want and it sort of spins up these templates, man.
>> Like a motion tracking one and [clears throat] can you download other people's tools that have been previously created?
>> You can remix. Yeah, there's like a there's already like a like a directory of like um publicly created ones that you can copy.
>> This was my idea for iOS. Like Apple should have rolled out something like this where it's like vibe coding for idiots in iOS. So you can build little apps that only exist on your phone.
>> Yeah. Something. Yeah. You spin up a widget really fast. Um yeah, for Yeah.
This is like that idea, but it lives on your computer, in Flow. Um, I'm just curious because it's like, you know, would this turn into the gateway drug of like you mess with something here and then you start going $3,000.
[clears throat] >> Yeah. Antigravity. Uh, Google's version of Claude Code. Are creatives thinking uh that way where it's like I want a tool that does something >> that solves a specific problem. It's an obvious idea now that we say it, but for them to actually go through it, I'm sure there's a ton of engineering under the hood that's happening for user generated applications on the fly. So that's a first. Um especially on the creative side of things.
>> Yeah. Uh okay. And then this be my last one, but this one I think was the most impressive or coolest surprise thing that I saw and it's called Google Pix, which is a confusing name because there's also Google Photos, but um Google Pix, it's basically it's turned Nano Banana into like a Canva like editor. Uh, so you can generate an image, you can generate a flyer, you know, with not a banana that has like text and images and design elements. And normally if you did that and then you wanted to change something, you'd have to like redo the whole thing.
And then it sort of does this, like, again, this is one of those things which is like they're doing something under the hood. It's not a special model, but I'm not quite sure how they figured out how to, like, inpaint and blend the >> edges. So it turns something flat into many, many layers that you can then individually edit.
>> Yeah, this was like not the best recorded demo, but at the booth basically, from scratch, it generated this rooted future flyer thing and then you can see the mouse hovering over every element and it turns every element into something I can click and then I can reprompt to say what I want to change about it >> and then it'll just edit that one section and not touch anything else. But the outputs are like >> cohesive. It was kind of wild. Um, this was another test one I did where I >> I had already prompted it. I think I clicked on the text and I said make the text fancier, and I clicked on the person on the left and I said put them in like a spaceship in an astronaut outfit, and the person on the right in a, um, I don't know, a safari outfit. And then here's the output. And so it changed the text is the same. It changed the text.
It kept the person's likeness, but it changed their outfit. Didn't change anything about the >> still there.
>> Uh even that tape thing on the top, the tape's still there. It's still blended in, but you know, it changed the uh underlay of the wording underneath it.
>> Dude, graphic design is getting so easy. I mean, like, if you compare, or if you complement this with GPT2, which is really good at graphic design. So you take, you know, your first pass in GPT2, you generate the thing that you want and you're like, now I want to kind of change it. You bring it into Google Pix >> and you're doing layer-by-layer and element-by-element adjustment >> without >> I don't know. I'm curious, uh, if they'll let you image >> I guess they would. They're like >> also got the complete opposite of what it's doing. Has nothing to do with real photos, per se.
>> Yeah. [sighs] Look, man. They're good. They're >> They're good at making stuff. Sometimes they're not the best in naming stuff.
[laughter] Like, I don't know.
>> Yeah. No, that's that's amazing. Look, we've covered IO in the past, but this feels like a really eventful one and uh a slightly less evil one if we're pivoting into science.
>> Yeah. I also didn't touch, I know people are upset because they also announced that they're basically shifting main Google search to like AI-centric search first.
Separate topic, but >> I mean, how are they going to make their money?
All that comes from AdSense, right? They probably already thought about all this.
>> Moving on.
>> Yeah.
>> Runway Aleph 2. Runway Aleph was Runway's model. That was like one of the original >> edit video-to-video. Give it a video.
Tell it what you want to change.
>> It was okay.
>> How was your experience with that?
>> It was okay. I thought the, um, I thought the quality lacked, but the promise of Aleph was... and Aleph was one of the first, if not the first, video-to-video models. It predated Kling and some of the newer ones, and I thought, yeah, this is absolutely the direction we should go.
We should not have to generate everything from scratch. We could just generate something that's roughly there and then iterate and iterate and iterate on it. Um, some of the generations that I ran had, um, like high-frequency noise and like blobs and things like that, which I thought, hey, it's just the first generation. It's going to get better. But the promise and the vision for Runway, that I thought was really, really strong.
>> Uh, yeah, similar boat. I've always had a hard time getting good outputs out of Runway. It's just never really, like, vibed for me. With Aleph, everything I tested, it would just either change too much about the video, warp too much stuff, or too much would be soft or fuzzy. It was just never really usable.
>> Aleph 2 definitely feels like a huge bump in quality. I'm not sure what the resolution is, but just even from like lighting composition, you know, that the tone map looks a lot of that is subdued and just looks more and more natural.
>> One of the improvements is you can get more specific about what you want to change. So, it has a slider. Before, you kind of had to describe your change or you could kind of change the first frame, but this slider lets you pick any frame in the video and then modify that frame. So you can give it a better starting point, uh, and guidance of like what you want to change, and then you can upload images or you could also, um, change this frame and describe what you want to change and then create the first frame with Nano Banana Pro >> image reference insertion on the frame that you'd want.
>> Yeah. And obviously you could just download this frame and then like >> Yeah. But that that's like round that really dial in whatever you want to change.
>> Yeah. But yeah, at least they have it built in here. You can modify it here and then change it.
>> So that's the new workflow. Uh, I did the same test where I gave it the video of the, um, Metropolis sculpture. The first image it generated from the first frame kept too much >> LACMA railing and balcony. So I didn't use this frame. Uh, I ended up just >> getting one where the camera tilted down and used this bottom frame >> as the guidance. And this was the output.
>> Uh, yeah, it's so much softer and lacking so much detail compared to what we saw with Omni.
>> Yeah.
>> And then this was the same. Let me change the shot to a dog or change the dog to a robot. This was the robot that I came up with. I was like, "Sure, cool. Works."
>> No. And then this is the output.
>> Oh, the leg disappeared for a frame or two.
>> The leg's warping. It's still moving like your dog, but not a robot dog. Whereas with the Omni model, the mechanics and the rigging were really like a robot dog.
>> Yeah, this is like they're feeding it through OpenPose and just extracting the anchor points and then just attaching it to the new dog. Whereas the Omni model was really doing something different. It fundamentally translated that animation into a completely robotic animation. Yeah. So you see there, like, those are backward-bending legs in the front. Like, that's hard to do anyway. And yeah, like, moving like that, that's how a Boston Dynamics robot spins, like on all fours, but a real dog probably won't do that. So from an animation quality standpoint, nailed it. Not so much here. Having said that, like, it is still really successful inpainting, cuz you could see like her hand on the leash and just overall crowd work and all that, like none of that changes. No, I mean, at least it's not messing with >> Yeah, the shadow swims a little bit. You know, the left paw, right paw just kind of disappears.
It's tough, man, cuz Runway's probably been working on this for, I don't know, 6 months to a year. They released at the same time as Omni.
Obviously, Google is way more funded and has a bigger team, and they're going to release a superior product.
The unfortunate thing is they are coming out at the same time, and we're going to compare the two against each other. I mean, I feel like this is probably a thing where they, you know, it's like, when do you release it? It's like, okay, well, like Omni is coming out and like touting doing the same thing. So, like, we better drop this update. You know, if you had success with Aleph, let me know. I'm always curious cuz I know some people like it and have used it.
And I've just never really >> I'll just say one shout out to Runway, the fact that, being like a small company, they're not a Google or an OpenAI or, you know, they're still hanging with the big boys, right? So, Cristóbal's doing something right.
>> All right. Links for everything we talked about at denoisedpodcast.com.
>> If you'd like to meet us in person next week, we'll be at AI on the Lot. Joey's going to be a little bit busier than I am, so I'll be walking the show floor.
Uh, come say hi if you see me.
>> Yeah, I'll be buried in the back room, but let me never get to be there. I'll be around at the uh afterparty stuff.
Thanks everyone. We'll catch you in the next episode.