Gemini Omni Video offers several video creation capabilities: template-based generation, custom avatars with voice cloning, frame continuation for extending scenes, and video edits such as object replacement. However, users must work within strict generation limits (in the video, a Pro account gets three videos per reset window, the presenter estimates about 10 per day on Ultra, and a weekly cap applies on top). Complex prompts or edits may also fail, either because a clip is too short to fit every requested action or because a continuation loses the context of the previous clip. Effective use requires simple, carefully scoped prompts and careful management of the remaining generations.
Gemini Omni Video is Here: Templates, Avatars, & Video Editing
With the new Gemini video capabilities, you can do text-to-video, use templates, run edits, and even use your custom avatars. So, let's see how it actually works. In the chat, you can click the plus and then go to Create video, or on the left, you have the option to go to Videos. When you are on the video interface, the first thing that you can do is create a scene from a template. This works the same as the other tools. If I click on the plus and go to image, we get the same template format. And if I go to Lyria, which is the create-music tool, we get the same thing: we can create something and then use a template. The whole point of a template is that you don't need to provide a massive text description just to get a very specific style.
For this example, I will use the one that says "Indie Pastel". If I want this template, I just need to click on it, and it will show up at the very bottom. Now we need to describe what happens in the scene. This is my prompt, and it's a very simple one: a close-up of a woman. What's important here is that she looks directly at the camera and says, "Life is like a box of chocolates." At the very bottom, you can add images from your computer. I will do that in a second. You can also select the ratio, which can only be landscape or portrait. I'm going to keep it on landscape and run this generation.
All right, so let's see what we get back.
>> Life is like a box of chocolates.
>> Okay, so notice that I didn't mention the keywords "Wes Anderson style". That's the whole point of these templates: they handle the heavy lifting for you. Now, not all templates work the same way. Take the montage, for example. If I hover over it, it shows that it takes different images and creates a montage. I'm going to click on this template and then drag and drop some images. Notice that for this template, the placeholder is a little bit different. It says, "Add photos for your montage." Yesterday, once I added the images, I didn't need to add a description. I could just submit, because this template only needs images. But today, if I want to submit, I need to add a description. My prompt is going to be "create a montage". So again, Gemini, Google Flow, and everything else at Google is going through a lot of changes, so you can expect some glitches here and there.
>> Now this is animating some of the images that we provided.
Like I mentioned, different templates require different inputs. For example, if I click on Cut Paper, which is this one right here at the bottom, it asks, "What text do you want animated?" So you need to provide text for this one. If I go to this other one, it requires you to upload an image of yourself or another person. Okay, so I think you get the point of how the templates work, and I'll leave the rest for you to explore on your own. I wish I could sit here and show you every single template, but I can't.
And that's due to a very strict limitation, a restriction that we get on Gemini. I will explain it when I get to that point. Beyond templates, you can also build completely from scratch using standard text prompts. This is something that we could do before, but this latest iteration feels much better.
The workflow is more straightforward and the results are a step up from previous versions, that's for sure. Just to give you an example, let's say I want a user-generated content (UGC) style of video. I want a woman to eat ice cream and deliver a couple of lines. As before, we can add references: you can click on the upload button or just drag and drop your images. I'm going to drag and drop an ice cream and a woman. Then I just need to write a text-to-video prompt describing what we want. This is my prompt.
I'm not going to go through the whole prompt. It's a very simple one: a woman eating an ice cream, UGC style. What's important is that she needs to deliver a couple of lines. First she says, "Whoa, this ice cream is the best." Then she eats the ice cream and, after a tiny pause, she says, "Tasty. You have to try it." I'm using this very specific example because most models fail on it: there's not enough time to eat the ice cream and then deliver a new line. So I want to see how this model handles the request. Fingers crossed. Let's run it.
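The timing problem described here can be estimated with rough arithmetic before spending a generation. The following is a minimal Python sketch, not anything Gemini exposes: the speaking rate of 2.5 words per second, the 4-second eating action, the 1-second pause, and the 8-second clip length are all assumptions chosen for illustration, and the names `speech_seconds` and `fits_in_clip` are hypothetical.

```python
# Rough timing budget for one short generated clip.
# Every number here is an assumption for illustration, not a Gemini specification.

WORDS_PER_SECOND = 2.5  # assumed conversational speaking rate


def speech_seconds(line: str, words_per_second: float = WORDS_PER_SECOND) -> float:
    """Estimate how long a spoken line takes, based on its word count."""
    return len(line.split()) / words_per_second


def fits_in_clip(beats, clip_seconds: float) -> bool:
    """Return True if the summed duration of all beats fits in the clip.

    Each beat is either ("say", text) or ("act", seconds).
    """
    total = 0.0
    for kind, value in beats:
        if kind == "say":
            total += speech_seconds(value)
        elif kind == "act":
            total += float(value)
        else:
            raise ValueError(f"unknown beat kind: {kind!r}")
    return total <= clip_seconds


# The prompt from the video: speak, eat, pause, speak again.
beats = [
    ("say", "Whoa, this ice cream is the best."),
    ("act", 4.0),  # assumed time to scoop and eat
    ("act", 1.0),  # the "tiny pause" from the prompt
    ("say", "Tasty. You have to try it."),
]

# A simpler alternative: eat first, then deliver a single line.
simpler_beats = [
    ("act", 4.0),
    ("say", "Tasty. You have to try it."),
]

too_long = not fits_in_clip(beats, clip_seconds=8.0)
simpler_ok = fits_in_clip(simpler_beats, clip_seconds=8.0)
```

Under these assumed durations, the three-part prompt needs about 10.2 seconds, while the simpler "eat, then speak" version needs about 6.4 seconds, which is one way to see why trimming the prompt to fewer beats is worth trying first.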
Okay, so it's here and it took a while.
Now, before we see it, I can almost predict that this will have some tiny problems. I'm going to tell you why in a second. Let's just see it.
>> Wow, this ice cream is the best.
Mhm.
Tasty. You have to try it.
>> All right, so I've tried this generation with previous versions of Gemini, and I've got to say this one is a little bit better. Nonetheless, it has some issues. First, when the video starts, she starts saying something, and then there's a tiny cut.
Then, the spoon disappears. This is because I'm prompting for too much. I'm asking the model for her to say something, then eat, and then speak again. Since Gemini can create clips of up to eight or 10 seconds, there's not enough time to do all of this, so it always fails. The solution would be prompting for something simpler, like maybe she tries the ice cream and then delivers the line, or using a multi-shot type of instruction where we have different transitions.
This is where things get really interesting. One of the flagship features of this update is the ability to deploy your own avatar. Gemini will take your likeness and your voice, and then you can use them to create videos or images. But there are a few important gotchas that you need to keep in mind here. First, this feature is currently in beta. It's only available in some countries, and it's not a super wild feature. You will see why in a second.
And secondly, the avatar that you create must actually be you. So you cannot, or at least should not, create an avatar of someone else. The avatar setup requires you to record a quick selfie video on your phone's camera, following some on-screen tracking instructions and reading some numbers aloud, all of this, of course, to track you and capture your voice. After you create your avatar, it stays under your account. So it's almost like creating a password.
Once you have the avatar, you just need to click the plus and go to More uploads, and you will find it there. If not, you can type "@", and it will show the avatar that you created. If I want to use myself as an avatar, I just select it, and it shows up right here. Now, I want to do the same thing as before. I'm going to delete everything and paste the same UGC prompt that we used a minute ago. Of course, you need to adapt the prompt to your avatar. I was using a woman, so in this prompt I've replaced "she" with "he" everywhere. So it is the same prompt as before. The only difference is that I'm using myself as the avatar. Now, I'm going to drag and drop the ice cream. There we go. So now we should get something pretty similar to the previous generation, but with the avatar. And that's pretty much it. Once you have the avatar, it's just like working with a normal reference. I'm going to run the generation.
Right, so it's here, and I forgot to change the ratio from 9:16 to 16:9, which of course creates a problem. But that's fine. Let's see what happens.
>> Wow, this ice cream is the best. Tasty. You have to try it.
>> Right, so we are getting the same problems as before, because of course we need to change the prompt, but nonetheless, the generation is pretty good. The character looks like me. The voice cloning is pretty good. It's not a 100% match, because the process of creating the avatar only requires you to read some numbers aloud. You don't need to speak for like 30 minutes, which is what you do on ElevenLabs, so you don't get a super faithful voice. But nonetheless, I've got to say that it's pretty good. The second thing I've noticed is that when you record the avatar, you don't speak. You just stay very still and move your head, and that's pretty much it. In this video, the character is talking. So when I see this, I really can tell the difference.
And it makes sense. The AI model has no idea how I move my mouth in real life. Nonetheless, I really think that this is great for images and videos where the audience has no idea who you are. So, in conclusion, is it perfect?
Is it flawless? Of course not, but nothing is nowadays. We could not do something like this on Gemini or Google Flow before, but now we can. And I'm pretty sure that this will get much better.
Inside the same chat, or by uploading an existing clip, you can tell Gemini to perform a frame continuation.
For example, I will take the UGC video that we created a minute ago and ask Gemini to extend it. I'm going to use this prompt. It's a very simple one, so we can read it. I'm saying, "Extend this scene." When I say "Extend this scene," Gemini is smart enough to know that we are talking about this scene and not some other video. It will look at the context, take the last frame, and create the continuation.
My instruction for the continuation is going to be, "She says, 'Always remember, stay in school, kids.'" Let's run it and see what we get back. All right, so we have it back, and I can already see that this is a bad generation. I'm going to play it back, and you will notice that this is not a continuation. It's just playing a part of the last video and then delivering the next line, which is not what we prompted for.
>> cream is the best. Always remember, stay in school, kids.
>> So, it's mixing up the previous prompt and the context of the previous video. Now, of course, I cannot run it again, and I will tell you why in a minute. But this is something I've also seen in other videos, and other users have reported the same thing: when you run a continuation, it has a lot of problems. That said, I tested this with the same example yesterday, and it worked pretty well.
So, this means that if I run the generation again, I should get something correct, something usable. But there's a problem, and that problem is the limits that you get with Gemini. If you have a Pro account, you can create up to five per day or maybe even less. And if you have an Ultra account, you can create like 10 or something like that a day. Now, there's a way you can check this.
>> If I go to the left side, open Settings, and go to Usage limits, it shows how much you have used and how much you have remaining. So, I can create one more video, and this will reset at a very specific time. Then, and only then, you can create three more videos. On top of that, you have a weekly limit. So, if you want to create videos on Gemini, you really need to be aware that you cannot create a lot of videos like you do on Google Flow. If you pay for a Pro or an Ultra account, it is much better to go to Google Flow and use your credits there. Right here, it's very limited. Just to give an example, recording this video took me the whole day. I had to create three videos in the morning, then wait eight hours, and then create the next three videos. That's why I cannot create one more video to fix the bad generation: I don't have any generations left. It is the way it is. If you want to create videos, you can do something simple with Gemini, but if you want to create pro videos, something a bit more polished, maybe try Google Flow.
Okay, so now the limits topic is out of the way. Gemini now has a deeper understanding of what's happening inside a video or an image. So, you can upload a video to the chat and ask Gemini to make some changes. Just like with extending videos, as you saw, you need to keep it simple. If you provide a complex video and ask for complex edits, you will get a bad generation and then run out of credits or limits. I have this video of a man walking and holding a coffee cup. I want to upload this video to Gemini and ask it to change the man's clothes.
I also want to change the design of the coffee cup and add a pair of sunglasses to the subject. First, I'm going to drag and drop the video so Gemini can read it and understand it, and then we can provide the edits. I have these Ray-Ban sunglasses and the red coffee cup.
If I hover over these, I can see that the name of one image is "red coffee cup" and the other one is "Ray-Ban sunglasses". And this is my prompt: "Replace coffee mug in the video with a red cup of coffee. He is wearing Ray-Ban sunglasses and he's wearing a plain black shirt." Now, if I say "red cup of coffee" and the name of the image is "red cup of coffee", Gemini will understand that we are talking about the same element. Right?
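The naming tip above can be turned into a quick pre-flight check on your own prompt. This is a minimal Python sketch under stated assumptions: whether Gemini really links references by file name is the presenter's observation, not something this code verifies, and the function names `name_words` and `unmatched_references`, as well as the `.png` extensions, are hypothetical. The check only confirms that every word in each reference file name also appears in the prompt.

```python
import re
from pathlib import Path


def name_words(filename: str) -> set[str]:
    """Lowercased words from a file name, ignoring the extension."""
    stem = Path(filename).stem
    return set(re.findall(r"[a-z0-9]+", stem.lower()))


def unmatched_references(prompt: str, filenames: list[str]) -> list[str]:
    """Return the file names whose words do not all appear in the prompt."""
    prompt_words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    return [name for name in filenames if not name_words(name) <= prompt_words]


# Prompt quoted from the video; the file extensions are assumed.
prompt = (
    "Replace coffee mug in the video with a red cup of coffee. "
    "He is wearing Ray-Ban sunglasses and he's wearing a plain black shirt."
)
references = ["red coffee cup.png", "Ray-Ban sunglasses.png"]

missing = unmatched_references(prompt, references)
```

An empty `missing` list means every reference name is echoed in the prompt wording. A file named something unrelated, such as a camera default like `IMG_0042.png`, would be reported, which is a hint to rename it before uploading.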
So, fingers crossed. I'm going to run the generation, and while this is going, I really hope that I get something useful.
This is the last video I can create, and the limit will reset in about 5 hours. Okay, so it's here, and the first frame looks great. So, let's just play it back. Fine. So, this was great.
Every time I tried to edit something on Gemini with this new model, it did a pretty, pretty good job. Extending the video is maybe where it failed the most. So, is this perfect? Absolutely not. But it is massive progress, and this is only going to improve over the upcoming months. One thing is for sure: you really need to be careful with your generations, because you will constantly run out of limits. Even if you're on an Ultra plan, you will run out of generations pretty quickly. So, again, it's much better to take it to Google Flow if you want to create videos. Okay, so that's it. Hopefully you liked this video and learned something new, and if you did, hit like, subscribe, and leave a comment. Don't be a stranger.
>> Woo!