Building Voice Agents with Gemini Live API and Agora’s Conversational AI
Added to this site: 2026-05-06

203 views | GoogleDevelopers | Original video published: 2026-04-29

This integration replaces the traditional chained pipeline (speech-to-text, LLM, text-to-speech) with a single speech-to-speech model, with the aim of lower latency for real-time voice agents.

[00:00:06]Hey everyone, my name is Mason. I'm a field product manager here at Agora and I also work on the developer experience team. Today we're going to be showing you how you can use Gemini 3.1 Flash Live on Agora, uh, which powers 80 billion minutes of real-time voice and video a month. So let's dive into some code. So you see here we have our agent quickstart that is Next.js.

[00:00:29]We're gonna super simply come over here and copy the URL and we're going to go into our IDE and git clone and then paste the link. Okay, that's going to take a little bit of time to download, but it is done. So, the first thing we're going to do is cd into that project.

[00:00:53]And now we can open it up here in Cursor or whatever IDE you're using.

[00:00:58]Okay, so the first thing that we're going to want to do is come in and clone the env.local.example and turn it into an actual env.local.

[00:01:06]So, let's do that real quick. Create the file and then I'm going to come down here and then copy and paste this.

[00:01:16]So, once we've turned that into an actual ENV file, there's three keys that you're going to need to grab. Uh, one of them is going to be your Agora app ID.

[00:01:24]The other one is going to be your certificate for that app. And then the final one is going to be your Gemini API key. So I will go over here and show you guys how to do that. So you're going to want to go to agora.io uh create an account. And so I'm going to sign in here. And once you're signed in, it'll take you to the console. Uh you can create a new project at the top here. And let's name this Gemini Demo.

[00:01:50]And you can select a use case and click submit.

[00:01:55]And once you've created that new app, you're going to click on configure. And this is where we will find our app ID and our primary certificate.

[00:02:05]We're going to want to do one more thing before we go back into the app. We're going to want to come in here and enable conversational AI by clicking this button and confirming.

[00:02:17]And now that that's enabled, the only thing that you need other than this is your Gemini API key that you can provide. So we'll go back into the code now and we're going to paste in our app ID and our app certificate.
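Editor's note: the three keys described above can be checked at startup so that a missing value fails early. The following is a minimal sketch, not code from the quickstart. The variable names `AGORA_APP_ID`, `AGORA_APP_CERTIFICATE`, and `GEMINI_API_KEY` are assumptions; the project's example env file defines the real names.

```typescript
// Hypothetical variable names; use whatever names the quickstart's
// example env file actually defines.
const REQUIRED_KEYS = [
  "AGORA_APP_ID",
  "AGORA_APP_CERTIFICATE",
  "GEMINI_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Returns the required keys that are absent or blank in `env`,
// in the same order as REQUIRED_KEYS.
function missingEnvKeys(env: Env): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Throws a single error listing every missing key, so a misconfigured
// env file is reported all at once.
function assertEnv(env: Env): void {
  const missing = missingEnvKeys(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In a Next.js API route, one might call `assertEnv(process.env)` before building the agent configuration.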

[00:02:32]Okay, so now that we have our keys pasted in, we're going to want to actually edit the API route. So let's go into app API and then the invite agent route. And the first thing we're going to want to do is import Gemini Live so that we can use it in the code. And then we will scroll down here. And as you see, uh, our SDK usually is set up for the chain pipeline, right? So speech-to-text, LLM, text-to-speech. We're essentially going to remove all three of those and just use dot with, uh, MLLM.

[00:03:06]So I'm going to remove these like that. So now that those are removed, we're going to come over here and do the MLLM method. It's going to be new Gemini Live like that.

[00:03:25]And then we are going to set the model as Gemini 3.1 Flash preview and then choose our API key. And also we're going to set the URL to point at Google's WebSocket server. And then you can also, these are optional, but you can set the input modality and the output modality as well as the greeting. And so if I click into here, you'll see what the greeting is. And same thing with the prompt itself. We're storing the prompt in a variable and then just calling it down here.
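Editor's note: the settings listed above (model, API key, WebSocket URL, optional modalities, greeting, and prompt) can be pictured as one configuration object. The sketch below is illustrative only. The type and field names are hypothetical and are not taken from the Agora SDK, the audio defaults and the `wss://` check are assumptions, and the model ID and URL are passed in by the caller rather than guessed here.

```typescript
type Modality = "audio" | "text";

// Hypothetical shape: these field names are illustrative and are not
// taken from the Agora SDK.
interface SpeechToSpeechConfig {
  model: string;
  apiKey: string;
  url: string;
  inputModalities: Modality[];
  outputModalities: Modality[];
  greeting?: string;
  systemPrompt?: string;
}

interface RequiredSettings {
  model: string;
  apiKey: string;
  url: string;
}

interface OptionalSettings {
  inputModalities?: Modality[];
  outputModalities?: Modality[];
  greeting?: string;
  systemPrompt?: string;
}

// Combines the required settings with the optional ones.
// Assumption: audio in and audio out when no modalities are given.
function buildSpeechToSpeechConfig(
  required: RequiredSettings,
  optional: OptionalSettings = {},
): SpeechToSpeechConfig {
  // Assumption: the endpoint is a secure WebSocket URL.
  if (required.url.indexOf("wss://") !== 0) {
    throw new Error("url must be a secure WebSocket (wss://) endpoint");
  }
  if (required.apiKey.trim() === "") {
    throw new Error("apiKey must not be empty");
  }
  return {
    model: required.model,
    apiKey: required.apiKey,
    url: required.url,
    inputModalities: optional.inputModalities || ["audio"],
    outputModalities: optional.outputModalities || ["audio"],
    greeting: optional.greeting,
    systemPrompt: optional.systemPrompt,
  };
}
```

Keeping the prompt and greeting in variables, as the speaker does, means they can be passed in here without touching the rest of the route.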

[00:03:59]And so now we're going to save the project and then run npm run dev.

[00:04:07]Once you have everything configured inside of the project and you've run npm run dev, super simple. Uh, just go to localhost:3000 or whatever port you're running it on and we can click try it now.

[00:04:19]And we can give it a shot. Hey Gemini, can you hear me?

[00:04:23]>> Yes, I can hear you loud and clear.

[00:04:25]>> Awesome. Listen, can you tell me a story in three sentences?

[00:04:30]Once a brave little robot set out across the desert to find a legendary power. >> I actually wanted you to tell me that story in German.

[00:04:39]>> Oh, of course. Here it is in German.

[00:04:45]Sorry to do that again, but I meant French.

[00:04:49]No problem at all. Let's try French.

[00:04:57]And let's do let's do one more. Can you do it in Chinese?

[00:05:02]>> Sure. Here you go.

[00:05:12]>> So, you get the point. Obviously, this model is great. It's multimodal, speaks 70 different languages, and responds super fast. But what can it do in the real world? Well, so next to me I have a Reachy Mini robot and we've given it over 70 different tool call emotes that it can call as we speak to it. And so we can kind of test and gauge how well the model does at calling tools. So let me power it up by going here.

[00:05:40]Let's give it a shot.

[00:05:50]Hey Reachy, can you hear me?

[00:05:55]>> Loud and clear. How can I help you today?

[00:05:57]>> I was just wondering if you could tell me a story, but specifically in French.

[00:06:05]>> Ah, oui, I can do that. What kind of story are you hoping for?

[00:06:13]Yeah, I was thinking about something about a car.

[00:06:26]>> Well, truly, honestly, I don't speak French. So, uh, can you go back to English?

[00:06:35]>> No problem at all. Back to English it is. So, about that car story. Should we make it about a race or maybe a magical adventure?

[00:06:44]>> Um, well, I was thinking maybe about a race. Let's do something related to Formula 1.

[00:06:52]>> A Formula 1 race. Brilliant. Imagine a young driver considered an underdog facing the final lap against the reigning champion. The tension is high.

[00:07:02]The crowd is roaring. What do you think happens next?

[00:07:06]So, that was Gemini 3.1 Flash Live Preview built into actual hardware with over 70 different tool calls that are actually controlling the motors inside of here to determine how it behaves.
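Editor's note: the idea of tool calls that drive motors can be sketched as a registry that maps a tool name to a handler. Everything below is hypothetical, including the emote names, the joint names, and the `MotorCommand` shape; a real robot would expose its own SDK, and the demo's actual tool definitions are not shown in the video.

```typescript
// A tool call as a model might emit it: a name plus JSON-like arguments.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical motor command; not the robot's real interface.
interface MotorCommand {
  joint: string;
  angleDeg: number;
}

type EmoteHandler = (args: Record<string, unknown>) => MotorCommand[];

// Two illustrative emotes standing in for the 70+ mentioned in the demo.
const emotes: Record<string, EmoteHandler> = {
  nod: () => [
    { joint: "head_pitch", angleDeg: 15 },
    { joint: "head_pitch", angleDeg: 0 },
  ],
  tilt_head: (args) => {
    const requested = args["angleDeg"];
    // Fall back to 20 degrees when the model omits or mistypes the angle.
    const angle = typeof requested === "number" ? requested : 20;
    return [{ joint: "head_roll", angleDeg: angle }];
  },
};

// Looks up the handler for a tool call and returns the motor commands
// it produces. Unknown tool names are rejected rather than ignored.
function dispatchToolCall(call: ToolCall): MotorCommand[] {
  const known = Object.prototype.hasOwnProperty.call(emotes, call.name);
  if (!known) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  return emotes[call.name](call.args);
}
```

Rejecting unknown names explicitly is a design choice here: it surfaces a model that calls a tool it was never given instead of silently doing nothing.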

[00:07:18]There's one slight issue though, which is having hardware like this adds a lot of latency. Uh, so we built another demo that is a food ordering demo and basically you're getting the same thing but without the hardware. So, let's switch into that and give it a test.

[00:07:33]So, let's connect to the voice channel and then dispatch the agent and speak to it.

[00:07:40]>> Hey, can you hear me?

[00:07:43]>> Yes, I can hear you loud and clear.

[00:07:45]Welcome to FoodGora. What can I do for you today? Ready to order some food or are you just browsing the menu?

[00:07:52]>> Yeah, I'm just browsing the menu. I was curious what you would recommend.

[00:07:57]>> Ooh, nice. Well, if you want something really satisfying, the Transformer, which is our pulled pork, is always a hit. Or if you're feeling like ramen, the Token Ramen is incredibly flavorful.

[00:08:10]Are you leaning towards something burger-like or a bowl or something else?

[00:08:15]>> Let's go with the Transformer. Uh, you can add that to the cart. Is there any desserts that you recommend?

[00:08:23]>> Great choice. I've added the Transformer to your cart. For dessert, the Diffusion Cake, which is our chocolate ganache, is super decadent. Or if you want something a little lighter, the Backprop Brownie is always a good pick. What sounds better?

[00:08:37]>> Let's go with the Backprop Brownie. And also, I don't want the Transformer. Can I go with something healthier like the Llama Bowl?

[00:08:47]>> No problem. I've removed the Transformer from your cart and added the Llama Bowl and the Backprop Brownie. So, you're all set with that or is there anything else I can add or change?
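Editor's note: the cart changes in this exchange suggest the agent is calling add and remove tools behind the scenes. The sketch below is an assumption about how such tools could be modeled, not the demo's code; the function names and the `Cart` type are hypothetical.

```typescript
// Cart maps an item name to its quantity.
type Cart = Record<string, number>;

// Returns a new cart with one more of `item`.
function addItem(cart: Cart, item: string): Cart {
  const next: Cart = { ...cart };
  next[item] = (next[item] || 0) + 1;
  return next;
}

// Returns a new cart with `item` removed entirely.
function removeItem(cart: Cart, item: string): Cart {
  const next: Cart = { ...cart };
  delete next[item];
  return next;
}

// Lists item names alphabetically for a stable read-back to the user.
function listItems(cart: Cart): string[] {
  return Object.keys(cart).sort();
}

// Replays the order from the conversation above.
let demoCart: Cart = {};
demoCart = addItem(demoCart, "Transformer");
demoCart = addItem(demoCart, "Backprop Brownie");
demoCart = removeItem(demoCart, "Transformer");
demoCart = addItem(demoCart, "Llama Bowl");
```

Returning a new cart from each call, rather than mutating in place, keeps each tool call easy to log and replay.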

[00:08:57]>> So, if you guys are looking to build a voice agent with as low latency as you see in these demos, whether it's software or it's hardware like this guy right here, uh, Agora powers 80 billion minutes of real-time audio and video traffic every single month. And so, we invite you guys to come try our platform. We have native SDKs as well as APIs for you to use.

