Voice agents can be built by integrating three core components—speech recognition, language processing, and voice synthesis—into a single unified API, which simplifies development by eliminating the need to stitch together multiple services and their respective documentation.
Deep dive
Claude Code + AssemblyAI: The Easiest Voice Agent You Can Build!
Building a voice agent used to take a team and a month. I just built one in an afternoon in Claude Code with a single API. It listens, it talks back, and it books a real meeting on my calendar while I do nothing. So let me show you the exact prompts, the workflow, and what it costs to run. Let's get into it. Let's start with why voice agents have been annoying to build. The normal way, you're stitching three things together: something to turn speech into text, a language model to think about what was said, and something to turn that answer back into a voice.
Three services, three sets of docs, and three bills. What AssemblyAI did is put all of that behind one connection: audio in, audio out, and they handle the speech recognition, the model, the voice, and the turn detection in between. That's why this build is short. I'm not gluing anything together myself, so there are fewer things I can break or get wrong.
What makes this whole process easy is that AssemblyAI, the company behind this API and the sponsor of today's video, wrote a system prompt that lets you pair your AI coding agent directly with the API. It covers the stuff Claude Code would otherwise guess wrong: the exact audio format the API wants, what specifically breaks in a browser, how tool calling has to be structured, and turn-detection defaults that actually feel human. So before I ask Claude Code to build anything, I paste the entire prompt in as the system prompt. Clicking this link takes you to the "copy the integration prompt" section. You can see it's meant for one-shot agent sessions, and the full integration prompt is over here. All you have to do is copy it to your clipboard.
This is the difference between Claude Code writing code that runs on the first try and Claude Code confidently writing code that's subtly wrong. You're handing it the rules of the API up front instead of hoping it remembers them. If you take one thing from this video, it's this: when a tool provider like AssemblyAI gives you a prompt, paste it into your coding assistant. So now that I've taken that system prompt from the API docs, I'm going to paste it into Claude Code and have it help me set things up. It's very simple: I copied something, and now I'm just going to paste it here and press Enter.
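One detail the integration prompt is meant to pin down is the exact audio format. In the browser, the Web Audio API hands you 32-bit float samples in the nominal range -1.0 to 1.0. Many streaming speech APIs instead expect signed 16-bit little-endian PCM. The sketch below assumes that target format purely for illustration. The transcript does not say which encoding or sample rate the Voice Agent API wants, so take those from the integration prompt or the docs.

```python
import struct
from typing import Iterable


def float32_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to signed 16-bit little-endian PCM.

    ASSUMPTION: the target API wants 16-bit little-endian PCM. The real
    encoding and sample rate must come from the provider's docs.
    """
    out = bytearray()
    for s in samples:
        # Clamp first: captured audio can slightly overshoot the nominal range.
        s = max(-1.0, min(1.0, s))
        # Scale asymmetrically so -1.0 maps to -32768 and 1.0 maps to 32767.
        value = int(s * 32768) if s < 0 else int(s * 32767)
        out += struct.pack("<h", value)
    return bytes(out)


# Usage: four samples become eight bytes, two bytes per sample.
chunk = float32_to_pcm16([0.0, 0.5, -1.0, 1.2])
print(len(chunk))  # 8
```

Getting this wrong usually fails quietly: the connection opens, but the transcription is garbage or empty. That is the kind of thing a provider-written prompt can head off.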
You can see that, following the system prompt, Claude Code introduces itself as here to help me integrate the API. It also notices the directory is empty, so we're starting fresh. Before it writes any code, it wants to understand what I'm building so it can recommend the right approach. I'm going to tell it that I'm building a browser-based voice agent using AssemblyAI's Voice Agent API. It's an intake assistant for my consulting business: it greets the caller, finds out what they're trying to build, asks a couple of qualifying questions, and, if it's a fit, books a call. Then I press Enter. You can see it understands my request and is going to pull the official quickstart from the API docs based on what I told it. You don't strictly need that step, because you've already given it the system prompt, but it's a good backup. Then it asks whether it can access the browser integration page of the API docs. I'm going to press Allow, and the same for any other information it needs from the API docs.
Now Claude Code basically understands what it needs on the API side of things. Next it's figuring out how to structure my API key handling, and it gives me three options. The first is to paste a token at runtime, which keeps everything in a single HTML file. The second is to mint a token with a curl one-liner. The third is one HTML file plus a tiny token server. I'm going to go with that last option and press Enter.
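The token-server option keeps the long-lived API key out of the browser. The page asks a small local endpoint for a temporary token and never sees the key itself. Below is a minimal sketch using Python's standard library. It assumes, as the "mint a token" wording suggests, that the API issues temporary tokens. The `/token` path, the `ASSEMBLYAI_API_KEY` environment variable name and `mint_temporary_token` are all hypothetical. The real token request is defined in AssemblyAI's docs and is deliberately left unimplemented here.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Tuple


def mint_temporary_token(api_key: str) -> str:
    """HYPOTHETICAL: exchange the long-lived key for a temporary token.

    Fill this in with the token request described in the provider's docs.
    """
    raise NotImplementedError("call the provider's token endpoint here")


def token_response(mint: Callable[[], str]) -> Tuple[int, bytes]:
    """Build the HTTP status and JSON body for a token request."""
    try:
        token = mint()
    except Exception:
        # Never echo the API key or upstream error details to the browser.
        return 502, json.dumps({"error": "could not mint token"}).encode()
    return 200, json.dumps({"token": token}).encode()


class TokenHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/token":
            self.send_error(404)
            return
        # The key lives only in the server's environment (hypothetical name).
        api_key = os.environ.get("ASSEMBLYAI_API_KEY", "")
        status, body = token_response(lambda: mint_temporary_token(api_key))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8787) -> None:
    """Run the token server on localhost; call this from a script to start it."""
    HTTPServer(("127.0.0.1", port), TokenHandler).serve_forever()
```

Calling `serve()` starts the server on an arbitrary local port. The HTML page would then request `/token` before opening the voice session. If the page is served from a different origin than this server, you would also need CORS headers, which this sketch leaves out.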
Now Claude Code is addressing the second part, which is the tool call.
When the agent decides that the person I'm talking to is actually a fit for my business and wants to book a call, what should actually happen? Handling this as a tool call is itself an instruction from the system prompt. So Claude Code understands that booking is a tool call and that it shouldn't break the voice agent. Now it's working out what it needs for the tool call. The options are to collect and confirm the details with no calendar integration, use a real calendar link, or use a real calendar API. I'll pick the real calendar API and press Enter.
Then it asks which data region, or base URL, it should use: the US default or EU data residency. If you're in the EU, you'd use that one; I'm not, so I'm going with the US default. Next it asks for my API key. I have one, so I'll select that and press Submit. Now it's asking which calendar it should book the call to. I'm just going to choose the top one, Cal.com, because it's the simplest. Now the plan is basically complete. This is the plan it created from the system prompt instructions we got from AssemblyAI: a browser-based voice agent in the region we selected. It doesn't select a model, because that's managed by the Voice Agent API.
This is where AssemblyAI comes in. The default voice is Ivy, one of AssemblyAI's voices, and it'll swap to James or Winter if you want a warmer voice. The voice IDs are exact strings. These are all instructions from the system prompt we pasted in. So that's it, guys.
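Two of the plan's choices are exact values that are easy to get subtly wrong: the data region and the voice. A small config object that rejects anything unexpected fails loudly instead of silently falling back. This is a hypothetical sketch. The field names, the lowercase voice IDs `ivy`, `james` and `winter`, and the `us`/`eu` region keys are assumptions based on what the video mentions, not the API's documented schema.

```python
from dataclasses import dataclass

# ASSUMPTION: voice IDs as named in the video, lowercased. The real IDs are
# exact strings, so copy them from the docs rather than trusting this set.
KNOWN_VOICES = {"ivy", "james", "winter"}
# ASSUMPTION: "us" is the default region, "eu" the data-residency option.
KNOWN_REGIONS = {"us", "eu"}


@dataclass(frozen=True)
class SessionConfig:
    system_prompt: str
    voice: str = "ivy"
    region: str = "us"
    # No model field: per the plan, the Voice Agent API manages the model.

    def __post_init__(self) -> None:
        # Exact-match checks: "Ivy " or "IVY" is rejected rather than
        # passed through to the API and silently ignored.
        if self.voice not in KNOWN_VOICES:
            raise ValueError(f"unknown voice id: {self.voice!r}")
        if self.region not in KNOWN_REGIONS:
            raise ValueError(f"unknown region: {self.region!r}")


# Usage: swap the default voice for a warmer one, keep the US default region.
config = SessionConfig(system_prompt="You are an intake assistant.", voice="james")
```

The region key is kept separate from any base URL in this sketch. The actual US and EU endpoints should be copied from the docs rather than hard-coded from memory.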
That's basically the whole plan, and now it's asking whether we're ready to proceed. I'm just going to press Enter, and that's all we had to do.
We had to paste in that system prompt and then answer a couple of questions.
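The booking tool call in the plan has two pieces. One is a tool definition the agent may invoke once the caller is a fit. The other is a handler that checks the requested slot is free before writing anything to the calendar. The sketch below shows both, with an in-memory list of busy intervals standing in for Cal.com. The tool name `book_consultation`, its JSON-schema-style parameters and the handler are hypothetical illustrations of the pattern, not AssemblyAI's or Cal.com's actual API. The email address is a placeholder.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# HYPOTHETICAL tool definition in JSON-schema style: what the agent is told
# it may call once it has the caller's name, email, and preferred time.
BOOK_TOOL = {
    "name": "book_consultation",
    "description": "Book a 30-minute consultation once the caller is a fit.",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
            "start_utc": {"type": "string", "description": "ISO 8601 in UTC"},
        },
        "required": ["name", "email", "start_utc"],
    },
}

SLOT = timedelta(minutes=30)
Interval = Tuple[datetime, datetime]


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def handle_book_consultation(args: Dict[str, str], busy: List[Interval]) -> Dict[str, str]:
    """Validate the tool arguments, check availability, then record the booking."""
    missing = [k for k in BOOK_TOOL["parameters"]["required"] if not args.get(k)]
    if missing:
        return {"status": "error", "message": f"missing: {', '.join(missing)}"}
    start = datetime.fromisoformat(args["start_utc"])
    if start.tzinfo is None:
        start = start.replace(tzinfo=timezone.utc)  # assume UTC if no offset given
    slot = (start, start + SLOT)
    if any(overlaps(slot, b) for b in busy):
        # Returning a reason lets the agent offer the caller another time.
        return {"status": "unavailable", "message": "that time is taken"}
    busy.append(slot)  # stand-in for the real calendar write
    return {"status": "booked", "start_utc": start.isoformat()}


# Usage: 20:00-20:30 UTC is already taken, so the first request is refused.
calendar: List[Interval] = [
    (datetime(2026, 6, 17, 20, 0, tzinfo=timezone.utc),
     datetime(2026, 6, 17, 20, 30, tzinfo=timezone.utc)),
]
first = handle_book_consultation(
    {"name": "Joe", "email": "joe@example.com", "start_utc": "2026-06-17T20:00:00+00:00"},
    calendar,
)
second = handle_book_consultation(
    {"name": "Joe", "email": "joe@example.com", "start_utc": "2026-06-17T20:30:00+00:00"},
    calendar,
)
```

Treating slots as half-open intervals means a booking that starts exactly when another ends (20:30 here) is allowed. Returning a status instead of raising keeps the voice session alive, so the agent can say the time is taken and offer another.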
We have a whole plan, and now Claude Code is going to go execute it. One thing you're going to have to give Claude Code is, obviously, your API keys. If you go to your AssemblyAI dashboard and open API keys, I have one over here.
So copy and paste that in. We're also using Cal.com to manage our bookings. This lets our voice agent read our calendar and write to it. It figures out which dates are available based on what the client wants and then writes the booking to our actual calendar. To get that key, go to app.cal.com, open Settings, go to the Developer section, and go to API keys.
Copy an API key you already have or create a new one. That's what lets Claude Code communicate with Cal.com. And this is what Claude Code built. For full transparency, I gave it some design updates as well.
The original version was very bare bones, but I told it I'm Universe of AI, so it should match that. It kind of did, and you can see it named itself Nova. So now let's start talking to our AI agent and see if it's actually able to book a meeting for us. I'll start the call.
>> Hi, I'm Nova from Universe of AI. What are you looking to build or automate with AI?
>> Hi Nova. I'm trying to build a simple workflow automation for my gardening business, allowing me to book people, you know, for cleaning the grass and things like that. Can you help me with that?
>> That sounds like a great way to save time. To see if we're a good match, do you have a specific timeline in mind for this? And do you have a budget or a technical contact we should work with?
>> Yeah, I think I want something up and running by maybe end of June. And the technical contact would be me, I guess.
>> That sounds like a great project and definitely something we can help with.
Would you like to book a 30-minute consultation to go over the details?
>> Uh, yes, please. Let's do that.
>> Great. Could you please tell me your name, your email address, and a date and time that works best for you?
>> My name is Joe. My email address is [email protected].
And June 17th, I think I have some availability.
>> June 17th sounds good. What time on that day would you prefer?
>> 4:00 p.m. works on my end.
>> All right, Joe. I have you down for a 30-minute consultation on June 17th at 4:00 p.m. America/Toronto time. That is June 17th at 8:00 p.m. UTC. I will send a confirmation to [email protected].
>> Thank you.
>> You are very welcome, Joe. We look forward to speaking with you soon.
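In the confirmation, the agent converts the caller's local time to UTC. Toronto is on daylight saving time (UTC-4) in June, so 4:00 p.m. local time is 8:00 p.m. UTC. The snippet below checks that arithmetic with Python's standard `zoneinfo` module. It needs Python 3.9+ and either the system time-zone database or the `tzdata` package. It only verifies the conversion and makes no claim about how the agent performs it internally.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# ASSUMPTION: the year is 2026; the call never says, and the UTC offset
# depends on the date because of daylight saving time.
local = datetime(2026, 6, 17, 16, 0, tzinfo=ZoneInfo("America/Toronto"))
utc = local.astimezone(timezone.utc)

print(local.tzname())                     # EDT (Eastern Daylight Time, UTC-4)
print(utc.strftime("%Y-%m-%d %H:%M %Z"))  # 2026-06-17 20:00 UTC
```

Storing the booking in UTC and converting only for display avoids the classic off-by-one-hour bug around daylight saving transitions.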
>> So, that was not a bad session at all. I was able to talk to my voice agent for two minutes. Now, the real test is whether it actually booked something on my calendar. If I go to my bookings, these are some of the test ones I did.
And this is the one I just booked: the 30-minute meeting with Universe of AI on June 17th at 4:00 p.m.
What's really cool is that this is synced to my Google Calendar, so I can join the video call. It also sent the confirmation to [email protected]. If there's an actual [email protected], they probably got a confirmation and are probably confused about what's happening.
Sorry, Joe. But you can see how simple it is to build an AI voice agent with Claude Code and AssemblyAI's API. Now, the part you might be wondering about: what does it cost to run this? AssemblyAI's Voice Agent API is $4.50 an hour, flat. That's the whole stack on one connection: the speech recognition, the model, the voice, all of it. You pay for the seconds the call is actually happening, nothing else. For comparison, OpenAI's Realtime API is $18 an hour, four times the price, and it bills you per token of audio. So your cost moves around, and you don't really know the number until the bill shows up. Deepgram lands at the same $4.50, but it bills component by component, so again, you're doing math.
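A flat hourly rate billed per second makes a cost estimate a single multiplication. The helper below uses the $4.50 and $18 hourly figures quoted in the video. The two-minute call length matches the demo call. The always-on scenario, a single session running 24 hours a day for 30 days, is an illustrative assumption. Provider prices change, so treat the rates as inputs to check against current pricing rather than fixed facts.

```python
def session_cost(total_seconds: float, rate_per_hour: float) -> float:
    """Cost in dollars for a flat hourly rate billed per second of session time."""
    # Multiplying first keeps these round example figures exact in floating point.
    return rate_per_hour * total_seconds / 3600


CALLS, SECONDS_PER_CALL = 1_000, 120  # 1,000 two-minute calls

burst = session_cost(CALLS * SECONDS_PER_CALL, 4.50)
burst_at_18 = session_cost(CALLS * SECONDS_PER_CALL, 18.00)
always_on_month = session_cost(24 * 30 * 3600, 4.50)  # one session, 24/7, 30 days

print(f"${burst:,.2f}")            # $150.00
print(f"${burst_at_18:,.2f}")      # $600.00
print(f"${always_on_month:,.2f}")  # $3,240.00
```

The 4x ratio between the two quoted rates holds at any volume. What changes the picture is total session hours, which is why an always-on agent deserves the hourly math up front.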
AssemblyAI's simple pricing model makes it easy to understand: 1,000 calls cost exactly what 1,000 calls should cost, and you can work that out before you run a single one. What the price doesn't tell you is what you actually get at that rate. You get the same one-second response time as OpenAI and speech-aware turn detection. You also get a couple of things the other providers don't have, like reconnecting mid-call if the connection drops and changing the prompt or voice without tearing down the session. Now, I'll be straight about the limits. $4.50 an hour is great for an agent that runs in bursts, taking calls when they come in. If you're planning something always-on at huge scale, sit down and do the hourly math first, the same as you would with any usage-based tool. But for building, testing, and most of your workloads, AssemblyAI is cheap for what it does.
That's the whole build: Claude Code, one API, a couple of minutes, and a voice agent that actually does something, at about $4.50 an hour to run. If you want to try it yourself, everything you need is linked in the description below: your API key, the docs, and the Claude Code system prompt I used. Start with the prompt. It's the part that makes this whole process easy, so don't miss it. And if you build something, let me know in the comments what you built and what you're working on. That's it for today's video. Make sure you're subscribed to the channel and follow our new newsletter at universeai.behive.com. Also subscribe to the main channel, World of AI, and support us on X by following the Universe of AIZ as well.
Until then, I'll see you guys in the next video.