Build Voice Agents from Scratch
Added: 2026-05-12

178,664 views | 335:18 | vizuara | Original Release: 2026-05-06

Dr. Panat provides a rigorous, high-signal guide that addresses the actual engineering bottlenecks of real-time voice AI rather than just surface-level integration. It is an essential resource for bridging the gap between academic theory and the practical complexities of low-latency deployment.

[00:00:00]Hi, my name is Dr. Sreedath Panat. I have a PhD from MIT and I am a graduate of IIT Madras. I'm also one of the three co-founders of Vizuara AI Labs.

[00:00:09]Today, I am incredibly excited to announce the release of our new live boot camp, Build Voice Agents from Scratch.

[00:00:17]This boot camp is starting on 12th of May and it will last for 2 months. As you know, voice agents are rapidly increasing in popularity. That is because voice is becoming one of the most convenient ways in which you can interact with a large language model.

[00:00:33]For those of you using speech-to-text or text-to-speech tools, or for those of you who never type anything in ChatGPT but rather use Wispr Flow or similar tools, you know the importance of voice as a modality.

[00:00:47]Many estimates suggest that voice is probably a trillion-dollar opportunity as an interface to interact with AI models.

[00:00:58]The overall market size of voice agents is increasing exponentially. You might be familiar with companies like ElevenLabs, Deepgram, Cartesia, Vapi, etc. These tools allow you not just to build voice clones; they are also one of the ways in which you can build voice agents with which your customers can interact.

[00:01:22]Voice agents sound easy. You may feel like you just need a speech-to-text conversion, an LLM reasoning layer or intelligence layer, and then a text-to-speech conversion layer. It sounds very easy, right? But how about latency?
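The three-layer pipeline described here, and the latency question it raises, can be sketched roughly as follows. This is a minimal illustration, not the boot camp's code: `speech_to_text`, `llm_reply`, `text_to_speech` and `run_turn` are hypothetical stubs standing in for real models, and the only real work is timing each stage.

```python
import time
from typing import Dict, Tuple


def speech_to_text(audio: bytes) -> str:
    """Hypothetical stub: treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def llm_reply(prompt: str) -> str:
    """Hypothetical stub for the LLM reasoning layer."""
    return f"You said: {prompt}"


def text_to_speech(text: str) -> bytes:
    """Hypothetical stub: returns the text as bytes instead of audio."""
    return text.encode("utf-8")


def run_turn(audio: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Run one conversational turn and record seconds spent in each stage."""
    timings: Dict[str, float] = {}
    value = audio
    stages = (("stt", speech_to_text), ("llm", llm_reply), ("tts", text_to_speech))
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - start
    return value, timings


if __name__ == "__main__":
    reply, timings = run_turn(b"hello")
    print(reply, timings)
```

In this sequential form the user hears nothing until all three stages finish, so the per-stage delays add up. That sum is the latency the speaker is pointing at.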

[00:01:37]How about interrupt-driven conversations? How about managing silences? When there is a silence, how does the large language model know what it means?

[00:01:45]Is it that you have finished your sentence or is it that you are just thinking?
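One simple way to approach the silence question is to declare the turn finished only after a minimum run of quiet audio frames. The sketch below assumes per-frame energy values are already available; the function name `end_of_turn_index`, the threshold and the frame counts are illustrative assumptions, not anything stated in the video.

```python
from typing import Iterable, Optional


def end_of_turn_index(
    energies: Iterable[float],
    threshold: float = 0.1,
    min_silence_frames: int = 5,
) -> Optional[int]:
    """Return the frame index at which the turn is declared over, or None.

    A frame counts as speech when its energy is at or above `threshold`.
    The turn ends once `min_silence_frames` consecutive silent frames
    follow at least one speech frame.
    """
    heard_speech = False
    silent_run = 0
    for index, energy in enumerate(energies):
        if energy >= threshold:
            heard_speech = True
            silent_run = 0  # speech resumed, so the pause was only a pause
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return index
    return None


if __name__ == "__main__":
    frames = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0]
    print(end_of_turn_index(frames, min_silence_frames=3))
```

A short silence window makes the agent respond faster but cuts in while the speaker is still thinking; a long window does the opposite. Silence length alone cannot tell the two cases apart, which is why this is only a starting point.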

[00:01:50]When you are interrupting a conversation, how does the large language model know that the output has to stop immediately so that the input can be taken in to create the next output?
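Stopping the output on an interruption can be modelled as cancelling a playback task. The sketch below uses Python's `asyncio` for this; `speak` and `run_with_barge_in` are hypothetical names, the sleeps stand in for real audio playback and real speech detection, and none of it is taken from the boot camp material.

```python
import asyncio
from typing import List


async def speak(chunks: List[str], played: List[str]) -> None:
    """Play chunks one at a time, recording each chunk that finished."""
    for chunk in chunks:
        await asyncio.sleep(0.05)  # stands in for playing one audio chunk
        played.append(chunk)


async def run_with_barge_in(chunks: List[str], interrupt_after: float) -> List[str]:
    """Start playback, then cancel it when the (simulated) user speaks."""
    played: List[str] = []
    playback = asyncio.create_task(speak(chunks, played))
    await asyncio.sleep(interrupt_after)  # stands in for detecting user speech
    playback.cancel()  # stop the agent's output immediately
    try:
        await playback
    except asyncio.CancelledError:
        pass  # expected: playback was interrupted mid-reply
    return played


if __name__ == "__main__":
    reply = ["Sure,", "your", "appointment", "is", "on", "Tuesday."]
    print(asyncio.run(run_with_barge_in(reply, interrupt_after=0.12)))
```

Because the reply is produced in small chunks, the agent can stop between chunks instead of finishing the whole sentence, and the list of chunks already played tells the next turn how much the user actually heard.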

[00:02:02]So, all of these are very serious concerns while building a voice agent which can work in almost real time.

[00:02:08]In this boot camp, we'll be learning exactly how to tackle these challenges and build a beautiful voice agent which you can deploy in pretty much any setting. It doesn't matter which domain this voice agent will be used in. Many prominent figures in AI, including Sundar Pichai and Jensen Huang, have spoken about the importance of voice.

[00:02:31]Typing is often very difficult but more importantly, it's very difficult to interact with an intelligent layer always in a chat mode. For example, I'll share my personal experience as well.

[00:02:43]I very much like interacting with voice agents when I'm taking a walk. I like to interact with my large language model by just speaking to it so that it can speak back to me.

[00:02:54]And this almost feels like a conversation with a friend especially if the latency is very low and if the conversation which is happening with the large language model is very meaningful and thoughtful.

[00:03:07]And this is going to be the future.

[00:03:09]Not just for customer-facing applications but also for companies' internal applications.

[00:03:15]So, if you are a student or industry professional looking to master voice as an interface to interact with large language models, or somebody who is truly keen to learn about voice agents, build one from scratch for yourself and deploy it later in pretty much any domain, this is a boot camp for you.

[00:03:33]As part of this boot camp, we will be mastering several different parts of the entire end-to-end voice agent pipeline, starting with speech-to-text, then the LLM reasoning layer and also real-time text-to-speech streaming, dealing with latency, dealing with interruptions and dealing with pauses.

[00:03:53]We will be meeting for 8 weeks. Every week we'll be meeting on Tuesday for around 2 hours.

[00:04:00]Throughout this boot camp, there will be hands-on assignments and we'll be coding things from scratch.

[00:04:08]So, you will not just be using API calls but you will also be writing Python code to put things together.

[00:04:16]At the end of this boot camp, you will have something ready which you can actually ship, meaning it could be an AI receptionist or it could be a scheduling agent with which you interact through voice. It can be an assistant or therapist or an assistant which stays on your desktop or a meeting assistant or anything. At the end of the day, it will be a voice agent deployed for a specific purpose. This boot camp has primarily three options. You can attend the full boot camp with all the foundational material.

[00:04:46]There is a research starter kit which will give you enough detailed material, including code files and a research road map, to kickstart your research journey. But in addition, if you want one-to-one research mentorship on building voice agents from scratch plus doing additional research on top of it in various subdomains within voice agents, you can also go for our mentorship plan.

[00:05:09]You can check out the details on our website voice-ai.ai.

[00:05:13]The boot camp starts on 12th of May.

[00:05:15]I'll be looking forward to seeing you in the boot camp.
