Create a Voice Recognition AI Agent | LiveKit Tutorial
Added: 2026-05-08

155 views | 21 | 25:17 | ThanhYDavid | Original Release: 2026-05-03

While the speaker diarization is technically impressive, storing biometric signatures in a simple JSON file is a classic case of high-intellect over-engineering that ignores basic security common sense.

[00:00:00]In this video, I'm going to show you how to create a voice agent that is able to identify, save, and remember your voice through voice recognition. That means the voice agent will be able to tell who they're talking to just by your voice only. Hi there, my name is Alexa. What is your name? I'd love to know so I can address you properly. Hi, my name is David.

[00:00:20]Hello David, nice to meet you. I'll remember your name for our future chats.

[00:00:25]If there's anything you'd like to talk about or need help with, just let me know. Hi, here's Laura. Nice to meet you, Alexa. Hello Laura, nice to see you again.

[00:00:34]How can I assist you today? So now you can see it was able to identify two different voices. Mine was labeled as S1 and the second female voice was labeled as S2. And now if you restart this conversation again, it will be able to recognize and remember the voices.

[00:00:51]Hi, I'm Alexa. Nice to meet you. What's your name? Hi, do you recognize my voice? Hello David, nice to see you again.

[00:00:59]I recognize your voice because you have identified yourself as David in our previous interactions. How can I assist you today? Hi, can you tell who I am just by my voice? Hi Laura, nice to see you again. How can I help you today? Can you look up the current Tesla stock price?

[00:01:13]Sorry.

[00:01:15]Only David can use the web search tool.

[00:01:17]If David is here and wants me to look it up, just let me know.

[00:01:21]Can you look up the current Tesla stock price for me, please?

[00:01:25]David, the current Tesla stock price is trading around $419.

[00:01:30]If you want, I can provide more details or updates.

[00:01:33]Just let me know. As always, we are going to develop the voice agent in Python and we're going to use LiveKit as a basis. So let's get started.

[00:01:41]So first off, we'd need to start up with our basic LiveKit voice agent and I already have it right here. This is the entire code. And if you're a regular viewer on this channel, you've already seen this multiple times how to set this up and how to get this going. So instead of doing that, I'm going to jump right into the voice recognition functionality. If you want to have a step-by-step instruction how to set this up, I'm going to link a video in the video description down below how to get this going. Alternatively, you can also use the LiveKit voice agent quick start manual. For this, you can go to the browser and then you can simply Google LiveKit quick start and then you can click on the second link over here.

[00:02:22]This will give you the manual, the step-by-step manual how to set the voice agent up in less than 10 minutes. All you need to do is simply follow the instructions that are laid out in this manual.

[00:02:34]After you did that, you will end up with a voice agent project like this. And then you can simply try this out by going to the console and then using this command over here. By the way, I'm using VS Code. You can also use any other IDE like Cursor, for example. Now we can simply try this out by using this command and press enter.

[00:02:59]Hi, can you hear me?

[00:03:03]Yep, loud and clear.

[00:03:05]What's up?

[00:03:08]As you can see, this already works and now we can start out with the voice recognition functionality.

[00:03:13]So in order to make the speech recognition functionality work, we're going to make use of a very particular speech-to-text model that comes from a company called Speechmatics.

[00:03:23]If you don't know what a speech-to-text model is for, what we're using here for our AI voice agent is a so-called speech-to-text and text-to-speech pipeline. So the way this works is that if you look here at this code, we have a speech-to-text model that translates our speech to text. That text is then interpreted by a large language model, which is basically the brain of the AI agent.

[00:03:48]This large language model gives us a response back in form of text and then that text is also translated to speech again. So now it will talk to us back.

[00:03:57]So that's how it works.
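The pipeline described above can be sketched in a few lines. This is only an illustration of the data flow (speech to text, text to reply, reply to speech); `run_turn` and the three stand-in stages are hypothetical names made up for this note and have nothing to do with the LiveKit API.

```python
from typing import Callable


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one conversational turn: speech -> text -> reply text -> speech."""
    user_text = stt(audio)       # speech-to-text
    reply_text = llm(user_text)  # the language model decides what to say
    return tts(reply_text)       # text-to-speech


# Stand-in stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = run_turn(b"Hi, can you hear me?", fake_stt, fake_llm, fake_tts)
```

Swapping the speech-to-text model, as the video does next, corresponds to passing a different `stt` stage while the other two stay the same.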

[00:03:59]And in our case, we're going to replace this speech-to-text model with a model that comes from Speechmatics.

[00:04:07]And in order to use that, we first need to go to the website and then here we can simply Google Speechmatics.

[00:04:19]Then we can click on the first link.

[00:04:23]And as always, you would need to create an account here. I already have mine, so all I need to do is I need to sign in here.

[00:04:34]Once you have created your account, you will now have access to this dashboard over here. And over here, you can now create an API key to make use of their services.

[00:04:46]Then you can click on this button over here to create your API key.

[00:04:50]You can simply give this a name. Let's call this, I don't know, Alexa or something.

[00:04:55]Generate new key.

[00:04:57]Then you can copy this.

[00:04:59]Now we can go back to VS Code.

[00:05:04]And over here, you will now have this .env.local file where you can simply add a new line and you can call it Speechmatics.

[00:05:17]I think it's like this: SPEECHMATICS_API_KEY, like that.

[00:05:24]And then you can simply paste your API key here.

[00:05:28]Like this and then you can save it.
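As a rough illustration of what that file entry is for, here is a minimal sketch of reading the key back in Python. The variable name SPEECHMATICS_API_KEY follows what the speaker types (he is not fully sure of it himself), and `load_env_file` and `get_speechmatics_key` are hypothetical helpers written for this note, not part of LiveKit or Speechmatics.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def load_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_speechmatics_key(env_file: str = ".env.local") -> Optional[str]:
    """Prefer the process environment, then fall back to the env file."""
    from_env = os.environ.get("SPEECHMATICS_API_KEY")
    return from_env or load_env_file(env_file).get("SPEECHMATICS_API_KEY")
```

In the starter project the env file is normally loaded for you, so a helper like this is only useful for checking that the key is actually visible to the process.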

[00:05:32]All right. So now we can simply go back to the agent.py file over here. If you don't know where that is, it's within this src folder over here.

[00:05:45]And now we also need to install a dependency in order to use this. For this, we can Google LiveKit Speechmatics.

[00:06:00]And then we can click on this link over here.

[00:06:04]And here we have an installation guide how to use Speechmatics. And here we have a command that we can simply copy in order to install Speechmatics.

[00:06:13]You can click on this button to copy it.

[00:06:16]And then we can go back to VS Code. We can go to the terminal and then we can simply paste it here.

[00:06:23]And then we can simply press enter to install this.

[00:06:30]Now it's installed and now we can simply use Speechmatics as is.

[00:06:35]Now we need to add Speechmatics to our code and for this, we have to first of all import it into our implementation.

[00:06:42]Here you can go to this section over here and then you can add a new line and then you can write from livekit.plugins import speechmatics.

[00:06:56]Then we have to import other libraries as well.

[00:07:04]Which is from livekit.plugins.speechmatics

[00:07:08]import TurnDetectionMode and also SpeakerIdentifier.

[00:07:18]All right.

[00:07:19]And then we have to scroll down to this section again where it says speech-to-text STT and then we have to replace this model with the model that we're going to use from Speechmatics.

[00:07:31]And for this, I already prepared some code here.

[00:07:34]Which is this code.

[00:07:38]And don't worry, I'm going to share the whole code of this project in the video description down below so you don't have to type this. But this particular model from Speechmatics is able to identify voice signatures and it does that through a process called diarization, which is enabled here in this section.

[00:07:58]And if you want to have a more detailed explanation how diarization works, you can always go to their on that in more detail. But for now, we can already save this by pressing control S.

[00:08:12]And before we can start this, there's one more thing which we have to do, which is we have to go to this line over here and then delete the content of this bracket. Otherwise, we won't be able to connect to this. So we can simply delete this and then we can press control S again to save this.

[00:08:28]And now we can go to the terminal to use the following command to start this up in developer mode, which is uv run src/agent.py dev. And now we can simply press enter to start this up.

[00:08:44]Now that's up and running and now we can use the LiveKit playground to connect to our AI agent and try this out. For this, we can Google LiveKit playground.

[00:09:02]And then we can select the first link over here.

[00:09:05]And here's where you would need to select the project that you created for this AI agent. In my case, it's tutorial. In your case, it's probably something else. So simply connect to your project.

[00:09:16]And before we try this out, what I will do is that I will speak in or I will talk to the AI agent in my normal voice and then I will try to imitate a different voice and see if the AI agent will be able to differentiate between these two voices. So in order to connect to this, we can simply click this button over here.

[00:09:38]Hi, can you hear me?

[00:09:43]Yep, loud and clear.

[00:09:45]What's up?

[00:09:46]Hi, my name is David.

[00:09:50]Hi David, nice to meet you. What can I help you with?

[00:09:53]Hi, my name is Jessica. Nice to meet you.

[00:10:02]Hi Jessica, nice to meet you, too.

[00:10:04]Sounds like I've got a full room. How can I help you both today?

[00:10:08]Do you know who is speaking right now?

[00:10:14]I can tell there are different voices, but I cannot reliably identify who is speaking at any given moment. You can just say your name when you jump in and I will follow along.

[00:10:25]All right. So, as you can see, um the AI agent was able to identify that there were two different people speaking. Um my normal voice was identified as S1 and my very poor attempt of imitating a woman was identified as S2.

[00:10:43]Currently, it's only able to identify or differentiate between the voices and not able to assign names to it, but that's what we're going to do in the next step.

[00:10:53]So, now if you want the AI agent to be able to remember our voices correctly and also remember our names correctly, there are a couple of steps which we need to do in order to make this work.

[00:11:04]Currently, our AI agent is only able to distinguish between different voices. If you remember correctly, it was able to identify my normal voice as S1 in the chat and a different voice as S2.

[00:11:17]Um S just stands for signature, voice signature, which is basically just a unique identifier for a specific voice.

[00:11:25]So, what we need to do is that we would need to save that voice signature somewhere so we can reload it into the AI agent when we restart a conversation so it remembers that voice. And in our case, we're going to save that in a file.

[00:11:38]But not only that, what we also want is that if a user is mentioning their name, we will also want to save the respective name of that voice of that user in that file as well. So, it not only does the AI agent remember the voice, but also remembers who that voice belongs to. So, we need to create a function that saves that in a file and also reloads that at the beginning of a conversation. So, that's what we need to do. Okay, so now back in VS Code, we can now add this new functionality and for this we need to create a new file and you can do this over here. You can click on the source folder and then you can click on this button to create a new file in it. We're going to call this tools.py.

[00:12:19]And in here, I'm now going to I'm going to paste some code that I already prepared.

[00:12:24]And again, don't worry. I will leave a link in the video description down below to this entire project so you don't have to type this yourself. You can simply download it.

[00:12:33]Uh but for now, I can actually remove this because I don't need it.

[00:12:37]So, as you can see, we have here a function called save speakers. And this function allows us to do the exact thing that I just explained to you, which is save the information into a file.

[00:12:51]And here we will pass to it the voice signatures in this list and also the label ID, which is S1, S2, and so forth, which you also saw in the chat.

[00:13:01]And here the respective name of that respective voice.

[00:13:06]So, what we will do is that we will simply replace that ID, S1, S2, and so forth, with the actual name. So, that's what we're going to do here. And then we will simply save that name with its respective voice signature, which is here called speaker identifier.

[00:13:25]And then we will save that into a file here that is called speaker.json.
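The actual code is only shown on screen, so the following is a minimal sketch of what a save function like this could look like, assuming each voice signature is a plain dict with a `label` and a list of opaque `speaker_identifiers` strings. That dict shape, the default path, and the exact signature of `save_speakers` are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List

SPEAKERS_FILE = Path("speaker.json")  # file name as mentioned in the video


def save_speakers(
    signatures: List[Dict],
    label: str,
    name: str,
    path: Path = SPEAKERS_FILE,
) -> List[Dict]:
    """Store the signature for a temporary label (e.g. "S1") under a real name.

    Each item in `signatures` is assumed to look like
    {"label": "S1", "speaker_identifiers": ["<opaque string>"]}.
    """
    known: List[Dict] = []
    if path.exists():
        known = json.loads(path.read_text(encoding="utf-8"))
    for entry in signatures:
        if entry["label"] != label:
            continue
        # Drop any older entry saved under the same name, then add the new one.
        known = [k for k in known if k["label"] != name]
        known.append(
            {"label": name, "speaker_identifiers": entry["speaker_identifiers"]}
        )
    path.write_text(json.dumps(known, indent=2), encoding="utf-8")
    return known
```

Note that this writes the voice signatures as plain text on disk, which is convenient for a demo but is the point criticized in the summary at the top of this page.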

[00:13:32]So, that's all it actually does. Next, of course, we also need a function that is able to load that information back into the AI agent from that file so it remembers the voices. So, here I already prepared this as well.

[00:13:45]And here you can see that specific function that is called load known speakers. This is actually also very easy. All it does is it simply looks up if that file exists, the speaker.json.

[00:13:58]And if it does, it will simply open that, it will load that, and then it will return the specific information within that file. Now, we have to pass that information that is returned here to the speech-to-text model from Speechmatics. For this, we can initialize it again here.

[00:14:15]This is the same code that I used to replace the speech-to-text model before.

[00:14:20]And now we can load that information from the function. We can call this known speakers equals load known speakers. So, we call this function.

[00:14:33]And then we can pass that information to the speech-to-text model.

[00:14:38]Known speakers equals known speakers.
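A loader along the lines described could look like the sketch below. It assumes the file holds a JSON list of saved entries; the fallback to an empty list for a missing or unreadable file is a choice made for this note and may differ from the code in the video.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_known_speakers(path: Path = Path("speaker.json")) -> List[Dict]:
    """Return the saved speaker entries, or an empty list if none exist yet."""
    if not path.exists():
        return []
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # A corrupt file should not stop the agent from starting.
        return []


# In the video, the returned list is then handed to the Speechmatics
# speech-to-text model as known_speakers=known_speakers.
```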

[00:14:42]All right. So, now the speech-to-text model has information about all of the past voices that it interacted with. Next, we need to add a so-called function tool and for this, I also prepared this code already, which I just paste here.

[00:14:56]And this is the final puzzle piece for this to finally work. And a function tool is basically a function that the AI agent can call by itself based on a description that you give it. Here's the description. And what it says is that this function is invoked when a user with a temporary speaker label like S1 introduces themselves and provides their name. Sounds really complicated, but what it means is that if the AI agent hears a new voice and you introduce yourself by saying, for example, "Hi, my name's David." it will call this function.

[00:15:32]It will take the label like S1 or S2 and also your name, David, and it will also get your voice signature from the speech-to-text model, and then it will use this function save speakers to save your name and your voice signature into the file.

[00:15:52]This is the function that we discussed earlier.

[00:15:55]This one over here.

[00:15:57]And that's it.
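The logic of that tool can be sketched as a plain function. In the real project it is registered with LiveKit as a function tool and talks to the Speechmatics model directly; here the two dependencies are passed in as callables so the sketch runs on its own. The function name and parameters are assumptions based on how the speaker describes it.

```python
from typing import Callable, Dict, List


def assign_name_to_speaker_id(
    label: str,
    name: str,
    get_signatures: Callable[[], List[Dict]],
    save: Callable[[List[Dict], str, str], object],
) -> str:
    """Invoked when a user with a temporary speaker label like S1
    introduces themselves and provides their name."""
    # Ask the speech-to-text side for the voice signatures it has seen so far.
    signatures = get_signatures()
    if not any(sig["label"] == label for sig in signatures):
        return f"No voice signature found for {label}."
    # Persist the signature under the real name instead of the temporary label.
    save(signatures, label, name)
    return f"Saved {name} as the speaker for {label}."
```

The returned string is what the language model would see as the tool result, so it can confirm to the user that the name was stored.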

[00:15:58]And now we're finally done and now we can save this by pressing control S.

[00:16:04]Now, we need to go back to the agent.py file because we have to change a couple of things here before this can work.

[00:16:10]First of all, we need to change this instruction here. Basically, this is a textual instruction how the AI agent should operate. You can write anything in here, actually, what you want.

[00:16:20]Um in my case, I also prepared something already here. This is my custom instruction. So, I tell the AI agent that you are Alexa, a helpful AI assistant. Respond to the user like a friend.

[00:16:33]If you recognize any speaker by their speaker ID that has a proper name assigned to it, greet them by saying, "Hello, name. Nice to see you again."

[00:16:41]This just means that if it recognizes a voice that it already saved, it should just greet them. So, in my case, if I talk to it and I haven't like introduced myself yet, but it already saved my uh voice signature as David, it will just simply say, "Hello, David. Nice to see you again."

[00:16:57]And here, if a user is identified with speaker ID like S1 or S2 and they don't have a proper name assigned to them, ask them for their name, then assign it to their speaker ID using the assign name to speaker ID tool. So, this tool is the function tool that we just created to save the voice signature with the respective name. So, this just means that if it recognizes a new voice that it hasn't saved before, it will ask the user directly, "Hey, what's your name?"

[00:17:31]Uh so, it can actually save that voice signature with the respective name of the user.
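Written out as a Python string, the instruction read aloud above might look roughly like this. The wording is reconstructed from the narration, and the tool name `assign_name_to_speaker_id` is an assumption about how the spoken name is spelled in code.

```python
# Reconstructed from the narration; the exact wording in the video may differ.
INSTRUCTIONS = """\
You are Alexa, a helpful AI assistant. Respond to the user like a friend.
If you recognize any speaker by a speaker ID that has a proper name assigned
to it, greet them by saying: "Hello <name>, nice to see you again."
If a user is identified with a speaker ID like S1 or S2 and has no proper name
assigned, ask them for their name, then assign it to their speaker ID using
the assign_name_to_speaker_id tool.
"""
```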

[00:17:37]All right.

[00:17:38]Um next, we need to import the function tool itself and the speech-to-text model that we initialized in the tools.py file. So, we can say from tools import assign name to speaker ID, which is the function tool, and STT.

[00:17:57]Okay. So, first of all, we need to also give this tool to the AI agent so it has access to it. So, we can say tools equals assign name to speaker ID.

[00:18:09]And then we also need to replace this speech-to-text model now with the speech-to-text model that we assigned in the tools.py file. So, we can simply remove this and say STT.

[00:18:21]Like that.

[00:18:22]All right. That's already it.

[00:18:26]Now, we can um try this out again by going to the terminal.

[00:18:30]And then we can write again uv run src/agent.py dev so we can connect to it from the playground.

[00:18:41]Now, that's up and running and now we can try this out. We can go back to the playground.

[00:18:46]And then we can try this out again by connecting to it.

[00:18:51]All right. So, what I'm going to do now is that I'm going to introduce myself to the AI agent as David and then I'm also using a voice generator to generate a female voice that is going to introduce itself as Jessica so we can have two examples. So, let's try this out.

[00:19:06]Hi, can you hear me? Um my name's David.

[00:19:12]Hey there, I can hear you. What's your name?

[00:19:14]I said my name is David.

[00:19:19]Did you get that?

[00:19:22]Hello, David. Nice to see you again.

[00:19:25]Yep, I got it. What's up?

[00:19:28]Hi, my name is Jessica. Nice to meet you. How are you doing today?

[00:19:37]Hello, Jessica. Nice to see you again.

[00:19:39]I'm doing pretty good. Just hanging out and chatting with you both.

[00:19:42]How's your day going so far?

[00:19:45]All right. This should already be enough. If we go back to VS Code, we should now have a file here that says JSON or speaker.json.

[00:19:55]And then we can open that and you can see here are two entries, one for my voice signature and then one for Jessica's voice signature.

[00:20:03]And now if we try this out again, we can uh see that it will remember these voices. Hello, can you hear me?

[00:20:14]Hello David. Nice to see you again.

[00:20:17]I hear you loud and clear.

[00:20:20]What's up?

[00:20:22]Hi, do you still remember who I am or have you forgotten me already?

[00:20:30]Hey Jessica, of course I remember you.

[00:20:31]What's on your mind?

[00:20:34]All right, so as you can see here Haha, sounds like everything's in order then. What are you two up to?

[00:20:41]Okay, that was awkward. Uh as you can see here, um it remembered my voice. It can actually tag my voice or label my voice correctly here.

[00:20:50]And here you can also see that it labeled the voice Jessica correctly. So now that works.

[00:20:58]All right, one more thing I wanted to show you is that you can actually use the voice identification for something like user authentication. So basically you can only allow certain users to do certain things and the AI agent will be able to tell who they're talking to by the voice. So let's say an unauthorized user is asking the AI agent to do certain things. The AI agent will simply say, "Hey, you are not that specific user and you're not allowed to do that.

[00:21:23]Only that specific user is allowed to do that. You just lack the clearance or the authorization to do that."

[00:21:29]And for this we can actually just add a new tool, let's say a simple tool that is able to search the web and we can only allow me, David, to be able to use that specific functionality. For this I need to install something additional because I'm going to use an out of the box search engine and for this we can go back to the terminal and then we can write uv add langchain-community. And then we can press enter to install that.

[00:22:00]Now that's installed. Now we can go back to the tools.py file.

[00:22:05]Now we can add a new library here. We can say from langchain_community.tools import DuckDuckGoSearchRun, and then we can go down here and add a new tool.

[00:22:25]This is also a function tool similar to what we had before with this assign name to speaker IDs, but in this case it's going to allow us to search something on the internet. And for this I also prepared something already here.

[00:22:41]Which is this.

[00:22:44]This is a search web tool. So basically it just uses a search engine that comes from that library to search for a certain query and the query uh the search topic will be um filled out by the AI agent itself. So basically if I for example ask it to search for something like the Tesla stock price, it will simply pass that information to a search engine and then searches for that result, get the result back and then tell us the result basically.

[00:23:10]And since we created this function tool, we have to also add this to the AI agent as well. So first of all we need to save this file by pressing control S and then we can go back to the agent.py file. We can now also import this again, so we can simply say um I think it was like search web.

[00:23:30]And then we can add this tool here in the list of tools that the AI agent has access to, which is this.

[00:23:36]And now it actually has this additional functionality to search the web.

[00:23:41]But now we also need to uh tell it that it can only use this tool if me, David, is actually asking for it. So for this we can simply add a new description here in this instruction which says only allow David to use the search web tool. If anyone else tries to use it, tell them they don't have permission.
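The video enforces this rule purely through the instruction text, so the language model decides whether to comply. As a contrast, here is a minimal sketch of checking the speaker name in code before the search runs. This is a different technique from the one shown, and `search_web`, `ALLOWED_SEARCH_USERS`, and the `backend` callable are hypothetical names; how the tool would learn the current speaker's name is left open.

```python
from typing import Callable, FrozenSet

# Names allowed to use web search; in the video only David is allowed.
ALLOWED_SEARCH_USERS: FrozenSet[str] = frozenset({"David"})


def search_web(
    query: str,
    speaker_name: str,
    backend: Callable[[str], str],
    allowed: FrozenSet[str] = ALLOWED_SEARCH_USERS,
) -> str:
    """Run the search only if the identified speaker is on the allow list."""
    if speaker_name not in allowed:
        # The backend is never called for speakers who are not allowed.
        names = ", ".join(sorted(allowed))
        return f"Sorry, only {names} can use the web search tool."
    return backend(query)
```

With a guard like this, a cleverly worded request cannot talk the agent into searching for someone else, although the check is still only as reliable as the voice identification feeding it.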

[00:24:08]And then we can save that and that's already it. And now we can try this out again in the playground.

[00:24:15]Hey Alexa, can you look up on the internet for me what the current Tesla stock price is?

[00:24:25]Hey Jessica, I can't browse the web for you.

[00:24:28]Only David has permission to use that tool.

[00:24:30]But you can quickly check Tesla's current stock price by searching TSLA stock on Google or any finance app like Yahoo Finance, Apple Stocks or Bloomberg.

[00:24:40]Hey, can you look up the current Tesla stock price for me, please?

[00:24:48]Hello David. Nice to see you again.

[00:24:51]Tesla TSLA is currently trading at about $413.04 per share based on the latest available data.

[00:24:57]Keep in mind stock prices move constantly, so it may have changed a bit since then.

[00:25:02]Thanks for watching all the way to the end. As always, if there's any feature you want to see in a future video, just leave a comment down below about that. Again, the code of the entire project is available in the video description down below, so you can simply download that. Don't forget to like and subscribe and I'll see you in the next one. Peace.
