OmniVoice significantly lowers the barrier to linguistic inclusivity by providing a robust, parameter-driven framework for over 600 languages. It marks a sophisticated shift from simple data imitation to controllable, open-source synthetic speech generation.
Open Source AI TTS With 600 Languages - Installation and Showcase of Omnivoice
Hey, what's going on today, YouTube?
Today, I want to talk about OmniVoice, a newer text-to-speech system that's been released, built off of Coqui TTS, and this here is the repository.
The cool thing about this is that it supports over 600 different languages, which is really good, and it's very fast. It's very multilingual, it's a very fast architecture, and I've been trying it out locally. The sound quality is really good. The voice cloning is solid and very close. I wouldn't say it's the best, but for most applications it's probably sufficient. So, let's take a listen to some of the samples. I just finished testing out some sentences, so here's one of them.
This is an example that I had done earlier: "Hidden beneath the forest floor is a living network that allows trees to exchange resources, warning signals, and support, connecting seedlings, mature canopies, and damaged roots across remarkable distances underground."
So, that's one sample from a character from Outlast Enfield that I was using.
We've got some other ones here from the same video game, and we could regenerate that for this character. It is very fast, give or take 9 seconds right now, because I'm utilizing most of my GPU.
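As a rough check on that timing, here is a minimal sketch of the speed arithmetic. It assumes about 9 seconds of generation for a clip of about 19 seconds, which is the length of this sample. The function name is only for illustration and is not part of OmniVoice.

```python
def real_time_speedup(audio_seconds: float, generation_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# Numbers from the video: a 19-second clip generated in roughly 9 seconds.
speedup = real_time_speedup(audio_seconds=19.0, generation_seconds=9.0)
print(f"{speedup:.2f}x real time")  # prints "2.11x real time"
```

Anything above 1.0 means the audio is generated faster than it takes to play back.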
So, it could be a little bit faster, but it's more than two times real time because it's 19 seconds of audio in under 10 seconds. Let's take a listen to this.
"Hidden beneath the forest floor is a living network that allows trees to exchange resources, warning signals, and support, connecting seedlings, mature canopies, and damaged roots across remarkable distances underground."
So, on that one, I would say the flow is kind of awkward. It's kind of weird, but the voice cloning is fairly accurate there. There is also this other cool feature where we can design a voice.
So, we don't need to have a reference audio. We can simply describe what type of voice we want. I'll show that real quick. Let's just say male. For age, instead of auto, let's say middle-aged. For pitch, we'll do lower pitch.
For style, I don't want whispering, and for accent we'll just go with American accent and generate from here. So, this is the other cool feature that exists with OmniVoice: you can design your own voice based on some of the prompts that they've given.
So, let's take a listen to this.
"Hidden beneath the forest floor is a living network that allows trees to exchange resources, warning signals, and support, connecting seedlings, mature canopies, and damaged roots across remarkable distances underground."
Okay, so there you go. I would say that's like a generic narration voice. I wonder if we can do a different accent. I'll throw in an Indian accent here, as I think that one is a little bit more noticeable, and we'll see what we get for this piece of text.
"Hidden beneath the forest floor is a living network that allows trees to exchange resources, warning signals, and support, connecting seedlings, mature canopies, and damaged roots across remarkable distances underground."
Okay, so that one did not have an Indian accent.
Maybe we could do auto here, and auto for the pitch, and then generate with that to see if we can get a better accent on this text.
"Hidden beneath the forest floor is a living network that allows trees to exchange resources, warning signals, and support, connecting seedlings, mature canopies, and damaged roots across remarkable distances underground."
Okay, that one is definitely more accented. That's a lot better than what it had previously. I don't know if that's because I changed the age or because I had those set to auto. I guess we can even try an Australian accent, so let's generate that. I think this voice design is super cool.
You can do so many different voices on here and I could probably spend ages on this.
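To keep track of which dropdowns changed between attempts, here is a small sketch that models the voice design options as plain data. The `VoiceDesign` class, its field names, and the option strings are hypothetical and only mirror what is visible in the demo; this is not OmniVoice's actual API. The three attempts are reconstructed from the narration, assuming gender stayed on male throughout.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VoiceDesign:
    """Hypothetical stand-in for the demo's dropdowns; "auto" means unset."""
    gender: str = "auto"
    age: str = "auto"
    pitch: str = "auto"
    style: str = "auto"
    accent: str = "auto"


def changed(before: VoiceDesign, after: VoiceDesign) -> dict:
    """Return {field: (old, new)} for every field that differs."""
    return {
        f.name: (getattr(before, f.name), getattr(after, f.name))
        for f in fields(VoiceDesign)
        if getattr(before, f.name) != getattr(after, f.name)
    }


american = VoiceDesign(gender="male", age="middle-aged", pitch="low", accent="american")
indian_pinned = VoiceDesign(gender="male", age="middle-aged", pitch="low", accent="indian")
indian_auto = VoiceDesign(gender="male", accent="indian")

print(changed(american, indian_pinned))
# {'accent': ('american', 'indian')}
print(changed(indian_pinned, indian_auto))
# {'age': ('middle-aged', 'auto'), 'pitch': ('low', 'auto')}
```

Under those assumptions, two settings changed at once before the accent came through, so changing one setting per run would show which one made the difference.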
Now, let's take a listen to this output.
"Hidden beneath the forest floor is a living network that allows trees to exchange resources, warning signals, and support, connecting seedlings, mature canopies, and damaged roots across remarkable distances underground."
Okay, so I think that was a little Aussie, but it could probably be a little bit more exaggerated. But that's beside the point. That is really nice.
We've got a voice design part and a voice clone part in this Gradio interface. Oh, and I also wanted to show it in some other languages. So, I have this sentence here, and I'm going to ask, "Can you translate this into 10 different languages?" and we'll get it translated with ChatGPT.
I'm going to use the voice design here.
So, for this case let's go with a male.
We'll leave auto age and auto pitch, and we'll leave everything else on auto. We'll start with the first language, which is Spanish, and get a generation of that, and we'll see what it comes out with. I'm not going to be a very good judge of this, but I'll just go through some of these languages and play them for you guys to hear.
All right, so that was Spanish. We're going to do French next.
All right, we'll do German next.
And we'll do Japanese.
I'd say the Japanese is pretty good, but it's beyond what I'd be able to judge. All right, we'll do Korean next. For these samples, let's switch over to female.
So, I have no clue because I can't read Hangul, but we'll try one more language, or actually, let's do three. Let's do these bottom three here: Russian, Arabic, and then Hindi. We're going to do Russian first.
All right, we'll do Arabic here.
All right, and then we'll go ahead and do Hindi last here.
All right, so there you go. It's got a bunch of different languages in there.
Like I said, it's got 600. Now, let's go into the installation phase. We are going to be using a few tools, so there are a few prerequisites that you need for this.
Number one, you're going to need Git on your computer, and then you're going to need something called uv by a team called Astral. I'll show those real quick.
What I recommend is to open up a File Explorer window and type cmd in the address bar to open up a command prompt. Then go over to the installation section for uv, from Astral, copy the install command, and paste it into the command prompt. It's going to run through and get uv installed. Wait for that to finish up.
Then for Git, you're going to go to this page here and click here to download. It's going to download an installer, and once you have it, just run through all of the default options and you should have Git accessible on your computer. I'm not going to go through that one, as you can just follow the default process. I already have it installed on my computer. Once you're finished with uv, you should be good to go. I can't do it right now because I'm currently already using uv, but in your case you should have it successfully installed.
So, once you have those two, close out of that terminal window, and then we're going to navigate into GitHub. For this, we're going to be using my fork of OmniVoice. You can always find the main GitHub repo that I forked off of, but just for YouTube purposes I like to fork it. Oh, yeah, you most likely do need an Nvidia graphics card for this.
There are other accelerators that you can use, but Nvidia is going to be the most accessible for speed. You can use Apple silicon. They've got instructions down here, so feel free to follow those if you would like.
But for this, what we're going to do is click the green button there, click on copy URL to clipboard, navigate into File Explorer, and once again type cmd in the address bar so that we can open up a new terminal. We closed out the previous terminal just to make sure that uv and git are now accessible as commands. So, if I type in uv, we'll get something like that. If I type in git, I'll get that. If you get an error message that says something like "is not recognized as an internal or external command", go through those installation processes again, close out of all of your terminals, and then reopen them.
I'm going to clear that. Once you're here, we can clone that repository. So, we'll do git clone and then the URL that we copied. Once you've done that, we're going to cd into OmniVoice. I'm using tab to autocomplete the folder name. Once you are inside here, all you really need to do is run uv sync, and that's going to prepare the virtual environment that you need in order to run this with Python. That is the really nice part about uv. Now, for you guys, it's going to take a lot longer because it's got to download some packages, but since I already have them cached on my computer, I don't really need to wait. So, wait for those to finish up and install. Once you have those, let's activate the virtual environment.
So, you would activate the venv, or actually, we can probably just go like this: uv run omnivoice-demo.
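The steps so far can be sketched as a small Python helper: check that git and uv are on the PATH, then clone, sync, and launch. This is only a sketch under assumptions. The repository URL is whatever you copied from GitHub (it is not spelled out in the video), the folder name `OmniVoice` is passed explicitly, the `omnivoice-demo` command name is taken from the narration, and the function names are made up for illustration.

```python
import shutil
import subprocess

REQUIRED_TOOLS = ("git", "uv")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def setup_commands(repo_url: str, folder: str = "OmniVoice"):
    """Build the walkthrough's commands as (argv, working directory) pairs."""
    return [
        (["git", "clone", repo_url, folder], None),  # clone into `folder`
        (["uv", "sync"], folder),                    # create the virtual environment
        (["uv", "run", "omnivoice-demo"], folder),   # launch the Gradio demo
    ]


def run_setup(repo_url: str, folder: str = "OmniVoice") -> None:
    """Run every step in order, stopping at the first failure."""
    missing = missing_tools()
    if missing:
        raise SystemExit(
            f"Not found on PATH: {', '.join(missing)}. "
            "Install them, close all terminals, and reopen one."
        )
    for argv, cwd in setup_commands(repo_url, folder):
        subprocess.run(argv, cwd=cwd, check=True)


# Example (not executed here): run_setup("<URL copied from the green button>")
```

Because the last command starts the demo server, `run_setup` keeps running until you stop the demo.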
Running the demo command launches the Gradio interface that is available inside of this repository. Once you launch it, it's going to download the models that you need in order to run it. Right now, I already have one running on this address, so I need to close out of what I am currently running. Let me close out of this real quick, and then I can navigate back into here and run that command again. That just means you can't have two OmniVoice demos open at the same time.
If you do, you'll get conflicting programs trying to run on the same address, which computers don't allow. Once it finishes up, it'll say running on local URL, and then we can open up a browser to get that launched. What I would do is go into the address bar and type in localhost:7860.
This is going to bring us over to the OmniVoice demo page, where we can go through and synthesize different voices. So, this is the interface that I was showing earlier.
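The address conflict mentioned above can be checked before launching a second demo. This is a minimal sketch using only Python's standard library. It assumes the demo listens on port 7860 on the local machine, as in the video, and the helper names are illustrative.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def demo_url(port: int = 7860, host: str = "localhost") -> str:
    """Build the address to type into the browser."""
    return f"http://{host}:{port}"


if port_in_use(7860):
    print(f"Something is already running at {demo_url()}; close it before relaunching.")
else:
    print(f"Port 7860 looks free; after launch, open {demo_url()}.")
```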
So, I'll just go through a quick explanation of what the fields in here do. This text to synthesize is the text that we want to generate.
Then the reference audio is where you would upload whatever voice you want to clone. In this case, I've got some of these here. Let me just choose this guy right here.
So, this is going to be the reference audio for whatever voice you want to clone. Reference text is going to be the transcription of the reference audio. You don't need to enter anything here if you want it auto-transcribed by Whisper, which I think is what it uses, or some other type of automatic speech recognition system. For the instructions, we're going to leave that blank. I believe what the instruct field does is let you tell it how to generate. Then for some of the other generation options, you have things like speed or duration, how many inference steps to do for the denoising process, and all of these other things, which we can probably just leave at defaults.
So, just to check that everything is loaded and installed properly, we can click on generate and see if we get audio out once we've uploaded that reference. Let's take a quick listen.
"The text that we want to generate."
So, there we go. We've got the OmniVoice demo running here.
And once again, we've got the voice design here, too. So, that is a quick way to get OmniVoice up and running on your computer.
Now, you can also use this inside of Python. They give examples of how you can utilize it from Python, so if you want to put it inside of your own scripts, go ahead and reference the readme here. There are some other additional things down here that you can look at.
One thing that's cool to note is that it has these non-verbal tags that you can put in there, like laughter, sigh, and these other ones. For example, if we go back into the interface here, we could put in that laughter tag, and I cannot type today. Put in that laughter tag there, regenerate, and then we'll take a listen to see if we hear that laughter.
"The text that we want to generate."
So, it didn't laugh, but it created a breath in.
And let's take a listen to this.
"The text that we want to generate."
So, there we go. It had the little chuckle there for laughter.
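Here is a small sketch of adding a non-verbal tag to the text before generating. The square-bracket format is an assumption on my part, so check the readme for the exact tag syntax and the list of supported tags. `with_tag` is a hypothetical helper, not part of OmniVoice.

```python
def with_tag(text: str, tag: str, at_start: bool = False) -> str:
    """Attach a non-verbal tag such as "laughter" or "sigh" to the text.

    Assumes tags are written in square brackets; adjust `marker` if the
    readme documents a different format.
    """
    marker = f"[{tag}]"
    text = text.strip()
    return f"{marker} {text}" if at_start else f"{text} {marker}"


print(with_tag("The text that we want to generate.", "laughter"))
# The text that we want to generate. [laughter]
```

Since the first generation in the video only produced a breath and the second one produced a chuckle, it may be worth regenerating a couple of times with the same tagged text.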
So, I was trying some of these tags earlier, and in my experience it's not the best with these supported tags, but they exist. It's nothing like Fish Audio's S2. Fish Audio's S2 is much better with these types of controls, and some of the other text-to-speech models out there are better with the controls, but they exist. I think that's pretty much it for all of the different things.
And then just a quick little plug here. As you guys might know, if you've been following the channel, I have an audiobook maker where the OmniVoice text-to-speech engine is available to use. This is very useful if you want to have audiobooks read out to you or create your own audiobooks. This was just a test example I had here with my voice. This is really bad. But yeah, you can use OmniVoice here. This one is available to the people who have purchased the audiobook maker. I just want to throw that out there in case you have that: OmniVoice is available, as well as Elo Doddy and VibeVoice, in that packaged version. Now, the open-source version might be getting it sometime, but I need to clean some things up and then upload it. For those who have access to the audiobook maker, we're on version 4.2.1, and I've put that in here. So, go take a look at that if you haven't already and you're an audiobook maker follower.
And yeah, that's going to be it for today's video about OmniVoice. I think it's fairly cool. It's got 600 different languages, which I think is the biggest part, and it's a very fast inferencing text-to-speech model. Once again, I'd like to thank all the members of the channel for supporting me. I very much appreciate it, and I will see all you guys later.