Building sovereign AI for India means addressing challenges specific to the country: linguistic diversity across 22 officially recognized languages and many more beyond them, limited digitally available data, and the need for data sovereignty. Companies like Sarvam AI approach these challenges by developing tokenizers that are efficient on Indic scripts, which lowers serving costs, by curating domain-specific datasets (medical, legal, financial), and by staying vertically integrated so that data never leaves the country. This enables applications such as EMR systems, where saving a doctor even 2 minutes per consultation could translate into 5-10 more patients served daily, and document digitization across handwritten and printed materials in multiple languages.
Deep Dive
Here's What Sarvam Looks Like Behind the Scenes
I'm here today at Sarvam AI, in their office in Bengaluru, to do an office tour and see everything there is to see about this company. I think Sarvam is a company that a lot of people have heard about at this point, but perhaps not many people actually know what it's like to work at this company, what it feels like to be inside of it. And so that's what today is all about. I'm going to be going all around this office here in Bengaluru, specifically in Indiranagar, to meet a bunch of the teams at this company: the go-to-market team, the product team, the APIs team, the design team, and the Samvaad team. I want to get a sense of what's keeping these teams busy, what they're excited about, and what it's like to work at Sarvam. So without further ado, let's get into it.
It's Runtime, your daily Indian tech white pill.
So we're just here on CMH Road. It's a very normal, regular part of Indiranagar.
And so this is actually where it is.
Sarvam is in an Urban Vault office. You can actually see here we've got an Antler sandwich going on.
>> Hey.
>> Hey. Hi Caleb. Sahed here. Nice to meet you. I work on the GTM and strategy efforts for Sarvam's model APIs.
>> Okay. Let's see the space. I'm excited.
This is my first time visiting Sarvam,
which I would say is arguably India's most... Oh, hey guys.
>> Hello.
>> What do you all like most about working at Sarvam?
>> The people and teams.
>> The people and teams. Okay, safe answer.
>> Okay, here we are.
Okay. So I feel like we have to whisper. Everyone's hard at work.
>> Yeah, I think everyone's busy working there.
>> So this is the first Sarvam floor.
>> Absolutely. So this is where most of what we have built started. All of our business teams and all of our model research teams also sit here. This is pretty much where the conversational agents team sits.
So the Samvaad platform has literally been built right here. You have a few other teams looking at the enterprise reasoning platforms here as well. This was the first floor we got. Of course, now we've expanded a lot.
You'll see the office is quite full. I think on a regular day we have no place to sit on this floor.
>> Which is why it's expanded up to the upper floor.
>> So we started with just this floor. Now we have three floors of this building.
So it's been very rapid expansion. I think a lot of the love that we got from the developers has led to that, in terms of how a lot of our models got adopted and how the product started to take off.
>> And that's actually what you're spending a lot of your time on.
>> And I think we should spend special time on this team. So, this is the design team.
>> The lift team.
>> The lift team. So a lot of the brand videos you saw, all of the design, all of the effort that goes into putting out the content happens right here.
>> This is where it happens. Wow. I think everyone's been very impressed by the branding. In fact, I'm wearing a Sarvam t-shirt, which I love to wear, not just to promote Sarvam, but also because it's a great tee. And if you can see the back of the t-shirt as well, it's very well designed. I don't know if that was anyone on your team or someone from the past, but I love what you are all doing.
>> Did you work on this t-shirt?
>> Oh my gosh. Okay. I feel like it's a celebrity moment, because I've literally worn this probably hundreds of times at this point.
>> Cool. Shall we sit down and talk about APIs?
>> Absolutely. I think one of the things that makes Sarvam unique is the fact that we're not just building the models or the APIs. We are a full-stack company.
So we are building the models, doing the research, hosting them, and doing all of the work on the inferencing side. We are making them available as APIs, but we are also looking at the application layer. Samvaad, the conversational agents platform, and the dubbing studio are both examples of an entire solution where all of the models have been brought together in, I would say, a very efficient orchestration to get the most out of them.
So we also have these, you can call them product lines, which focus on specific verticals. Conversational agents looks at BFSI, e-commerce, and digital-native companies across the board: anyone looking to automate with voice agents for, let's say, inbound or outbound calls. That is what Sarvam Samvaad focuses on, and it is a big focus for us. The dubbing studio also has a lot of use cases for content creators, making sure that anyone who's making content in one language can reach a wider audience. That is the vision we are taking up there.
And then you have the APIs, which are more for builders. They are the building blocks, so that someone who wants to build something by themselves can go ahead and do it. Because while we are getting into the full stack, while we're getting into these specific applications, people know the nuances of several different products. This is just the tip of the iceberg of what you can build with these models. You can build things like post-call analytics solutions or EMR solutions, electronic medical records, in the healthcare space.
And there, of course, people within that space understand it much more deeply.
So that is the idea behind putting the APIs out as well: developers and engineers within these spaces can adopt them and build those products.
>> So you're working directly with developers to help them use the APIs. What are the top two or three projects that you find really interesting and that were surprising for you?
>> So I personally find the EMR use case quite interesting. I think it's one that can create a lot of value, because for doctors, especially in India, the number of patients they see is going up day by day, and the work that they do is very crucial. Now, if you can save even 2 minutes per consultation for a doctor, that probably translates into 5 to 10 more patients catered to per day, per doctor. So I think the unlock that use case brings is super interesting and super valuable.
And of course there is the STT API being able to pick up those nuances, because medical terminology is hard. It's very specific, long words, words that are not common. So making sure our model can understand those words is a whole different challenge, which is super interesting to crack as well. And again, those conversations tend to happen in several parts of the country in different languages. They'll happen in rural areas and urban areas, in different accents and dialects. So that use case, especially for the STT API, brings together a big majority of the challenges that can exist. Solving that is super interesting and the impact it can generate is super cool. So EMR, I think, is one of the more interesting applications that I have come across.
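A quick back-of-the-envelope check of the 2-minute figure mentioned above. This is only a sketch: the consultation counts and lengths are assumed inputs chosen for illustration, not numbers from the interview, and `extra_patients_per_day` is a hypothetical helper.

```python
def extra_patients_per_day(consultations_per_day: int,
                           minutes_per_consultation: float,
                           minutes_saved: float = 2.0) -> int:
    """Whole extra consultations that fit into the time freed up in a day."""
    shorter_consultation = minutes_per_consultation - minutes_saved
    if shorter_consultation <= 0:
        raise ValueError("minutes_saved must be less than the consultation length")
    freed_minutes = consultations_per_day * minutes_saved
    return int(freed_minutes // shorter_consultation)


# Assumed scenarios (illustrative only): (consultations per day, minutes each).
for consultations, minutes in [(25, 12), (30, 10), (40, 10)]:
    extra = extra_patients_per_day(consultations, minutes)
    print(f"{consultations} consultations of {minutes} min -> {extra} extra per day")
```

Under these assumed workloads the result lands between 5 and 10 extra consultations per day, which is consistent with the range quoted in the interview. A doctor with fewer or longer consultations would see a smaller gain.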
Besides this, of course, there are so many startups doing extremely cool things. You have people building language learning apps, which help people not only get acquainted with some of the regional languages, like, let's say, Assamese, but you also have people trying to learn English to be able to apply to specific jobs or to find confidence in urban cities. So again, super inspiring to see what these startups are able to do. Recently we have also had a bunch of NGOs come up, trying to build a kind of edtech conversational bot which can help young students, kids in the first to third standard, learn specific subjects in a more efficient, quick manner. So again, super interesting projects that people across the board are coming up with.
>> So one of the APIs that I found the most interesting earlier this year, when I was playing around with the platform, was document digitization. The thing that I wanted to try, which at the time wasn't available, was to take an Amar Chitra Katha comic book, which was written in, I forget which language, I think it was just Hindi, put that into the Sarvam platform, and get out something that was translated into English on the page, in the specific spots where the speech bubbles were, so that I could read this old comic. So yeah, tell me more about document digitization.
>> Absolutely. I think it's one of the more interesting ones, and it's one of the newer APIs and models that we have. We launched it back in February, just before the India AI Impact Summit. So absolutely, I think digitizing things like comic books is a big part of what the content creator space can use it for. It's the start: let's say you have certain PDFs, handwritten notes, anything that maybe you want to convert to something like an audiobook. This is the first step towards creating that pipeline.
Some of the more interesting use cases we were also looking at are in sectors like BFSI, legal, or healthcare, where you have a lot of such old documentation. In the BFSI space you have a lot of KYC documents that are processed on a daily basis. Even if you're applying for something as simple as a credit card, you'll have to upload your KYC documents. So making sure that pipeline is efficient, making sure you're able to extract key values from those documents fairly quickly and move forward with the process, is something that we want to look at very heavily. Similarly, in the healthcare space, old patient records are something you can digitize and get information out of very quickly.
What will stand out is that this model does really well on 22 languages. The document could be handwritten or digitized, in any of these languages; you will be able to understand what is written in it, reason upon it, and put it into different kinds of pipelines. Even the benchmarks which we've posted in our blog highlight that. And again, this is the first step, and it highlights how far we have been able to come with this model.
It's only going to keep getting better from here on out. So it'll be super interesting to see how this particular API evolves. One of the key aspects we were also looking at is really old manuscripts, which are, let's say, not very clear: the page is not clear, and the handwriting is not very easy to understand.
>> I think there's probably a lot of that in India, right? These archives, these libraries that nobody has scanned and digitized.
>> Correct. It's immense. And being able to understand that nuance is very difficult. But I think even the initial model that we have available on the APIs right now does a spectacular job at it to begin with. So I'm super excited to see where this goes. Now let's go over and have a chat with Hesh as well. Hesh is part of the engineering team for the Sarvam model APIs.
>> Okay.
>> And Hesh will help us understand a lot about how these APIs scale up, the kind of load that we can cater to. And of course, while you're scaling up, it's important to make sure latency doesn't spike and quality doesn't drop. The engineering team takes care of this, so I think he'll be able to give us some insights.
>> How do you evaluate your models internally?
>> So internally we have a pipeline where we take users who use our API for specific use cases, like, let's say, edtech or bank loan bots. We take our highest-volume users and, for text-to-speech in this case, we take their actual transcript, the text that they sent, and the speech that was produced, and we run a whole bunch of tests to see what the error rate was and whether the model hallucinated. So we have these in-house evals in place, which we run weekly for every domain, and based on that we keep improving our models. Or, if it's a pre-processing issue, we make a fix and release it immediately. So yeah, we have a system like that.
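The interview does not say which error metric the in-house evals use. One common choice for speech systems is word error rate (WER), so here is a minimal sketch under that assumption. It also assumes the generated speech has already been transcribed back to text by some speech-to-text step, which is not shown. The function names, domain labels, and sample sentences are all hypothetical.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def weekly_report(samples):
    """Mean WER per domain for (domain, sent_text, transcribed_speech) samples."""
    per_domain = defaultdict(list)
    for domain, sent_text, heard_text in samples:
        per_domain[domain].append(word_error_rate(sent_text, heard_text))
    return {domain: sum(rates) / len(rates) for domain, rates in per_domain.items()}


# Made-up samples: the text a user sent and what was heard in the output audio.
samples = [
    ("edtech", "open chapter three of the science book",
     "open chapter three of the science book"),
    ("bank_loan", "your loan instalment is due on friday",
     "your loan instalment is due friday"),
]
print(weekly_report(samples))
```

Grouping by domain mirrors the per-domain weekly cadence described above. A hallucination check would need a separate test, since WER only measures how far the spoken output drifts from the text that was sent.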
>> So let's head up to the fourth floor.
Let's see if any of our model research teams are there. ASR and TTS, which are Saaras and Bulbul, have been our more predominant APIs and flagship models for over a year now. Those teams sit upstairs. So let's see if we can have a chat with them.
>> Okay. So I didn't expect this. I thought we were going to walk into an office space, but okay.
So, it's over there.
>> But up here, people can kind of hang out.
>> Yes.
>> Just chill.
>> So this is like our evening space. Anytime we just want to grab a coffee or relax for a few minutes, this is where we find ourselves in the evening.
>> Yeah, it's really nice. Wow.
>> And I think the sunset here is amazing.
I think we're not here at the best time, but yeah.
>> Wow. Spectacular.
>> So, this is where the ASR and TTS teams hang out. A lot of the engineering teams for Sarvam Samvaad also hang out here.
>> Okay.
>> I remember this is also where we had the first session where we invited, I think, about 15 different enterprises to look at the early preview of Bulbul V3. So this is where Bulbul V3 was born. We had the first session, we got a lot of detailed feedback, and it took us a few more weeks after that initial feedback to iterate and put it out to the world. So a lot of that initial ideation happens here.
>> So I think Sarvam, as an Indian AI company, has a very interesting challenge that it's currently tackling and will also be tackling in the future, which is the number of languages in India. Compare that to, say, an American AI company, where basically the only mandatory language is English and everything else is sort of secondary. If they can include Spanish and French and a bunch of global languages, great. But Sarvam can't really do that, because Sarvam is serving Indian customers. Tell me about the challenge of building products for so many diverse use cases and languages.
>> Yeah, sure. No, I think you're right.
While the primary language of LLMs today is English, taking it to a population scale of a billion requires meeting people where they are. And if you work with any Indian enterprises that have been working on the ground at scale for years, you will see that they are very natively embedded in those languages. Given that there is less data digitally available in the world about Indian languages, that makes the challenge harder. But I think the team has been doing a fantastic job at curating the right kind of datasets.
And if you see, one of the larger innovations we did around the model that we recently launched was our tokenizer efficiency on Indic languages. So when you pick up a script like Devanagari, which powers Hindi, Marathi, and a bunch of other languages, our tokenizing efficiency is so high that you can serve the model at a much lower cost in those languages. So yes, of course that is a focus, and while we will continue to push the barrier on English, which is in many ways the primary language of LLMs today, a large focus is on how we take it to India, like you said.
>> How does it actually happen? Because I know for those early LLMs in the States, they were crawling websites like Reddit and other places where there were huge amounts of written English-language text. For Indian languages, is it the same process? You're finding forums, you're finding places where people in India hang out? Or is it a different process?
>> So it is similar. It's just that the quantum of data already available is smaller. The way every model is trained is that there is some seed data, and then you build on top of it and synthetically generate more. So how do you use the existing data out there and then generate more on top of it to achieve the data distribution that you need? Say, okay, I'm going to be talking to an Indian customer who can sometimes be excited, sometimes be anxious, sometimes be confused. How do I model all of that into the distribution that the model is seeing during training, and maybe post-training as well?
>> And how important is usage right now for building new models? I have to imagine a lot of data is being generated now that customers are using Sarvam's models. But how meaningful is that generated data versus existing datasets and synthetic datasets?
>> I think it's always a mix. It's usually not a decision that you freeze in time; you keep iterating on how you use different data distributions. And of course we can use usage patterns to generate more synthetic data in directions that matter to us. So there is existing data, there is well-anonymized data on how users might be interacting and the patterns over there, and then there is using those to generate synthetic datasets that make models behave very well in settings that the users care about. We keep using all of these channels.
>> So one thing that I find interesting about Sarvam is the number of different AI applications that are being built out. There's an LLM now, a chat interface through Indus. There's also voice with Bulbul. There's document digitization. There's a lot of different things that you guys are working on. And in the States, what I see is that oftentimes smaller companies focus on one thing. ElevenLabs is largely around voice. Midjourney is largely around images, and maybe a little bit of video, but that's not their specialization.
Whereas Sarvam is kind of trying to do everything. So how do you guys think about the different things that you build, the products that you're offering? How do you decide what to work on and what not to?
>> No, I think that's what it takes to build a sovereign ecosystem. You can pick up parts of the problem, but the whole piece won't come together. Someone has to take that ambitious bet and all the risk that comes with it.
So it is important that tomorrow, when I go to an enterprise that has a lot of handwritten data, they can use digitization; then, when they want to use that data to power customer interactions in voice, we're able to recognize those voices; and when that data is input to the LLM, it is able to analyze it efficiently. That's how use cases that matter in India will get powered. So while it is risky and it's definitely harder, I think that's what it takes to build a sovereign ecosystem.
>> Okay.
>> Thanks so much.
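The tokenizer-efficiency point made earlier in this conversation can be illustrated with a toy comparison. This is not Sarvam's tokenizer, whose design is not described in the interview. The sketch only contrasts two stand-in schemes: a byte-level fallback, where a vocabulary has no entries for a script, and one token per code point, standing in for a vocabulary that covers the script. Both function names are hypothetical.

```python
def byte_tokens(text: str) -> list:
    """Byte-level fallback: one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


def char_tokens(text: str) -> list:
    """Stand-in for a script-aware vocabulary: one token per code point."""
    return list(text)


def relative_cost(text: str, baseline, improved) -> float:
    """How many times more tokens the baseline needs than the improved scheme."""
    return len(baseline(text)) / len(improved(text))


hindi = "नमस्ते"    # Devanagari code points take 3 bytes each in UTF-8
english = "hello"  # ASCII letters take 1 byte each

for label, text in [("Hindi", hindi), ("English", english)]:
    print(label, len(byte_tokens(text)), len(char_tokens(text)),
          relative_cost(text, byte_tokens, char_tokens))
```

If serving cost scales with the number of tokens processed, a scheme that needs fewer tokens for the same Devanagari text is cheaper to serve in that script. That is the general mechanism behind the cost claim. Real subword tokenizers merge far more aggressively than either stand-in here.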
>> I'm going to go talk to Bharat here. I think he's got Samvaad loaded up.
We've got a voice agent.
Hey Bharat. Hi.
>> We've got another participant in this little interview here. What's the name of this agent, or does it have a name?
>> Uh this is an appointment booking agent.
So let's say you're calling a hospital and you want to book a slot for yourself. So let's let's try talking with it. Namaste.
General physician Dr. minute.
>> There we go. There's a little bit of internet.
>> Yeah, there's a bit of a connectivity issue, but usually it's not like that.
>> Okay. Got it. So first of all, when did you join Sarvam?
>> It's been almost a year and a half for me here.
>> Okay. And the whole time was spent on Samvaad?
>> I've been working on it ever since I joined, yes.
>> Okay. Obviously this is a very competitive space. I think both within India and globally there's a lot of interest in agents that can handle these kinds of calls. I have to imagine that creates a lot of pressure on the team.
>> No, of course, I think you're right. There are lots of good players, not just in India but across the world, who are trying to build conversational experiences using AI agents. Of course, that means we have to make sure our product is the best out there on every front: building an agent, the experience of talking to the agent, and everything around it, like analytics and scaling, so that enterprises can scale to huge volumes as their use cases expand without facing any hiccups. So it's the entire journey. Building an agent is just one part of the puzzle; there's also deploying it successfully, scaling it, observing it, improving it, and so much more.
>> So, what does Samvaad do better than anything else, any other platform or agent?
>> I think our agents are competitive with the best agents out there. But what I would say really differentiates Samvaad from what others are offering is that I don't think any offering out there has really been able to tap into all 11 major languages: English plus 10 Indic languages. And there are multiple pieces to the puzzle: recognizing speech and transcribing it, your LLM being able to process it, and your text-to-speech model being able to speak it properly in a way that sounds human. All three pieces of the puzzle need to be done very well. I don't think any major player out there has really cracked that experience in all 11 languages. And when I say that, the call distribution is fairly spread out. Of course English and Hindi dominate, but a major chunk of the volume that comes in is non-English and non-Hindi. So Sarvam's chops in having great text-to-speech and speech-to-text models across this expanse of languages really help us.
Not to mention that we also work with several regulated industries: major banks, insurance players, and NBFCs, who have very important data security requirements. We are vertically integrated: these models are developed by Sarvam and hosted by Sarvam, so inference is done by us. The data does not leave the country; it's processed inside India. So we're able to service many regulated industries in very sensitive use cases. We've also done air-gapped deployments for very critical players like UIDAI, which runs the Aadhaar project. That really helps us service these critical regulated industries, and it really sets us apart from any other player out there.
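The three pieces described above (speech recognition, the LLM, and text-to-speech) can be pictured as one turn of a voice agent. The sketch below is an assumption about the general shape of such a pipeline, not Samvaad's implementation. `VoiceAgent`, its fields, and the stub functions are hypothetical and call no real API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceAgent:
    """One conversational turn: speech in, speech out, via three stages."""
    transcribe: Callable[[bytes, str], str]   # (audio, language) -> user text
    respond: Callable[[str, str], str]        # (user text, language) -> reply text
    synthesize: Callable[[str, str], bytes]   # (reply text, language) -> audio

    def handle_turn(self, audio: bytes, language: str) -> bytes:
        user_text = self.transcribe(audio, language)
        reply_text = self.respond(user_text, language)
        return self.synthesize(reply_text, language)


# Stub stages so the sketch runs without any model: "audio" is just UTF-8 text.
agent = VoiceAgent(
    transcribe=lambda audio, language: audio.decode("utf-8"),
    respond=lambda text, language: f"[{language}] You said: {text}",
    synthesize=lambda text, language: text.encode("utf-8"),
)

print(agent.handle_turn("book a slot".encode("utf-8"), "hi-IN"))
```

Passing the language code through every stage reflects the point made in the interview: a weak link in any one stage, for any one language, degrades the whole call.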
>> So you mentioned 11 languages.
>> Yes.
>> But obviously India has many more than that.
>> Correct.
>> And so I wanted to know, what's the next step? What is stopping Sarvam right now? I'm sure you guys are working on expanding the scope, but what are some of the challenges that the team is facing in reaching the rest of the languages? For example, I live in Mizoram, and so I really want Mizo.
>> Actually, it's 12 now. We recently added support for Assamese, a language from one of the states in the northeast. So I think other languages will come soon as well. I won't speak for the research team, but it's not that we're stopping at 11. We are soon looking to expand much more broadly, to at least all 22 officially recognized languages in India, so that we have coverage across all of them.
>> This might be a question for the research team, but within a language I know that there's a lot of diversity. Depending on the accent that a person has, depending on where they're from in the country, they might speak that language in a very different way, to the point where it almost becomes a different language. If you have two people who are on opposite sides of the spectrum of that language, they might have a hard time understanding each other. How do you solve for that?
>> So I think a text-to-speech model is not singular; there are multiple speakers. Hindi spoken by a speaker called Aditya sounds different from Hindi spoken by a speaker called, let's say, Tia.
So those are different accents, etc., and we've tried to encapsulate that in the different speakers we offer. Every speaker can speak all the languages, but what differentiates each speaker is how they speak and how expressive they sound. There could be some regional differences in how they speak:
a person from a certain state might speak in a slightly different way than a person from another state. So we try to encapsulate those things in the different speakers we offer. And of course we can improve on that, and we're continuing to do that as well.
>> So at this point Samvaad is doing about a million minutes of calls per day, which is massive. But as we just experienced, there was a little bit of breakage; the connectivity was fluctuating, which is just how it is. How are you assessing customer satisfaction, both for the end customer and for your customer, the enterprise customer or the government customer? How are you assessing how satisfied they are with the experience of using Samvaad?
>> There are both quantitative and qualitative ways of assessing that. We work with different use cases.
So there can be debt collection use cases, sales use cases.
Every industry has its own objectives, and we're able to measure against them.
Let's say it's debt collection: the percentage of people who agreed to pay is a very important metric that we track. In sales, it's converted leads, warm leads, people who really showed interest. So those are clear business objectives that were met as a result of the voice call. You're not just deploying voice agents for the sake of it; you're deploying them because you really want some business outcome at the end. So those are very objective ways in which we measure how effective our voice calls are. Besides that, there are also lots of qualitative ways, which is where a very extensive eval setup on the platform, baked into the product, comes into play. We assess how humanlike the experience was, whether the agent hallucinated, how good the agent was, and many other dimensions of what the experience of speaking to the agent was like. And that gives us feedback not just on how to improve that particular agent but also on how to improve the product overall.
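The quantitative side described above boils down to an outcome rate per use case: the share of debt collection calls where the person agreed to pay, or the share of sales calls that produced a warm lead. A minimal sketch follows. The record layout, field names, and sample calls are assumptions for illustration, not the platform's actual schema.

```python
from collections import defaultdict


def outcome_rates(calls):
    """Fraction of calls per use case that met their business objective."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for call in calls:
        totals[call["use_case"]] += 1
        successes[call["use_case"]] += bool(call["objective_met"])
    return {use_case: successes[use_case] / totals[use_case] for use_case in totals}


# Made-up call records: objective_met means "agreed to pay" or "warm lead".
calls = [
    {"use_case": "debt_collection", "objective_met": True},
    {"use_case": "debt_collection", "objective_met": False},
    {"use_case": "debt_collection", "objective_met": False},
    {"use_case": "debt_collection", "objective_met": False},
    {"use_case": "sales", "objective_met": True},
    {"use_case": "sales", "objective_met": False},
]
print(outcome_rates(calls))
```

The qualitative dimensions mentioned in the interview, such as how humanlike the call felt or whether the agent hallucinated, would need separate scoring and are not covered by this sketch.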
>> So meet Varun. Varun is our head of design. He is the brain behind all of the brand redesign that has come out. The insane video that you saw go up, talking about how Sarvam is reborn, is Varun. It's all come...
>> The man, the myth, and the legend. Great to meet you. When you joined the company, what was your assessment of the brand and style and tone of the company?
>> I mean, this started way before I joined the company, because I was a consultant to Sarvam, and I had a conversation with Pratyush back then. This is when Sarvam didn't have anything: no brand, no logo, not even a website. We had just raised funding and started building our products, and that's when I met Pratyush and Vivek. Like any brand exercise, we went through a few questions and kickstarts, and then we wrote this statement, or tagline, back then: we want to build a brand that every Indian finds relatable, where they feel, oh, this is something that I know, this is something that I see. But at the same time we also want to make the brand feel relevant in the modern era. We want our brand to stand its ground even in, say, a Western market, very similar to IKEA's approach. If you look at IKEA, they have a lot of cultural elements coming into the brand, like hygge or the colors that they picked, but when they put their brand in, let's say, India, they still stand their ground.
So there is a cultural aspect, and then there is also this contemporary aspect. This is what we wanted to do, because our values are strongly rooted in India, but our ambition is deep tech. Now, how do you bring the worlds of deep tech and culture together? It all started from there, and then it was a bunch of iterations. We did a few iterations back then, before I joined full time, and I realized that if you want to build the kind of brand we were imagining, I can't do this as a consultant. I'd need to come in-house, build the entire team, and then build the brand. That's how it started. And I've seen that the statement we wrote that day has the potential to create a brand that India would be proud of.
>> So when you first joined, it was basically just you. You were the department.
>> Oh yeah.
>> Okay. And now it's expanded to this table, basically.
>> Yeah.
>> Okay. And probably next time I visit it'll be a couple of tables, hopefully.
>> A couple more, yes. A few more are not here yet. They're working from home.
>> All right. So that's everything. We've seen the entirety of Sarvam, at least in Bengaluru. I know you have another facility in Chennai as well.
But what's next for Sarvam? I mean, early 2026 was a very important, very exciting time period for the company.
Everybody was talking about Sarvam, I think because of those 14 drops and the India AI Impact Summit. What does the rest of this year look like, and going into 2027 as well?
>> No, I think now it's really about building. It's about exploding. We have some really strong models, and we'll continue to improve upon them. You'll probably see a Saaras V4 or a Bulbul V4 fairly soon, so stay tuned for that. But I think the goal is to just blow up. We want more developers coming to our dashboard.
We want more people trying the STT. We want more people trying Saaras and Bulbul. We want more people trying the OCR models.
And we want to see a lot of cool stuff being built. I think the dashboard is well equipped for all developers and enterprises who want to get started in a flash: sign up, have a look around, create an API key, and just start building.
I think it's designed to do that. This year the focus is going to be to just blow things up. We are here to support the community, we are here to listen to the feedback, and we want to do our best to continue to support everyone. So, thank you so much for the tour.
>> Thank you so much. Thanks for having me and uh thanks everyone for watching.