Chrome's Prompt API lets web developers talk to a built-in LLM (Gemini Nano) directly in the browser, offering an alternative to cloud-based AI services by keeping data local and reducing costs. Mozilla, however, has criticized the proposal over interoperability (prompts are model-specific and not portable) and vendor lock-in (using the API requires accepting Google's generative AI prohibited uses policy). On top of Mozilla's objections, there are security risks (any website can trigger the model) and practical limitations (22 GB of free storage, specific hardware, and no mobile support). The API currently works only in Chrome and Edge on desktop, and one reported comparison showed failure rates of roughly 20%.
Deep Dive
Chrome is about to break the web... AGAIN!
Since I was raised in Eastern Europe, surrounded by the economic and social success of a failed communist country, nothing brings me more joy than being disappointed and depressed. And this is why, from time to time, I like to ask myself what the Chrome team is doing to make my web developer job more miserable. I mean, Chrome already requires 2 GB of RAM to run all those npm libraries needed to render an AI-generated form on the screen. And these days, the browser freezes my MacBook Pro whenever I'm listening to software development podcasts on YouTube. So, Chrome is already a pain. But I'm confident they can do much worse. Like, for instance, they could silently install a 4 GB AI model on your device without consent.
A few days ago, I ran into this GitHub comment mentioning that the Mozilla team is against one of Chrome's proposals called the Prompt API. So, I had to check it out. Of course, the proposal is about AI because everything has to be about AI these days. At this point, AI is mostly the 10x productivity multiplier of nontechnical tech founders who like to brag about the dumbest metrics in history while having nothing concrete to demo. But sadly, the rest of the tech world is kind of forced to play along so that they don't get left behind.
So, in this Monday morning review, we'll look at the implications of bringing an LLM into the browser and how this could shape the future of web development. In simple terms, the Prompt API lets web developers talk to a tiny LLM called Gemini Nano, which is readily available in your Chrome installation. The pitch from Google is actually reasonable on paper considering the current AI frenzy.
They argue that today, if you want to add AI features to your web app, you basically have two options. On one hand, you could send everything to a cloud API like OpenAI or Anthropic, which means your user's data leaves their device.
You pay per token, and your app stops working the moment your only user burns through a few thousand worth of tokens and maxes out your credit card. The second option is to ship your own model in the browser using WebGPU and WebAssembly, which is technically possible but requires sending hundreds of megabytes of model weights to every visitor. Having a huge payload as a requirement for your app was the kind of bad decision that would get you fired in any reasonable company a couple of years ago, but times have changed and nobody cares if you mess up as long as you are productive.
So, Chrome's alternative, since the browser already ships with an LLM, is to use the one that's already there. Gemini Nano is in the 2 to 4 billion parameter range, quantized to fit in around 2 GB of disk space. It's not going to write your PhD thesis, but it can do the kind of tasks models are actually good at. The actual API is really simple. You just have to create a language model instance and then send it your request. You can also pass a system prompt, configure temperature, and do all the usual LLM ceremonies you'd expect.
So, this actually sounds like a decent solution, but then you read the requirements. To use the Prompt API, your users need at least 22 GB of free space on the volume that contains the Chrome profile. They also need either a GPU with strictly more than 4 GB of VRAM or a CPU with four or more cores and 16 GB of RAM. On top of that, Chrome for Android, Chrome for iOS, and ChromeOS on non-Chromebook Plus devices are not supported. So, that's already most of the planet excluded, but let's not let this small detail hold us back. It also doesn't work in Firefox, doesn't work in Safari, and Mozilla has made it pretty clear they have no intention of shipping it. So in practice, the Prompt API today is a desktop-only Chrome and Edge feature, available only on machines recent enough to run a local model without bursting into flames.
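The create-then-prompt flow described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the `LanguageModel` global, its `availability()` and `create()` methods, and the `initialPrompts`, `temperature` and `topK` options are assumed to match Chrome's documentation at the time of writing, and the API has changed shape before. The helper names `getLanguageModel` and `askLocalModel` are hypothetical.

```typescript
// Minimal structural types for the slice of the Prompt API used below.
// Assumption: they mirror the `LanguageModel` global described in Chrome's
// documentation; verify against the current docs before relying on them.
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PromptSession {
  prompt(input: string): Promise<string>;
  destroy(): void;
}

interface LanguageModelFactory {
  availability(): Promise<Availability>;
  create(options?: {
    initialPrompts?: PromptMessage[];
    temperature?: number;
    topK?: number;
  }): Promise<PromptSession>;
}

// Look the global up dynamically so this file still loads in browsers
// (or runtimes) where the API does not exist.
function getLanguageModel(): LanguageModelFactory | undefined {
  return (globalThis as unknown as { LanguageModel?: LanguageModelFactory })
    .LanguageModel;
}

// Hypothetical helper: resolves to the model's answer, or to null when no
// local model can be used, so the caller can fall back to something else.
async function askLocalModel(
  question: string,
  factory: LanguageModelFactory | undefined = getLanguageModel(),
): Promise<string | null> {
  if (!factory) return null;
  // Any state other than "unavailable" may mean create() starts a download.
  if ((await factory.availability()) === "unavailable") return null;

  const session = await factory.create({
    initialPrompts: [
      { role: "system", content: "Answer in one short sentence." },
    ],
    // Illustrative values; assumed to be set together, as Chrome's docs describe.
    temperature: 0.7,
    topK: 3,
  });
  try {
    return await session.prompt(question);
  } finally {
    session.destroy(); // release the session's resources
  }
}
```

Passing the factory in as a parameter is a design choice for this sketch only: it keeps the function usable with a stub in tests and makes the "no local model here" case an ordinary `null` return instead of a thrown error.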
Realistically, if you actually ship a feature that depends on it, you have to write a fallback for everyone else, which means you have to call a cloud API anyway. Note that as of right now, the Prompt API is stable in Chrome 138 and later, but only for Chrome extensions. For regular web pages, it's behind a feature flag.
So, Mozilla disagrees with all of this, and this is where it gets interesting. But before looking at their arguments, please let me tell you a few words about today's sponsor. Are you tired of wrestling with a fragmented AI stack?
Building modern AI applications usually means you're stuck patching together an operational database, a separate vector database, embedding models, and a search engine. Managing these separate systems requires complex pipelines. This increases your total cost of ownership and introduces data staleness, so your AI answers questions based on outdated information. Enter MongoDB. It is the data platform for AI that collapses these three layers into a single unified architecture. With MongoDB's flexible document model, you can store your source data right alongside its vector embeddings. Because there is no data movement between disjointed systems, your AI models have instant access to the freshest data. Plus, it integrates seamlessly with major model providers and standard frameworks such as LangChain and LlamaIndex. Simplify your architecture, reduce complexity, and speed up your time to market using a single API. Discover how to build smarter applications at MongoDB.com today.
Back to the video. Mozilla's first and most important argument is interoperability. The thing about LLMs is that prompts are not portable. If you spend 3 weeks tweaking your system prompt so that Gemini Nano stops returning JSON wrapped in markdown code blocks, congratulations. You now have a prompt that only works on Gemini Nano.
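The quirk mentioned above, JSON arriving wrapped in a markdown code block, is often handled on the output side as well as in the prompt. The following is a minimal sketch of that defensive parsing, assuming the model's raw reply is available as a string. `stripCodeFence` and `parseModelJson` are hypothetical helper names, not part of any browser API.

```typescript
// Build the fence marker programmatically to keep this example easy to embed.
const FENCE = "`".repeat(3);

// Hypothetical helper: removes one surrounding markdown code fence, if present.
function stripCodeFence(raw: string): string {
  const text = raw.trim();
  const fenced =
    text.length >= 2 * FENCE.length &&
    text.startsWith(FENCE) &&
    text.endsWith(FENCE);
  if (!fenced) return text;

  // The opening fence line may carry a language tag such as "json".
  const firstNewline = text.indexOf("\n");
  if (firstNewline === -1) return text;
  return text.slice(firstNewline + 1, text.length - FENCE.length).trim();
}

// Parses model output as JSON whether or not the model wrapped it in a fence.
// Throws (via JSON.parse) when the remaining text is not valid JSON.
function parseModelJson(raw: string): unknown {
  return JSON.parse(stripCodeFence(raw));
}
```

This covers one quirk of one output style. A different model may add a chatty preamble or trailing commentary instead, and each such workaround is exactly the kind of model-specific handling that does not carry over between browsers.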
The moment Edge runs the same code through Microsoft's model, your carefully crafted prompt is going to behave completely differently. Prompts are tightly coupled to models, and developers will inevitably tune to the quirks and policies of whatever model they're building against. That's how you end up with model-specific code paths, which is the browser compatibility problem all over again. We spent 20 years digging ourselves out of the "best viewed in Internet Explorer 6" message, only for Chrome to come back with "best viewed in Chrome 138 on this specific hardware setup".
To be fair to Chrome, however, the Prompt API isn't the only AI feature they're shipping. They're also adding a translator API, a summarizer, a writer, and a rewriter, which are all task-specific. Two browsers can implement a summarizer differently and still produce roughly the same result. So, the Prompt API is the odd one out because the input is an open-ended prompt, and prompts don't standardize.
The second argument is where the real fun begins, because in order to use the Prompt API, you have to acknowledge Google's generative AI prohibited uses policy. So, to use a web API on a website you wrote, you have to accept Google's terms and conditions.
Mozilla points out that if using a web API means accepting a specific vendor's content policy, especially one that goes beyond the law, you're not really building for an open platform anymore. They call this a dangerous precedent, and they're not being dramatic. The web has always been the one platform where you didn't need permission from a corporation to ship code, but now Google wants to add an API where, by definition, the company gets to decide what you're allowed to do with it. And what's really funny is that these solutions aren't even working consistently.
>> 60% of the time it works every time.
>> A report from February compared Chrome's Gemini Nano against Edge's Phi-4-mini-instruct using the Prompt API. For generative tasks, 24% of Edge's responses and 15% of Chrome's responses failed to complete the task. For classification, around 30% of Edge's and 24% of Chrome's responses didn't categorize the input correctly. Edge hallucinated 17% of the time, Chrome 6%.
So, at the end of the day, the pitch is to download 4 GB of model weights, accept Google's terms of service, exclude every mobile user on the planet, and in exchange get a feature that fails roughly one out of every five or six times. What's funny is that the Chrome engineer responsible for shipping the Prompt API shares Mozilla's concerns, but he believes in experimentation and learning from mistakes over stalling innovation out of fear of what might happen. Yes, you're right. This might break the web, but we're going to ship it anyway and see what happens. This is the kind of quote you see in those documentaries right before someone explains how the reactor exploded.
And here is your awesome recommendation of the day. Chernobyl is the perfect merge of Eastern European communism nostalgia and catastrophic engineering optimism. This is also where the "not great, not terrible" meme comes from, which is the perfect depiction of the future of the software developer job.
On top of Mozilla's arguments, there is also this small issue of security and data privacy. A local LLM that any web page can call is a brand new attack surface. Any random website you visit can now quietly prompt the model in the background, and prompt injection becomes a browser-level problem instead of a server-level one. A malicious page can stuff hidden instructions into the DOM or just burn through your CPU and battery while you are reading the news. So that's where we are.
Chrome is going to ship a built-in LLM tied to one specific Google model, behind one specific Google terms of service agreement, that doesn't work on mobile, that excludes most of the planet, that fails 20% of the time, and that even the Chrome engineer shipping it admits raises legitimate concerns. And the rest of us get to build fallbacks for every other browser. Why is Google really doing this? Most likely because cloud inference is expensive. So pushing it onto user devices saves Google enormous money while giving them control over the AI surface of the web.
If you like this video, you should consider joining our community where I'm posting more dedicated weekly content.
Please don't forget to smash all the buttons.