Google’s Gemini 3.5 Flash is a marketing illusion where extreme verbosity makes the "budget" model more expensive than the Pro version. This analysis provides a necessary reality check, proving that low unit prices mean nothing if the model burns through tokens unnecessarily.
Deep Dive
Gemini 3.5: The Hidden Cost of the Flash Model (It's Not the Budget Model)
Google shipped a new Flash model at I/O 2026, and it's both more impressive and more confusing than the headline makes it sound. Gemini 3.5 Flash is the newest entry in the Flash tier, which has always been the fast and cheap lineup, the budget workhorse. But this version barely behaves like a budget model anymore, and that's the part worth slowing down on. Google is openly calling this their strongest agentic and coding model to date, and it scores above Gemini 3.1 Pro on a bunch of the harder tests.
A flash model beating a pro model is not the usual order of things. Normally you pick flash when you want speed and you're fine giving up a little intelligence. Here you're getting frontier-level results while keeping the quick response times flash is known for.
On output speed, Google claims it runs about four times faster than other frontier models, pushing over 280 tokens per second. And that speed is real. It's a genuinely good model, fast, sharp at coding, with a quality jump you notice within a few prompts.
The reason this model exists, though, isn't single questions. It's agents, and Google built it alongside their agent platform Antigravity, so the two fit together. They even said on stage the model needed a native place to live, work, and execute, which is a fancy way of saying the model and the tooling were designed as one thing. Look at this example where 3.5 Flash generated a whole agentic city from a single prompt.
You tell it to build a tool that creates a city using AI agents, where one main agent makes the big calls on layout and style, then sends out smaller agents to do the actual building. That's the whole instruction. It spins up an orchestrator, the boss agent, and hands out jobs to a swarm of sub-agents, all working at the same time. You can watch the telemetry panel track every active one. Agent one laying a commercial foundation on one plot. Agent two doing residential on another. Three, four, five, all running in parallel. What matters here isn't the pretty city, it's that you're watching the model plan a project, break it into pieces, hand those pieces to copies of itself, and keep them all coordinated while they run at once. That's what agentic actually means, and it's the whole pitch.
But my problem is the money, because the entire point of a Flash model is to be cheap, and this one is not cheap the way the name suggests. It runs $1.50 per million input tokens and $9 per million output tokens.
Set that next to its own family, and it's roughly three times the price of the previous Gemini 3 Flash, and about six times the older Flash-Lite. It's crept so close to Gemini 3.1 Pro, which sits at $2 per million input tokens and $12 per million output tokens, that the gap almost disappears. So the cheaper sibling is now priced almost like the expensive one. Per-token cost is only half the story, though, and the other half is uncomfortable.
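To make the "half the story" point concrete, here is a rough sketch in Python built only on the four list prices quoted above. The function name `run_cost` and the example token counts are made up for illustration; they are not from Google or from any benchmark.

```python
# Prices in USD per million tokens, as quoted in the text above.
FLASH_35 = {"input": 1.50, "output": 9.00}
PRO_31 = {"input": 2.00, "output": 12.00}


def run_cost(prices, input_tokens, output_tokens):
    """Cost in USD of one run at the given per-million-token prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Flash is priced at 75% of Pro on both input and output tokens.
price_ratio = FLASH_35["output"] / PRO_31["output"]

# So Flash stops being cheaper once it uses more than 1 / 0.75 times the tokens.
break_even_verbosity = 1 / price_ratio

# Hypothetical job: Pro needs 1M input and 1M output tokens,
# and Flash needs 1.5x as many of each to finish the same work.
pro_cost = run_cost(PRO_31, 1_000_000, 1_000_000)
flash_cost = run_cost(FLASH_35, 1_500_000, 1_500_000)

print(f"price ratio: {price_ratio:.2f}")
print(f"break-even verbosity: {break_even_verbosity:.2f}x")
print(f"Pro ${pro_cost:.2f} vs Flash ${flash_cost:.2f}")
```

The takeaway from the arithmetic is that a 25% discount per token is wiped out as soon as the cheaper model needs about a third more tokens for the same job.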
According to Artificial Analysis, running their full benchmark suite on this model cost over five times more than the previous Flash, and about 75% more than Gemini 3.1 Pro on certain workloads.
Part of that is the higher price, but a big chunk is that the model just produces a lot more tokens. It's verbose. It thinks a lot, it talks a lot, and on agentic work, it burns through turns. Their numbers show it generating around 73 million output tokens to finish the suite, while models in the same price range average closer to 36 million. You pay per output token, so a chatty model quietly doubles your bill without you touching a single prompt.
There's one comparison that really drives this home, and it's against Google's own Pro model. Running the full intelligence suite, Gemini 3.1 Pro generated about 57 million tokens, scored around 57 points, and cost roughly $890 to run. Gemini 3.5 Flash, the supposedly cheaper one, generated about 73 million tokens, scored slightly lower at around 55, and cost about $1,550.
So the budget model used roughly 28% more tokens (73 million against 57 million), scored a touch worse, and ended up about 75% more expensive to run than the Pro model it's meant to undercut.
That's a strange result for something sold as the efficient option, and it's the single fact I'd want anyone to sit with before wiring this into production.
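The comparison is easy to recompute. The sketch below uses only the rounded figures quoted above from the Artificial Analysis run, so the percentages it produces are approximate and may differ a little from ones derived from the exact data. The cost-per-point metric is my own addition for illustration, not something Artificial Analysis reports here.

```python
# Rounded figures quoted above: tokens in millions, score in points, cost in USD.
runs = {
    "Gemini 3.1 Pro": {"tokens_m": 57, "score": 57, "cost_usd": 890},
    "Gemini 3.5 Flash": {"tokens_m": 73, "score": 55, "cost_usd": 1550},
}

pro = runs["Gemini 3.1 Pro"]
flash = runs["Gemini 3.5 Flash"]

# How much more Flash generated and cost, relative to Pro, in percent.
extra_tokens_pct = (flash["tokens_m"] / pro["tokens_m"] - 1) * 100
extra_cost_pct = (flash["cost_usd"] / pro["cost_usd"] - 1) * 100

# Dollars spent per benchmark point (an illustrative metric, see lead-in).
cost_per_point = {name: r["cost_usd"] / r["score"] for name, r in runs.items()}

print(f"extra tokens: {extra_tokens_pct:.1f}%")
print(f"extra cost: {extra_cost_pct:.1f}%")
for name, dollars in cost_per_point.items():
    print(f"{name}: ${dollars:.2f} per point")
```

With these inputs the token gap comes out near 28% and the cost gap near 74%, and Flash ends up paying noticeably more per benchmark point than Pro.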
That tension is exactly the thing Google is trying to sell, and you can see their whole strategy in it. The pitch is that if companies mix Flash with the heavier frontier models, they could save a serious amount. Use the fast, cheapish model for the bulk of the agent work. Save the premium model for the hard parts. And Google is putting its own money where its mouth is: they used Antigravity with this model to build a complete operating system from scratch in about 12 hours, and the whole thing reportedly cost under $1,000 in API credits. That's the proof point they want enterprises to remember, a multi-week engineering project compressed into an overnight agent run for the price of a nice laptop.
Internally, Google is leaning on this hard, too. Gemini 3.5 Flash is now the default model in the Gemini app and in AI Mode in Search worldwide, so most people are using it without ever picking it. It's also powering a new, always-on personal agent called Gemini Spark, which runs around the clock and takes actions on your behalf under your direction. Google is clearly betting that Flash, not the expensive Pro tier, becomes the workhorse behind almost everything they ship, which is why they were willing to push the price up. They expect the volume to be enormous.
The benchmark wins back this up where it counts. On Terminal Bench 2.1, it hit 76.2%, clearing 3.1 Pro's 70.3%. On MCP Atlas, which measures multi-step tool workflows, it scored 83.6% and beat every model in Google's own table, including Claude Opus 4.7 and a GPT-5-series model. On Finance Agent V2, a test of financial analysis and decision-making, it jumped to 57.9% from the previous Flash's 42.6%, a leap of about 15 points. And that one matters because it points at real office work. Where it's still behind is long-context retrieval and the deep academic reasoning tests, where 3.1 Pro and the rivals hold their ground. So this is a specialist tuned for agents and code, not an all-rounder.
The partner list tells you who it's for. Shopify runs sub-agents in parallel on it to forecast merchant growth. Macquarie Bank is testing it to speed up customer onboarding by reasoning over documents that run past 100 pages. Salesforce is folding it into Agentforce to automate multi-turn enterprise tasks where the agent has to hold context across many tool calls. Box said it beat the previous Flash by almost 20% on their enterprise work evaluation. These are multi-week workflows that companies pay people to grind through, which is exactly the audience Google is aiming at.
If you want to try it, the easiest path is the Gemini app, where you sign in and select 3.5 Flash from the model list, no payment needed. For building, it's in Google AI Studio with a free tier and no credit card. And you can reach it through the API and Antigravity, which got a new standalone desktop app called Antigravity 2.0. The model ID is Gemini-3.5-Flash.
The thing worth waiting for is what's next, because Google confirmed Gemini 3.5 Pro is already being used internally and is expected to roll out sometime next month. If a Flash model is already beating the last Pro on coding and agents, the Pro version of this generation could reset the top of the leaderboard, and it'll probably be the one to reach for on the long-context and deep reasoning jobs where 3.5 Flash still falls short. So this Flash release feels like the opening move, not the main event.
What I keep coming back to is that this is a strong model with a confusing identity. It's fast, smart, creative, and it holds its own against models that cost more. The catch is that the name promises cheap and the behavior delivers expensive on short tasks, while paying off on the long agentic runs it was actually designed for.
So my honest take: use it where speed and agentic coding are the priority. Route your simpler stuff to something lighter, the way Pichai literally suggested, and watch your token usage closely before you scale it. It's a good model. Just go in knowing the efficient label has an asterisk this time, and test it against your real workload before you trust the marketing.
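One low-effort way to watch token usage is to keep a running tally per workload. The sketch below is a minimal, hypothetical tracker: the `UsageTracker` class, its fields, and the example token counts are all invented for illustration, and it assumes you copy the input and output token counts from whatever usage data your API client reports for each call. Only the two prices come from the figures quoted earlier.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts and estimated spend against a budget."""

    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens
    budget_usd: float
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add the token counts reported for one model call."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def spent_usd(self) -> float:
        """Estimated spend so far at the configured prices."""
        total = (self.input_tokens * self.input_price
                 + self.output_tokens * self.output_price)
        return total / 1_000_000

    @property
    def over_budget(self) -> bool:
        return self.spent_usd > self.budget_usd


# Prices quoted for Gemini 3.5 Flash; the budget and calls are hypothetical.
tracker = UsageTracker(input_price=1.50, output_price=9.00, budget_usd=0.10)
tracker.record(input_tokens=2_000, output_tokens=8_000)
tracker.record(input_tokens=3_000, output_tokens=4_000)
print(f"spent ${tracker.spent_usd:.4f}, over budget: {tracker.over_budget}")
```

Running the same tracker with each candidate model's prices over a sample of your real prompts gives a like-for-like cost comparison before you commit to one.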