OpenRouter's Fusion API demonstrates that compound AI models, which send prompts to multiple models in parallel and synthesize their responses, can achieve performance comparable to frontier models like Claude Fable 5 on specific benchmarks (69% vs 65.3% on the Draco deep research benchmark), though this approach introduces trade-offs in speed, cost, and applicability to non-research tasks like coding or visual reasoning.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
OpenRouter Fusion API Tested: Is It Really Fable Level Intelligence?
Added:When you hear that someone made a Fable level competitor, you probably think of OpenAI, DeepSeek, Google, Moonshot, that whole crowd. But the company doing this right now is actually Open Router, which is kind of unexpected because they aren't even a model lab. They are the routing layer most of us use to hit different models from one API. And their new thing is called Fusion. It isn't a model. It's a compound API that sends your prompt to multiple models at once and then stitches their answers together. They are pitching it as fable level intelligence at half the price, which is a pretty bold thing to put on a landing page. Actually, they take your question, fan it out to a panel of models, then a judge model reads every reply and writes the final answer based on the differences and overlaps. So, if I send a question about how attention works in Transformers, the prompt goes out to maybe Opus 4.8, GPT 5.5, and Gemini 3.1 Pro in parallel. Each one of those does its own thinking with web search and web fetch turned on. Then, a synthesizer model, often Opus 4.8 by default, looks at all three responses and pulls out where they agree, where they disagree, what they each missed, and writes one clean answer. The whole thing runs on open router side. So from your code, it just looks like a normal model call with the slug open router/fusion.
The benchmark they used is called Draco.
It was built by Perplexity for deep research style tasks. 100 tasks across 10 domains like law, medicine, finance, academic research, product comparison, UX design, needle in a haststack retrieval and a few others. Each task is graded against around 39 weighted criteria and wrong answers carry negative weight. So you can't just be wordy to inflate your score. They reported the mean normalized score from 0 to 100. The headline number is Fable 5 plus GPT 5.5 fused together using Opus 4.8 as the judge scored 69%.
Solo Fable 5 hit 65.3%.
Opus 4.8 paired with itself using Opus 4.8 8 as Judge scored 65.5%.
Which is interesting because that's basically a Fable level score from a model that on its own scored 58.8%.
The budget panel they ran Gemini 3 Flash plus Kimmy K2.6 plus Deepseek V4 Pro came in at 64.7%.
That's within one point of Fable 5 while costing roughly half the price. Solo GPT 5.5 was at 60. Solo Opus 4.8 was at 58.8. Solo Kimmy K2.6 was at 53.7. Solo Gemini 3.1 Pro was at 45.4 and Solo Gemini 3 Flash was at 43.1. There's a small footnote on the Fable numbers that you should actually read. Fable 5 only completed 93 of the 100 tasks. The other seven got blocked by Fable's own content filters and they chose not to fall back to Opus for those. So when they say 65.3% for Fable, that's 93 tasks, not 100. The other models ran the full 100. Direct comparisons are a bit uneven because of that. Another thing worth flagging from their own writeup is that when they first ran the benchmark, the panel models were finding the Draco grading rubric online through web search. That's a real contamination risk, even if it wasn't intentional. They patched it by adding the benchmark hosting locations to a blocked domains list. They say all the published results were after that block was in place. Good that they caught it, but it does tell you how delicate this kind of benchmark setup is. They also found that about 3/4 of the lift you get from Fusion comes from the synthesis step itself and only about one quarter comes from having different models in the panel. The Opus paired with Opus result is the proof of that.
Running the same model twice and then having it judge its own outputs still gives you a six or seven point jump from the second pass cleaning things up. Now, here is the part that bothers me a little. They tested this on one benchmark, a research benchmark. And then the headline says, "Fusion surpasses Frontier performance." The word Frontier covers a lot of ground.
Frontier means coding. Frontier means agentic tool use. Frontier means long context. Frontier means visual reasoning. A panel of models doing parallel deep research is good for deep research. It is not automatically good for everything else. So claiming fable level intelligence in general off a single deep research benchmark is a stretch. Let me show what I actually got when I used it. I tried two visual prompts that I usually run on new models to get a feel for spatial reasoning and code quality. First one was a 3D hourglass simulator. I asked for an interactive 3D hourglass with sand falling through the middle, draggable to rotate, with a button to flip it. The output worked, but the sand particles were kind of clumpy and they fell in this weird straight line. Instead of actually pouring through the neck, the glass itself didn't have any real refraction, and when you flipped it, the sand on the bottom didn't behave like a pile. It just teleported. functional but nothing I would say is impressive for the price you are paying because you are paying for three model calls and one synthesis call. Opus alone gives me a comparable hourglass in one call. Second one was a 3D geometric aura. So basically a floating object surrounded by orbiting rings and particles, the kind of thing you see in landing page hero sections. The geometry came out okay. The colors were fine. Lighting was acceptable. But the orbital math was off. The rings were intersecting the center object in ways that looked broken, and the particle field around it was static instead of drifting. I have seen GLM and Kimmy produce better-looking versions of this exact prompt in one shot. The Fusion version felt averaged out, which I think is actually a real problem with how this works. When you merge three model responses into one, you sometimes lose the sharp parts of the best response and end up with the middle. The slow speed is the other thing that hits you fast because the API waits for the entire panel to respond then runs the judge then writes the final answer. You are sitting there for a while for deep research style questions where you would have waited anyway. That's fine for anything interactive or agentic. It feels heavy and it costs more by default because you are paying for every model in the panel plus the synthesis call plus open router fee on top. Most agent frameworks don't really support it cleanly either. The open router/fusion slug works as a model alias, so you can plug it into anything that takes an open router model, but the tool calling behavior is different from a normal single model setup. And a lot of harnesses get confused by the timing.
So, if you were hoping to drop this into a coding agent and have it just work, you might be in for a bit of fiddling.
The honest reading of this whole thing is that fusion is a clever piece of engineering for a narrow use case. Deep research where you want multiple perspectives and a synthesized answer that is a real fit. If you have a complex legal or medical or financial question and you want to make sure no single model is hallucinating its way through, sending it to three models and merging the answers is a sensible approach. That part I am on board with.
What I am not on board with is the framing. The blog title literally says surpassing frontier performance. The Twitter announcement says the smartest compound model in the market. Those are claims that imply general superiority over fable 5 from one number that came from one benchmark that grades one type of task. It would have been more useful if they had run this on a coding benchmark, an agent benchmark, and maybe something like a long context retrieval benchmark and then shown where Fusion wins and where it loses. right now. You basically have to take their word for it. The other thing I would point out is that Fable 5 is gone for most users at this point, which sort of conveniently makes their main comparison unverifiable for the average person. You can't go run Fable 5 on Draco yourself and check. The numbers stand on their reputation, which is fine because Open Router is a real company, but it is still one vendor publishing their own results about a model that is no longer available. If you have a deep research workflow and the latency does not kill you, Fusion is probably worth trying for the budget panel alone. Getting close to fable level numbers from three cheap models is a real result and the cost math could make sense for high volume use cases.
For everything else, especially code and 3D and agent workflows, you are usually better off picking the right single model and saving the weight. Open Router is excellent at routing. this is a sensible product for them. The marketing just got ahead of what the benchmarks actually prove. All right, so that's it from the video and I hope you enjoyed it. If you did, please like this video and subscribe to the channel.
Related Videos
NEW Hermes Mission Control is INSANE!
JulianGoldieSEO
405 views•2026-06-11
The Man Who Named AGI Says We're Doing AI Wrong [ft. Peter Voss @ AIGO.ai]
arcanumventures
221 views•2026-06-11
"Netflix Knows What You'll Watch Next — Here's How" #netflixalgorithm
ClearAutomate
313 views•2026-06-10
Unlocking AI's Dirty Little Secrets: Domain Reduction Explained #shorts
AIExplainedHubX
848 views•2026-06-10
Certified LLM Security Professional (CLLMSP): 100% Free Exam Opportunity
cybersecmaison
107 views•2026-06-08
I Built a 24/7 Finance Analyst With Claude (Full Tutorial)
lukefinance100
302 views•2026-06-11
Apple gives Siri an AI makeover in bid to catch rivals
Reuters
5K views•2026-06-09
The terrifying reason AI will make humans politically and economically irrelevant forever. 🚨
FlashFunTV-o1u
628 views•2026-06-10











