Baidu's ERNIE 5.1 demonstrates that frontier-level AI performance can be achieved with significantly reduced training costs by employing a multi-stage reinforcement learning training pipeline with multi-teacher on-policy distillation, which avoids the traditional 'seesaw problem' where improving one capability (like coding) harms another (like reasoning), allowing the model to balance reasoning, coding, agentic behavior, creativity, and instruction following more effectively than traditional sequential training approaches.
Deep Dive
ERNIE 5.1 Beats DeepSeek-V4-Pro in Agents -- China's New AI King Is Here
Hey guys, welcome back to another exciting new video and another new model. It is Ernie 5.1, and it is from Baidu.
This is one of Baidu's efficient frontier AI models, and here you see they have compared their model with DeepSeek-V4-Pro, Claude 4.6, and also Gemini 3.1 Pro. They have compared it on different agentic and reasoning benchmarks, like the popular AIME 26 with tool-calling capability, and also DeepSearchQA and SpreadsheetBench. For reasoning, here you see GPQA, AIME 26 without tool calls, and MMLU Pro. On all of these benchmarks, you see that this model is sometimes beating the top frontier models, and sometimes it is matching them with the same kind of score. Now, they have taken a different approach for this 5.1 model. Previously, they had the 5.0, and with this 5.1, here you see, they significantly reduce the pre-training cost, compressing the total parameters to 1/3 and the activated parameters to 1/2, and using only 6% of the pre-training cost compared to models at a similar scale, while achieving the leading performance in its class. Now, one thing I want to show you. Here you see, if you go to this arena.ai. But before discussing this model in detail, guys, I have one small request.
Please follow me on Twitter. This is a Code Geass Twitter page. Here, you will get all of the latest information about AI. Now, about this search arena: the search arena is basically about when an LLM calls the search tool. Inside chat.openai.com or inside Claude, you have seen that if you want to search for a particular topic and chat.openai.com or Claude doesn't have it in memory, then it calls the search tool. So, in that category, here you see Ernie 5.1 is at number four. And if I show you the pricing for GPT-5.5, here you see, its pricing is $5 for the input and $30 for the output tokens. This is GPT-5.5.
Now, if I show you this on the search arena, here you see, it is giving almost the same kind of performance as 5.5, and also the same kind of performance as Opus 4.7. Now, for Ernie 5.0, here you see the pricing: 0.84 for the input and 3.37 for the output. For the 5.1, they have not released the pricing yet, but they will keep the pricing the same as the 5.0. So, you can expect that you are getting almost the same kind of performance as 4.7 and 5.5, and the cost is very, very low, right?
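To make the price gap concrete, here is a small sketch that compares the two quoted price lists on one workload. The billing unit (per one million tokens), the currency of the Ernie 5.0 prices (US dollars), and the workload size are assumptions; the video only quotes the raw numbers.

```python
# Prices quoted in the video. The "per one million tokens" unit and the
# US-dollar currency for ERNIE 5.0 are assumptions, not stated in the video.
PRICES = {
    "GPT-5.5": {"input": 5.00, "output": 30.00},
    "ERNIE 5.0": {"input": 0.84, "output": 3.37},
}

TOKENS_PER_UNIT = 1_000_000  # assumed billing unit


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one workload for the given model."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / TOKENS_PER_UNIT


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
gpt_cost = workload_cost("GPT-5.5", 2_000_000, 500_000)
ernie_cost = workload_cost("ERNIE 5.0", 2_000_000, 500_000)

print(f"GPT-5.5:   {gpt_cost:.2f}")
print(f"ERNIE 5.0: {ernie_cost:.2f}")
print(f"Ratio:     {gpt_cost / ernie_cost:.1f}x")
```

The ratio depends on the input/output mix, so a different workload gives a different multiple.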
Now, let's talk about the strategy and the actual architecture they have implemented. So, instead of chasing brute-force scale, the focus is now on efficiency, reinforcement learning, and expert capability fusion.
And the result is a model that delivers frontier-level performance while dramatically reducing the training cost and parameter size, as I mentioned earlier. Now, you see the key highlights. So, first of all, strong agentic performance. Ernie 5.1 shows major improvements in autonomous task execution and also tool-using behavior. And you see, on this T3 benchmark and on SpreadsheetBench Verified, it is completely surpassing DeepSeek-V4-Pro. From this benchmark comparison, here you see, it is clearly visible, okay?
And also this next point: powerful knowledge and creative writing.
So, strong world knowledge and creative writing. Here they have compared on the GPQA and MMLU Pro benchmarks. And by the way, guys, you can try this model for free on their official website, ernie.baidu.com. I will give that link, and there, after you sign up with your Gmail account, you can see this kind of web page, and here you will find two options. One is 5.1 Instant, for quick responses, and the other is 5.1 Thinking, for deep thinking. They have also released one coding model, so I will discuss that coding model in a separate video. Okay?
Now, let's come back to this point. So, this model demonstrates strong world knowledge and academic understanding, with benchmark scores on GPQA and MMLU Pro that are approaching the leading closed-source frontier models. It is visible from the benchmark comparison also. And what is more surprising is its creative writing capability, which Baidu claims is nearing the level of Gemini 3.1 Pro, an area where many reasoning-heavy models usually struggle. Okay? So, here you see the creative writing capability nearing Gemini 3.1 Pro. Now, let's come to the next point, which is frontier-level reasoning performance. So, Ernie 5.1 performs extremely well on advanced mathematical reasoning tasks. On the difficult AIME 26 benchmark with tool usage enabled, it reportedly scored 99.6.
Here you see 99.6 on AIME 26 with tool calls.
And it is second only to Gemini 3.1 Pro. This signals that Baidu's reinforcement learning and reasoning pipeline is becoming genuinely competitive at the frontier level. Okay?
Now, the next point, which is the deep search capability. So, Ernie 5.1 also appears optimized for retrieval and search-oriented workflows.
And if you look at this deep search benchmark, it is number four globally and number one among the Chinese models. Also, if you go to the search leaderboard of the search arena, you see it scored 1223, which suggests strong capabilities in search reasoning, multi-hop retrieval, web-assisted answering, and also long-context synthesis. Okay.
Now, there is another point. If you go to their blog post and scroll down, there is a portion, yes, this portion, where they have mentioned one thing, which is stage three: multi-teacher on-policy distillation. So, what makes Ernie 5.1 different? The biggest architectural innovation is its multi-stage reinforcement learning training pipeline, built around this one: multi-teacher on-policy distillation. What is that? The short form, you can say, is MOPD. So, instead of training one monolithic model to learn everything simultaneously, Baidu trains multiple specialist expert models independently.
And then it merges their capabilities into a unified student model.
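The idea described above can be illustrated with a toy sketch. This is not Baidu's implementation, which the video does not detail. It assumes a three-token vocabulary, two hypothetical specialist teachers given as fixed next-token distributions, and a student represented by one row of logits per domain (a stand-in for a shared network). "On-policy" here means the loss is an expectation over the student's own output distribution, which for such a small vocabulary can be computed exactly as the reverse KL divergence KL(student || teacher).

```python
import math

# Hypothetical specialist teachers: next-token distributions over a
# three-token toy vocabulary. All names and numbers are illustrative.
TEACHERS = {
    "coding": [0.7, 0.2, 0.1],
    "reasoning": [0.1, 0.8, 0.1],
}


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student, teacher):
    """KL(student || teacher): expectation under the student's own outputs."""
    return sum(s * math.log(s / t) for s, t in zip(student, teacher))


def distill(teachers, steps=500, lr=0.5):
    """Pull one student toward every teacher, each on its own domain."""
    student_logits = {domain: [0.0, 0.0, 0.0] for domain in teachers}
    for _ in range(steps):
        for domain, teacher in teachers.items():
            probs = softmax(student_logits[domain])
            kl = reverse_kl(probs, teacher)
            # Exact gradient of KL(softmax(z) || teacher) with respect to z.
            grad = [p * (math.log(p / t) - kl) for p, t in zip(probs, teacher)]
            student_logits[domain] = [
                z - lr * g for z, g in zip(student_logits[domain], grad)
            ]
    return student_logits


student = distill(TEACHERS)
for domain, teacher in TEACHERS.items():
    probs = softmax(student[domain])
    print(domain, [round(p, 3) for p in probs], round(reverse_kl(probs, teacher), 6))
```

In this toy, each domain's update touches only that domain's logits, so improving "coding" cannot degrade "reasoning". A real student shares parameters across domains, which is where the balancing becomes difficult.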
And also, here you see, in stage four it uses online reinforcement learning afterward to preserve the creativity and conversational quality.
Now, where will this help you? This avoids the classic seesaw problem, where improving one capability harms another. Let's say inside your model you have the reasoning capability, coding capability, agentic behavior, and creativity. Now, if you just make the model an expert at coding, then let's say your reasoning capability gets affected. Okay?
This is the seesaw problem. Now, Baidu has actually addressed that problem.
Baidu's model, Ernie 5.1, avoids that problem. The model balances reasoning, coding, agentic behavior, creativity, instruction following, and conversational alignment more effectively than a traditional sequential training pipeline. They have also mentioned many other things, like outstanding creative capabilities, and explained them in a detailed way, so you can read it.
Not only that, there are many other areas they have mentioned, like coding and also reasoning. So, let me quickly go to this website, ernie.baidu.com, and there, after login, you can actually test this model. I have prepared a bunch of questions using ChatGPT: one, two, three, four, a logical reasoning question, a multi-step reasoning question, a hallucination test.
In this way, I have generated a lot of questions. So, let me test this model with these questions, and let's see whether this model is capable of giving the correct answers or not. First, I'll start with logic and reasoning. So, copy this question.
Okay, and go here and paste it. I have selected Instant, okay, because I am just asking one simple question, not that much of a reasoning-based question. I mean, here you don't need deep thinking, so that's why I will go for Instant.
And here you see that I have got the answer within a second. The ball costs 0.05 dollars, meaning 5 cents. So, let's check the answer. Yes, the ball cost is 0.05 dollars and the bat cost is 1.05 dollars. Bat cost... where is the bat cost? Yes, the bat cost is 1.05 dollars, so the total is 1.10. Yes, guys, so we have got the correct answer. Now, let's move to our next question, which is the multi-step reasoning. Let me copy this question.
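The prompt for this first test is not read out in the video. Assuming it is the classic bat-and-ball puzzle (the pair costs $1.10 in total and the bat costs $1.00 more than the ball), which matches the numbers in the answer, the arithmetic can be checked with a short sketch. Prices are kept in integer cents to avoid floating-point rounding.

```python
def solve_bat_and_ball(total_cents: int, difference_cents: int) -> tuple[int, int]:
    """Solve ball + bat = total and bat = ball + difference, in cents."""
    # Substituting the second equation into the first:
    # ball + (ball + difference) = total  =>  ball = (total - difference) / 2
    ball = (total_cents - difference_cents) // 2
    bat = ball + difference_cents
    return ball, bat


# Assumed puzzle values: total of 110 cents, bat costs 100 cents more.
ball, bat = solve_bat_and_ball(110, 100)
print(f"ball = {ball} cents, bat = {bat} cents, total = {ball + bat} cents")
```

The intuitive wrong answer, a 10-cent ball, fails the check: the bat would then cost $1.10 and the total would be $1.20.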
So, the question is: if five machines take five minutes to make five widgets, how long would 100 machines take to make 100 widgets? Okay. This is a great multi-step reasoning question. Let me go here, paste the question, and press enter.
And here you see, within a second I got the answer. It takes five minutes. Okay.
And let me check the expected answer.
Yes, five minutes. And why five minutes?
Here you see, it has explained it in detail. Okay. And it is actually amazing that it has given the answer in detail, right?
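The reasoning behind the five-minute answer is a rate calculation: five machines making five widgets in five minutes means each machine produces one widget every five minutes, so 100 machines working in parallel produce 100 widgets in the same five minutes. A minimal sketch, with function and parameter names chosen here for illustration:

```python
from fractions import Fraction


def minutes_needed(base_machines: int, base_minutes: int, base_widgets: int,
                   machines: int, widgets: int) -> Fraction:
    """Time for `machines` machines to make `widgets` widgets at the observed rate."""
    # Widgets produced per machine per minute, from the observed run.
    rate = Fraction(base_widgets, base_machines * base_minutes)
    return Fraction(widgets) / (machines * rate)


answer = minutes_needed(5, 5, 5, machines=100, widgets=100)
print(f"{answer} minutes")
```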
So, now let's move to our next question, which is the hallucination test: who won the Nobel Prize in Physics in 2032?
I mean, currently the year is 2026. Let's see what kind of answer it gives. This is a kind of hallucination test. Press enter.
And here you see that the answer is ready.
"As of the current date, 2026, May 9th, the 2032 Nobel Prize in Physics has not yet been awarded. The prize is typically announced in early October each year.
So, the 2032 winner will likely be revealed around October 2032, more than six years from now." Okay. So, yes, it has given the correct answer. That means it has passed the hallucination test also.
And after that, instruction following.
Write exactly 12 words about space exploration, no punctuation. Okay. Without any punctuation, exactly 12 words about space exploration. Okay, let's see.
Instruction following, we need to check.
Paste it and press enter.
Okay. Okay, I got the answer. Humans have long dreamed of exploring the vast stars and distant planets. 12 words: 1 2 3 4 5 6 7 8 9 10 11 12. Yes, I have got exactly 12 words. Okay, and so it has passed that question also. Now, let's talk about coding.
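The manual count above can be automated. The following sketch checks both constraints from the prompt: the word count and the absence of punctuation. It assumes the model's reply had no trailing period (the period in the transcript may come from the captioning) and only tests for ASCII punctuation.

```python
import string


def follows_instruction(text: str, expected_words: int = 12) -> bool:
    """Check for exactly `expected_words` words and no ASCII punctuation."""
    has_punctuation = any(ch in string.punctuation for ch in text)
    return len(text.split()) == expected_words and not has_punctuation


# The model's answer as read out in the video, without the trailing period.
answer = "Humans have long dreamed of exploring the vast stars and distant planets"
print(follows_instruction(answer))
```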
Now, let's test this coding question also: write a Python function to check whether a string is a palindrome. Okay.
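The model's answer to this prompt is not shown in the transcript. For reference, one function that would satisfy the prompt might look like the sketch below. Ignoring case, spaces, and punctuation is a design choice assumed here; the prompt does not specify it.

```python
def is_palindrome(text: str) -> bool:
    """Return True if `text` reads the same forward and backward."""
    # Assumption: compare only letters and digits, case-insensitively.
    normalized = [ch.lower() for ch in text if ch.isalnum()]
    return normalized == normalized[::-1]


for sample in ["racecar", "A man, a plan, a canal: Panama", "Ernie"]:
    print(f"{sample!r}: {is_palindrome(sample)}")
```

A stricter variant that compares the raw string (`text == text[::-1]`) would treat "Racecar" as a non-palindrome.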
So, they have not published any coding benchmarks yet, I mean against the competition.
But I am thinking that this model has the capability to compete with a Claude Haiku 4.5 level of model.
As they have released another model for coding, I will test that model in my next video. That's why I have not tested coding in depth inside this video with Ernie 5.1.
But let's see whether it is capable of giving the correct answer for this small kind of code snippet or any codebase exploration.
So, let's give it this question and press enter.
Okay, within a second I have got the answer. Yes, so it has the coding capability also. Next is debugging ability: what is wrong with this code?
Okay, let's copy it.
And let's give it.
And here you see I have got the answer.
It will raise an IndexError: list index out of range. So, that means it is great that at a low cost you can actually use this model for any kind of purpose: for reasoning, for coding, and also for creative writing. So, this model is really a great and golden model. And now, let's move to our next question, the creative writing.
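The snippet pasted into the model is not shown in the transcript. As a purely hypothetical illustration of the kind of bug that produces "IndexError: list index out of range", consider an off-by-one list access and one way to fix it. Both function names are invented for this example.

```python
def last_item_buggy(items):
    """Hypothetical buggy code: indexes one position past the end."""
    # Valid indexes run from 0 to len(items) - 1, so this always fails.
    return items[len(items)]


def last_item_fixed(items):
    """Return the last element, with an explicit error for empty input."""
    if not items:
        raise ValueError("items must not be empty")
    return items[-1]


try:
    last_item_buggy([10, 20, 30])
except IndexError as error:
    print(f"IndexError: {error}")

print(last_item_fixed([10, 20, 30]))
```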
Okay. So, this is the last question that I want to test. There are many other questions, but I think I am satisfied with this model. Please test it yourself also, and let me know your feedback and your thoughts on this model in the comment section.
This is the last question, for creative writing. Okay.
And yes, paste it and press enter. Write a sad two-line poem about time. Okay, within a second I have got the answer. "The clock keeps ticking, stealing moments we can never reclaim. Each second gone forever, leaving only shadows of our name." Okay, this is a very, very great poem it has written.
So, I'm very happy with this model, guys. Another thing is that I have not found any API endpoint information about this Ernie 5.1. Maybe in the coming days they will release it in their API.
Till now they have given only these three links, I mean one is for testing and one is for the blog.
So, please go to the official website, ernie.baidu.com. The link is given in the description. Test it, and please tell me what kind of experience you are getting.
So, I have explained it in detail. If you found this helpful and you want to get this kind of latest AI model information, please don't forget to subscribe to this channel. Don't forget to like this video also.