Zaya1-8B is a masterclass in architectural efficiency, proving that smart routing and self-reasoning can outperform raw parameter scale. It’s a refreshing shift toward sustainable AI that prioritizes algorithmic depth over brute-force compute.
Deep Dive
Voraussetzung
- Keine Daten verfügbar.
Nächste Schritte
- Keine Daten verfügbar.
Deep Dive
Zaya1 8B - Intelligence Efficiency by Zyphra - Run LocallyHinzugefügt:
Zyra has just released Zia 8 billion.
Zyra, an AI lab based in San Francisco was quite active last year when they were releasing models like Zonos, ZR1 and few others and we covered all of them and then they just went silent and now after more than a year they have come up with this new open-source model Zia 8 billion. Now the model is quite unique in few senses. First up, this is the first model which was trained entirely on AMD hardware which itself is a first for a model at this performance level. The hall of mark for this model according to these benchmarks is that it has beaten lot of large scale models plus some of the open-source models. I'm going to go into more detail of this model and we will be not only discussing these benchmarks but also the architecture from various angles and don't worry and don't get scared by these diagrams. We will cover them in very simple language. This is Fad Midza and I welcome you to the channel. We are going to install this model locally and we are going to test it out to see how much true these benchmarks are. But before that let's talk bit more around this model as what exactly is happening here. So as I said it's an 8 billion parameter model a mixture of expert architecture with only 760 million active parameters out of 8.4 billion total. It was trained on AMD which is quite good. The goal of model is quite simple as per their blog post to squeeze maximum intelligence out of minimum parameters and the results which we see here are really hard to argue with.
For example, if you look at these benchmarks for the large scale comparison, Zia 18 billion with its Marovian RSA boost which I will discuss shortly actually beats Claude 4.5 set and Gemini 2.5 Pro on hard. Yes, both Claude and Gemini are bit older but remember we are comparing an 8 billion parameter model which has only 760 million active parameters.
It stays competitive with Deep Sea Car 1 and really comes close to GPT5 which is remarkable.
If you look at this comparison with open-source models, it has beaten Acriity Mini and Mistrol Small 4 on most of the benchmarks but it gets beaten by Nvidia Neotron 3 Nano on few task. The standout story is that an 8 billion total parameter model is competing with 100 billion plus models on hardware and we will also test it out when we install it. The tool which I'm going to use in order to get it installed is VLLM.
I already have VLM installed. If you don't know what VLM is and how to get it installed, just watch this second video and you should be able to get it installed.
This is my terminal where I am running this V LLM and I have Ubuntu and this is my GPU card Nvid RTX 6000 with 48GB of VRAM. I'm going to serve Zyra's Zia model with VLLM as you can see and they have their uh own tool call parser but they are still using the celebrated quen 3 reasoning parser to go with chain of thought. If you're looking to rent a GPU on very good price, you can find the link to master compute in video's description with a discount coupon code of 50% for range of GPUs. Okay, so now let's run this.
And the first time when you run this, it downloads the model. So let's wait.
There are four shards of this model.
And the model is now being served. Let's try it out.
So model model is loaded into open web UI. Let me quickly show you the VRAM consumption. So it is consuming just over 46 gig of VM around 47 which is quite high but anyway that is what these models are these days. So first up I am going to test it with um a very challenging prompt where we are testing [clears throat] the model's multi-step mathematical reasoning under a realistic scenario. So it you can see that I am asking it that it is an AI assistant which is helping a pilot in an emergency. A commercial aircraft is flying over East Java, Indonesia at 35,000 ft when Mount um Semiro suddenly erupts. The aircraft is currently at this nautical mile due east of volcano.
The volcanic ash cloud is expanding uh radially at 35 knots in all directions.
The aircraft has this following situation that some statistics around speed, wind, fuel, nearest safe airport is this one. And the pilot now need to calculate how many minutes before the ash could reaches aircraft. Calculate the correct heading to Bali. Calculate if there is enough fuel to reach Bali and then state the recommended immediate action. I will run this. Now what we are testing is that if it can handle geometry, wind drift, fuel burn rate and time calculations all at once, not just plugin formula. And this is a kind of problem that should expose whether a model actually reasons or not. Okay, it has come back with the response. The model is quite quick. Of course, it is fully loaded onto my GPU.
Okay, so I have checked the response and I will also scroll through. The model has correctly handled fuel calculation and the wind vector math and its ash cloud timing of 80 minutes is actually more accurate. Uh which is quite good uh because aircraft is stationary relative to the ash not closing at combined speed. But there is one mistake because it has calculated some heading of 183° which is wrong. It should be around 228° towards Bali. I think the model confused the wind correction angle calculation with the bearing calculation entirely.
But other than this mistake, I think it has done well in terms of very quick and targeted thinking and has given us quite a good example. But you know what? In these situations, we can't make these mistakes. So I will leave you be the judge. What do you think about that?
Okay. Now let's try out another one.
Okay. Before I run the next test, one thing you need to keep in mind that this model is primarily a reasoning and mathematics specialist that also does coding well. The post-training pipeline makes this very clear in their model card that the heavy RL phases were focused on math logic, puzzle solving, and coding. So next test I'm going to do this coding test where I'm asking it to build me a realtime collaborative code editor using Python and fast API with lot of other requirements. So this is going to test whether the model can architect a multicomponent system correctly in one shot webocket server section state management uh front- end integration and real-time broadcast. So you see it is thinking quite you know nicely. It is just slicing and dicing our prompt and then going from there. I will let it work and meanwhile let's talk about this architecture in very simple word. So think of this Zia 18 billion as a factory assembly line.
Your text goes in at the bottom passes through multiple processing stations stacked on top of each other and a predicted word comes at the top. The key innovation is the CCA block, which is Zyra's smarter, more efficient way of letting the model pay attention to words in context. Then a router decides which of 16 specialist expert modules should handle that particular input rather than using all of them every time. That's what keeps active parameters low at only 760 million despite having 8 billion in total. Now, this is where things are getting more interesting. This is Marovian RSA. Let me try to unpack this uh in a simple mere model lingo. Imagine you give the same hard math problem or coding problem to a classroom of students at the same time. They all work independently and produce their answers.
Then you extract just the final conclusion from each student. Mix them up and hand new students a question along with a few of those conclusions as hints. Those new students produce better answers. You keep repeating this cycle.
The context window stays fixed because you only carry forward the tail end of each reasoning trace, not the full thing. That's Marco and RSA. And this is what gives Zia 18 billion its big performance boost on hard math.
So hopefully now you know what this is.
Let's go back. It is still thinking. I will let it think and then once it produces a code, we will check it out.
Meanwhile, please feel free to follow me on X. If you're looking for AI updates and if you want to help out the channel, please become a member of the channel, like the video and subscribe and um you know, just spread the word. I would highly highly be grateful.
So after thinking for 4 minutes, it has generated the code. As you can see on the surface of it, the code looks okay to me. It has covered all the components and then it is telling me what dependencies which needs to be installed on the local system and then I need to run this uicon server and then I can open my different session ids. Okay, so let me just go and to my terminal and then I already have this installed. So I'm just going to copy this in a folder and then run this.
I have pasted the code as is in this file which model has generated as you can see and this is where I just created this directory and put it in VS code.
Let me run that app.py now.
Okay. So the server is running fine as you can see here. Let me go here and let me open the first session.
There you go. The session is open. terms of what I'm going to simulate here. I'm going to open two browser windows, not the tabs to simulate two different users collaborating in real time. Both will join the same session by using the same session ID as you can see which is for 123 in URL. So whatever one user types should instantly appear in the other window and no refresh should be needed.
Pure websocket magic. So I'll just say um or maybe let's say print something like hello there.
So this is first session. I'll just copy this.
I will open a new window.
This is a new window.
And there you go. So it is good. It is websocket has connected. I'll just put comment here.
comment added by second session.
That is good. Let's go back to the other one.
Where is it?
How many windows I have? So, I'll just Sorry.
There you go. So, this is the one comment added by this one. And it is also showing me some other stuff collaborative. So, whatever is there, I'll just maybe delete this. So, this is my previous one. My apologies.
You see this is the first session and let's go back and show you the second session. This is second one. Now let me go back to the first again and I'll just say command edit by first section.
Let me minimize this and let's go back.
So this is the first section. You see it's a realtime one and the session is working. Um you can also do the you know isolation if you like like totally different session maybe I'll just call it 1 to5 open a new session. This is completely new session by testing and then I will just open the previous one.
So see this is a previous one.
Nothing was changed here. The code works perfectly well. So I think its coding capabilities are also quite good as you just saw. So look, that's it. Let me know what do you think about this. I think I'm quite impressed by the model.
Uh Zyra has been really doing well um in the past, you know, few years except the last one. I don't know where they went, but they're back. And if you're interested in learning more about their previous models, just check these videos out. They were quite good. Let me know your thoughts. Again, please become a member of the channel and follow me on X as that helps. Thank you for all the support.
Ähnliche Videos
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 views•2026-05-29
BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 views•2026-06-03
Long-Running Agents — Build an Agent That Never Forgets with Google ADK
suryakunju
142 views•2026-05-30
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K views•2026-05-28
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 views•2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K views•2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 views•2026-05-29
3D Platformer Update - NO CAPES
SolarLune
294 views•2026-05-30











