Mixture of Experts (MoE) models are trained by starting with random weights and using a router to dynamically select which experts (in the talk's example, 2-3 out of 9) answer each question. Expertise emerges during training rather than being manually assigned by humans, and one expert may become a generalist while the others specialize in specific domains.
Deep Dive
MoE Training Explained (Part 1): How Mixture of Experts Models Actually Learn
Let's talk about how mixture of experts models are trained. The intuition behind it, not really the math, just the intuition: how do they get trained? Now, MoEs are a newer kind of neural network where, instead of using a single neural network, you use a collection of experts. You can think of the network as a collection of experts which are not all used to answer a question.
Only a subset of them is used to answer a question. For example, a medical question is different from a legal question in real estate. The expertise you need for the medical question is different, so a different expert can answer it. Now, the question becomes: how do you train such a network?
Do you actually decide up front that this expert will handle medical or scientific questions and this other one will answer business questions? No, that is not how it is trained. In the beginning, humans do not make that judgment. The people who are training these models do not decide what exactly these areas of expertise will be. They just come up with the idea that there will be, say, nine experts.
Now, let's say they start with random weights, and there is something called a router. The router's job is to decide, given the question (that is, given the tokens), which of the experts to choose.
And usually, out of, let's say, nine experts, it could be choosing only two or three to actually answer any question. One of these experts could be a generalist, which has knowledge about all the domains, and the other experts could be specialists.
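To make the routing step above concrete, here is a minimal sketch in plain Python, not taken from the talk. It assumes the router is a single linear layer followed by a softmax, and that the chosen experts' weights are renormalized to sum to 1; both are common choices, but the talk does not specify them. The names `init_router`, `route`, `DIM`, and the example token values are hypothetical and for illustration only.

```python
import math
import random

NUM_EXPERTS = 9   # from the talk: nine experts
TOP_K = 2         # from the talk: only two or three experts are chosen
DIM = 4           # hypothetical token embedding size, for illustration only

def init_router(num_experts, dim, seed=0):
    """Router weights start random; nobody assigns a domain to any expert."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_experts)]

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(token, router_weights, top_k):
    """Score every expert for this token and keep the top_k highest."""
    # Assumption: one linear layer, i.e. a dot product per expert.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: renormalize so the chosen experts' mixing weights sum to 1.
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

router = init_router(NUM_EXPERTS, DIM)
token = [0.5, -1.0, 0.25, 2.0]          # made-up token embedding
selection = route(token, router, TOP_K)
print(selection)                         # list of (expert index, mixing weight)
```

With random starting weights the selection is essentially arbitrary. During training the router weights are updated along with the experts, which is how any specialization would emerge rather than being assigned by hand.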











