Install our extension to search inside any video instantly.

HRM-Text-1B: A 1B Model That Beats 7B Models for $1,500: Test Locally
Added: 2026-05-30

1,081 views668:38fahdmirzaOriginal Release: 2026-05-29

HRM-Text-1B is a 1 billion parameter language model that outperforms larger models like GPT-3.5 on math and reading comprehension tasks, trained for under $1,500 using a hierarchical reasoning architecture with a two-speed processing loop (fast L module for quick representation refinement and slow H module for higher-level context updates) that achieves more internal computation than its parameter count would suggest, trained exclusively on question-answer pairs rather than raw web text.

[00:00:01]A 1 billion parameter model trained under $1,500 outperforms GPT 5 3.5 on math and reading comprehension. That is not a typo and you have heard it right.

[00:00:16]In this video, we are going to cover this HRM text 1 billion pre-trained model and we will see what exactly the fuss is all about. As you can see, there have been more than 100,000 downloads already.

[00:00:31]HRM text 1 billion is a pre-trained language model that achieves competitive performance.

[00:00:38]Now again, this is not a chat assistant.

[00:00:43]It needs further fine-tuning before you would use it as any I would say working model. It's a serious starting point though for anyone who wants to train a capable model from scratch without a data center budget and that is the whole story behind it. In this video, I'm not only going to install this, we will run this, but also I will tell you all about its architecture in as simple words as possible. There will also be a second video where we are going to fine-tune it on our own custom data set. This is Fahad Mirza and I welcome you to the channel. Let's get right into installation and while that happens, we will keep talking about this model. I'm going to use this Ubuntu system. I have GPU card and we have the RTX 6000 with 48 GB of VRAM. If you're looking to rent a GPU on good price, you can find the link to Mass Compute in video's description with a discount coupon code of 50% for a range of GPUs.

[00:01:45]Okay, let's go back and let's install all of the prerequisites which are simply transformers and torch and stuff and this is going to take a couple of minutes. While that happens, let me quickly show you these benchmarks.

[00:01:59]As I said earlier, it has already shown quite a good performance against 2 to 7 billion parameter models like Llama 3.2, Gemma 3, and Quan 3.5. Yes, bit of an older model, but that is not the point.

[00:02:11]The point is that very cheaply someone has created this model only in 1 billion parameter size.

[00:02:20]And not only that, but also they have used 100 to 900 times fewer training tokens. And it's Apache 2 license, you can use it easily.

[00:02:30]For me, this is the most interesting bit here.

[00:02:33]The core idea is a two-speed processing loop inspired by how the human brain separates slow, strategic thinking from fast moment-to-moment execution.

[00:02:46]Instead of one big transformer passing through layers once, HGRM runs two modules in a nested cycle. There is a fast L module that refines representations quickly, and then there is a slow H module that updates the higher-level context. This loop repeats multiple times per forward pass, giving the model far more internal computation than its parameter count would suggest without adding more weights.

[00:03:16]You can >> [clears throat] >> also consider it like model is thinking harder than thinking longer or bigger.

[00:03:22]The team also trained exclusively on question answers paired rather than raw web text, computing loss only on the answers, which forces every training step to focus on actually useful output rather than reconstructing prompt. And they have shared lot of other stuff on their card, which you can read through quite quickly.

[00:03:44]Okay, let's go back to my terminal. And by the way, if you want to help out the channel, please become a member and follow me on X if you're looking for AI updates.

[00:03:56]Let me now launch my Jupiter notebook as everything is installed.

[00:04:02]Let's download the model now from hugging face.

[00:04:06]And as you can see it's a very small model just 2.37 gig.

[00:04:10]It's already done.

[00:04:13]Okay, so now let's test it out. Let me quickly show you the VRAM consumption too because I don't expect it to use too much but you can just use it on CPU too.

[00:04:26]Just consuming you see just 2.6 or 2.7 gig of VRAM. You can easily run it on your CPU anyway.

[00:04:33]So let's test it out and I'm going to use these lines in order to run the inference.

[00:04:40]So this is a setup of the prompt with control tokens that tell the model how to respond.

[00:04:46]Now one reminder again, this is a raw base model no assistant tuning. The output may surprise you.

[00:04:54]The key idea here is such a small model you can fine tune it quite easily.

[00:05:00]Let me quickly explain what exactly is meant by all of these tags like I am start or quad and all that.

[00:05:07]So I am start and I am end in this second line, they mark the start and end of a single turn like opening and closing quotation marks.

[00:05:17]And then they have something called as sent it.

[00:05:21]Now uh when we say sent it is also called as you know quad end and it tells the model to respond in a clean structured style rather than messy web crawl style.

[00:05:34]Also then we have got which is internally called as object referent and that is what they are using.

[00:05:41]Now this stands for chain of thought meaning think through the answer step by step rather than jumping straight to a conclusion.

[00:05:49]Then we also have these Sorry.

[00:05:51][clears throat] Token type IDs, all ones, tell the model to read the entire prompt in both directions.

[00:05:59]And if you're wondering why on earth we are using these tags, because this model has no system prompt, no chat interface, no instruction tuning, so these tags are the only way to tell it how to behave.

[00:06:12]And that is the whole point of pre-trained model. Let's run it to see the output. I'm asking it explain why can't we have nicer things.

[00:06:22]And the model has come back with a response. If you go through this response, the output is coherent, structured, and actually reasons through the questions question in steps, which is exactly what the cot or chain of thought that was supposed to trigger.

[00:06:37]For an un- >> [clears throat] >> aligned base model trained on just 40 billion tokens, I think that is quite a decent result.

[00:06:47]And I will say it again, remember this is just a raw base model. But also notice that it cuts off mid-sentence uh at the end. That is because we kept generation at 256 tokens, and it simply ran out of budget. You can just simply bump up that max new tokens if you want the full answer, and then it is going to decode that one for you. Also notice that the raw control tokens are visible in the output, too. All the start quad That is because this is a base model with no chat wrapper cleaning things up.

[00:07:20]A tuned version would strip these automatically. And this model has never been through RLHF or instruction tuning, and never been really trained on anything specific.

[00:07:33]Another interesting bit is that you can even turn off the thinking here.

[00:07:38]So if you look at this new prompt, which is using the same prompt, but we are now using it with direct tag or object ref start tag this time, which tells the model to skip the reasoning and just answer. So, let me now run this. This shouldn't go with chain of thought.

[00:07:55]And look at the answer of the model. It just says no here. That's it.

[00:07:59]So, this is a direct mode. No [clears throat] reasoning, no structured, just a blunt one-word answer. And that is a contrast. Two tags, two completely different behaviors. So, this is a uh no thinking one, and this is a thinking one. You can see their condition is different. So, that's a 1 billion parameter model trained for just $1,500 or under, behaving like it has opinions. Imagine what happens when someone fine-tunes it, and that is what we are going to do in the coming videos very soon. Let me know what do you think. Again, please support the channel by becoming a member. Thank you for all the support.

Related Videos

Artificial Intelligence

OpenHuman VS Hermes AI: Who Wins?

JulianGoldieSEO

285 views•2026-05-29

Artificial Intelligence

Long-Running Agents — Build an Agent That Never Forgets with Google ADK

suryakunju

142 views•2026-05-30

Artificial Intelligence

This computer is made from real human brain cells. And you can buy it.

Talktmsmedia

3K views•2026-05-28

Artificial Intelligence

BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2

aimmediahouse

122 views•2026-06-03

Artificial Intelligence

I Made the Same Anime Fight Scene in Every AI Video Generator

NobleGooseAnime

295 views•2026-05-30

Artificial Intelligence

Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S

cnnnews18

3K views•2026-06-01

Artificial Intelligence

I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)

AICodingDaily

298 views•2026-05-29

Artificial Intelligence

3D Platformer Update - NO CAPES

SolarLune

294 views•2026-05-30

Trending

The Casino Had Us Guessing All Day

VegasMatt

157K views•2026-06-03

The Dancing Plague...

HoodieGuyStories

1730K views•2026-05-30

The Fastest Way To Board A Plane 😮

zackdfilms

6504K views•2026-05-29

Artificial Intelligence

DOOM Runs On Everything...except Neo Geo

ModernVintageGamer

143K views•2026-06-01