Reciting textbook summaries on RLHF is a performance of literacy, not a demonstration of actual engineering expertise. It’s a classic case of using technical jargon to curate an intellectual image rather than solving real-world problems.
Deep Dive
Bobi Wine's Son Demonstrates Expertise in AI Technology
The world is evolving into what we call artificial intelligence, AI.
Also, the president decided one of the things is artificial intelligence.
Hello, my name is Solomon Kampala.
Today is day six of AI training with Building Applications with Foundation Models by Chip Huyen. Hopefully I didn't butcher that name. Today we'll be talking about post-training and the things around it: sampling, outputs, stopping conditions, things of that nature. Today I read 35 pages and I'm way too lazy to summarize that into a bunch of paragraphs, so I'm just going to tell you what I wrote down in my summaries. It's not going to be too long. Hopefully you bear with me. Thank you to everyone who's been sticking with me. If you have fallen behind, please go back and read some of my previous paragraphs about engineering. I am on page 105 of about 500 pages. I just want to be done with this book so that I can buy the machine learning foundations book and we can start on that one. Okay.
So post-training exists to solve the problems of pre-training. What are these problems? A pre-trained model can create text, but it can't have a conversation with a human being. It creates things but it doesn't know where to stop. It doesn't know what your words mean. All it knows is how to complete sentences.

If I ask, "What is your name?", it only thinks of more questions, like "Where are you from?" or "What's your mother's name?" Or if I ask it, "What color is blue?", it continues with something like "What color is blue, the blue chair that's on the side of the street", and so on. It doesn't know when to end.

So post-training includes supervised fine-tuning, where we find a way to make the AI give the appropriate answer, and preference fine-tuning, which is done with reinforcement learning from human feedback or from AI feedback. We'll talk about that later, and I'm going to explain what these are. Pre-training, I would say, is like reading a book to get the knowledge, and post-training is like learning how to teach. Teachers first have to go to school and learn, and then they have to go to school again to learn how to teach. That's basically what post-training is.

Most of the effort, money and resources goes into pre-training: roughly 90%, with roughly 10% going into post-training. That's why it's really expensive to train an AI model. It's like putting it through college times 1,000.

SFT, which is supervised fine-tuning, is where we move the AI from completing the question to actually answering it. There are different kinds of labeling. There's the traditional kind that anyone can do: you have a bunch of pictures, the AI has said "this is a chair" or "this is a dog", and you just press yes or no. Is this correct? Yes. Then there's demonstration labeling, which is harder, and at the bigger companies many of the people who do demonstration labeling actually have master's degrees, because it involves a lot of different steps. Here you give the AI example answers to learn from. This is the process.
So the people who do demonstration labeling design the prompts: what questions, or what fields of questions, to ask, because you can only have a few since it's so expensive. Then they write the answers to those questions, and the AI learns from those. It costs, I'll say, $10 to $25 for each question-and-answer pair. Because this is so expensive, many people are turning to AI-generated data and then checking which AI replies people liked, just to cut costs. I don't personally know whether that's good or bad. My intuition tells me that human-in-the-loop is always better, even when that's not true, but what do I know.

Preference fine-tuning is about what kind of conversations the AI should be able to have. The AI is trained on the worst, worst, worst parts of the internet, and it sees everything.
It doesn't know what's good or bad. Preference fine-tuning is teaching the AI what is good and what is bad, what it can talk about and what it can't talk about. And this is basically impossible, because there are tough topics. It's like making a human being have a perfect conversation with every single person, every single time. There are culture clashes, there are misunderstandings. So making an AI perfect is really hard.

A lot of companies have decided to make their AI sycophantic, so that it's always glazing and always saying, "Oh, you're so right. You're so good. Yes, yes, yes. Your idea is really good. What you're saying is really good. Everything is good." That's the downside, but these people also have to worry about shareholder value and always impressing their users. And it's also really addictive when an AI is always glazing you. But I guess that's one solution, given that there are tough topics like abortion, drugs, immigration, things of that sort.

One of the methods for this is reinforcement learning from human feedback. They give a bunch of people curated questions and answers and ask: which answer is better? They give them two answers, or maybe four, and ask which one is best. Then they take all those people's choices and use a mathematical function to figure out which responses are preferred on average, and they go with that.

So I'm going to run through the rest really quick. There is sampling. Sampling is how the AI makes outputs: it chooses among words according to probabilities. Take the question "What's a ball?" The word "round" might have a 30% probability of coming next, and the word "stick" might have a 5% probability, because of hockey, where you hit the ball with a stick. All of this is done with mathematical functions.

Then there's temperature. There is a setting for sampling called, I think, temperature, where... man, it's too technical. I can't really explain it. I didn't write down enough notes to remember, but I have other things to explain. I have 30 seconds left.
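To make the sampling and temperature idea above concrete, here is a minimal sketch in plain Python. It assumes the model gives each candidate next word a raw score (a logit); the words and scores below are made up for illustration and do not come from a real model. Temperature divides the scores before they are turned into probabilities: a low temperature makes the likeliest word even more likely, and a high temperature flattens the distribution so that rare words like "stick" are picked more often.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits.values()]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(logits, exps)}


def sample_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for the next word after "What's a ball?"
logits = {"round": 2.0, "toy": 1.5, "stick": 0.2, "banana": -1.0}

cold = softmax_with_temperature(logits, temperature=0.5)  # sharper
base = softmax_with_temperature(logits, temperature=1.0)
hot = softmax_with_temperature(logits, temperature=2.0)   # flatter

for name, probs in [("T=0.5", cold), ("T=1.0", base), ("T=2.0", hot)]:
    print(name, {tok: round(p, 3) for tok, p in probs.items()})

rng = random.Random(0)  # seeded so repeated runs give the same samples
print([sample_token(hot, rng) for _ in range(5)])
```

Real systems do the same thing over a vocabulary of many thousands of tokens, usually together with other settings such as top-k or top-p.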
I don't have 30 seconds left. I'll just go for one more minute. Stopping condition. I want to stop. AI can output things forever. If you ask it, "What's the use of a book?", it goes: a book is used for writing. This writing can be about all kinds of things. These things can be used in school and for studying and also for learning a lot of things. This book can be written down on a lot of names and a lot of... I can keep talking and talking and talking. So scientists think about different ways to stop it from talking so much, so that companies don't run out of money, because it costs money to make the AI talk.

Test-time compute is the compute spent making outputs. It mashes things together until it makes sense. Say the question is "What color do you like?" It starts with one output, like the word "I". Then what might come after "I"? "Like". Okay, "I" plus "like". Then "I like" plus what? "The". "I like the" plus what? "Color". "I like the color" plus what? "Red". Final output: red. Red is a color. What was the question? "What color do you like?" Boom. Finished. It has stopped there. It's really smart.
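The word-by-word walk-through above, together with the stopping conditions, can be sketched as a small loop. This is a toy under stated assumptions: `NEXT_TOKEN` is a hypothetical lookup table standing in for a real model, and `<eos>` is a made-up name for the special end-of-sequence token. The loop stops for one of two reasons: the model emits the stop token, or the output reaches a maximum length, which is the kind of limit that keeps the cost bounded.

```python
EOS = "<eos>"  # hypothetical end-of-sequence marker

# Hypothetical stand-in for a model: maps the words so far to the next word.
NEXT_TOKEN = {
    (): "I",
    ("I",): "like",
    ("I", "like"): "the",
    ("I", "like", "the"): "color",
    ("I", "like", "the", "color"): "red",
    ("I", "like", "the", "color", "red"): EOS,
}


def toy_model(tokens):
    """Return the next token for the tokens generated so far."""
    return NEXT_TOKEN.get(tuple(tokens), EOS)


def generate(model, max_tokens=20, stop_token=EOS):
    """Generate one token at a time until a stopping condition is met."""
    tokens = []
    reason = "max_tokens"
    while len(tokens) < max_tokens:
        next_token = model(tokens)
        if next_token == stop_token:
            reason = "stop_token"
            break
        tokens.append(next_token)
    return tokens, reason


tokens, reason = generate(toy_model)
print(" ".join(tokens), "|", reason)  # I like the color red | stop_token

short_tokens, short_reason = generate(toy_model, max_tokens=3)
print(" ".join(short_tokens), "|", short_reason)  # I like the | max_tokens
```

The second call shows the trade-off: a tight length limit saves money but can cut the answer off before it is finished.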
I don't know how these guys thought of it, but they did, and it ended up working because they found a way to use math to describe what's going on. What else am I supposed to talk about? Structured outputs. There are different structures. AI can translate text into things like SQL, different coding languages, different formats, you know.
Then there's non-structured output, where the AI might need to form structured output along the way in order to produce it. This is especially useful for agentic AI. For example, Claude Code has a thing where you can connect it with your Gmail, and then it can write and save drafts in your email. For it to do that, I think there's a format that Gmail wants things in. I don't know what language they use, but the AI turns the draft into a certain type of, let's say, code, and then Gmail translates that back into natural language, whichever language you're using. So those are two ways that AI can use structured outputs.

There are frameworks that can do this. Companies can either build this themselves or use someone else's stuff. They can use something called guidance to constrain the types of answers. There's something called options, where the answer can only be this or that, or one of a group of things. Let's say the question is "What color is the new 2026 Hilux?" and maybe they only made it in black, red, yellow and green. So there are four options, and those are the constraints on the answer. Then you have a starting statement, and the constrained part is the end of the sentence: "The 2026 Hilux is ...".

Or maybe you're looking for a specific type. Let's say, "Is there a green 2026 Hilux?" "Yes, there is a green 2026 Hilux." Then the only options for what it can be are yes or no.

On that note, you can also use regex. Regex is for things like calculations, and options is for things like yes or no. I think there are different types of options: it can say yes, it can say no, or you can have it tell you what kind of situation is going on. With regex, you can ask, "How many species of flowers exist in Uganda?" and maybe it adds up all the species of flowers it knows about and returns a value, and the regex makes sure that what comes back is a number. So those are different types of constrained outputs.

You can also use prompting, on a model that supports it. You can just tell it what kind of format you'd like your answers in. But that's only if it supports it.
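As a rough illustration of the two kinds of constraint described above, here is a minimal sketch in plain Python. It does not use the guidance library or any real model; the scores are invented, and the helper names `pick_from_options` and `matches_pattern` are hypothetical. Real constrained sampling usually filters the allowed tokens at every generation step, while this sketch only shows the idea on a single word and on a finished answer.

```python
import re


def pick_from_options(scores, options):
    """Constrained choice: ignore every candidate that is not an allowed option."""
    allowed = {opt: scores[opt] for opt in options if opt in scores}
    if not allowed:
        raise ValueError("the model gave no score to any allowed option")
    return max(allowed, key=allowed.get)


def matches_pattern(text, pattern):
    """Regex constraint: accept the output only if the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None


# Hypothetical model scores for the word that completes "The 2026 Hilux is ..."
scores = {"blue": 0.40, "green": 0.25, "red": 0.20, "fast": 0.10, "black": 0.05}
colors = ["black", "red", "yellow", "green"]

# "blue" scores highest overall, but it is not an allowed color, so it is ignored.
print(pick_from_options(scores, colors))  # green

# A count of flower species should be digits only.
print(matches_pattern("1200", r"\d+"))        # True
print(matches_pattern("about 1200", r"\d+"))  # False
```

A yes-or-no question works the same way as the color example, with `["yes", "no"]` as the list of options.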
Post-processing is where you can just go and hand-fix that stuff. If something breaks in my program and it's not outputting the thing the way I want, I can just go and write the code for it, instead of optimizing a prompt or retraining something, some crazy thing, right? You can just fix that small thing really quick. And then, did I already talk about constrained sampling? Yeah, I think I talked about constrained sampling when I was confusing it with one of the options. So that's it for today.
Tomorrow I'll be talking about the probabilistic nature of AI.
Okay, goodbye. Thank you.
Bobi is never, ever predictable.
AI, artificial intelligence. TikTok is about to become a judge.
AI, artificial intelligence.