How AI Gets Trained to Sound Human #AI #LLM #DeepLearning
Added: 2026-05-17

101 views • 42:15 • max-techie • Originally published: 2026-05-11

Large language models (LLMs) start out as next-word predictors pretrained on trillions of tokens. Two further training stages make them helpful. First, supervised fine-tuning (SFT) uses human-written question-answer pairs to teach the model what a helpful response looks like. Second, reinforcement learning from human feedback (RLHF) trains a reward model on human rankings of outputs and then uses PPO to push the weights toward higher-reward behavior. The result still predicts text one word at a time, but it sounds as if it understands human intent.

[00:00:00]Did you know an LLM can feel like it knows anything? It doesn't. It learned to sound like it does. Here is how.

[00:00:07]Every LLM starts the same way. Reads trillions of tokens, news, books, code, Reddit threads, everything.

[00:00:15]One task. Predict the next word.

[00:00:19]No right answer, no wrong answer, just what comes next.

[00:00:24]After enough of that, it absorbs syntax, facts, reasoning patterns. The shape of human thought.
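
Editorial aside: the "predict the next word" task described above can be illustrated with a toy bigram counter. This is only a sketch under loud assumptions. Real LLMs use neural networks over subword tokens rather than word counts, and the corpus and the function names `train_bigram` and `next_word_probs` are hypothetical, made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn raw follow-up counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}


# Hypothetical toy corpus; a real model reads trillions of tokens.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
probs = next_word_probs(model, "the")

# The "prediction" is simply the most probable continuation.
prediction = max(probs, key=probs.get)
```

Nothing in this objective rewards being helpful. The counter only reproduces whatever tended to come next in the text it saw.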

[00:00:31]But helpful was never in the task. Give the base model a question, it might complete it like a Reddit thread.

[00:00:37]Or a forum argument from 2009.

[00:00:41]It doesn't know it's supposed to answer.

[00:00:43]To it, "how do vaccines work" and "how do vaccines cause harm" are equally valid continuations.

[00:00:50]Helpful and toxic look identical. Two training stages fix that.

[00:00:55]First, supervised fine-tuning, SFT.

[00:00:59]Thousands of human-written question and answer pairs. Model learns to imitate. This is what a helpful reply looks like. The weights shift just enough to reshape the output format.
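
Editorial aside: one common way to express "learning to imitate" is a cross-entropy loss computed only over the answer tokens of each question-answer pair. The sketch below assumes that convention, which the video does not state. The function name `sft_loss`, the probabilities, and the mask are hypothetical values chosen for illustration.

```python
import math


def sft_loss(token_probs, is_answer):
    """Average negative log-likelihood over answer tokens only.

    token_probs: probability the model assigned to each correct token.
    is_answer:   True for tokens in the human-written answer,
                 False for tokens in the question (ignored by the loss).
    """
    losses = [-math.log(p) for p, a in zip(token_probs, is_answer) if a]
    return sum(losses) / len(losses)


# Hypothetical example: two question tokens, then two answer tokens.
token_probs = [0.9, 0.8, 0.5, 0.25]
is_answer = [False, False, True, True]
loss = sft_loss(token_probs, is_answer)
```

Lowering this loss means assigning higher probability to the human-written answer, which is imitation of a pattern rather than pursuit of a goal.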

[00:01:10]But it's still imitating, copying a pattern, not understanding a goal. Second fix, and this is the one that changes everything.

[00:01:18]RLHF.

[00:01:19]Reinforcement learning from human feedback.

[00:01:23]Humans rank outputs. This reply beats that one.

[00:01:27]A reward model learns those preferences.
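
Editorial aside: a reward model is often trained on ranked pairs with a pairwise (Bradley-Terry style) loss, which is small when the preferred reply scores higher than the rejected one. The video does not say which loss is used, so treat this as one plausible sketch. The function name `preference_loss` and the scores are hypothetical.

```python
import math


def preference_loss(r_chosen, r_rejected):
    """Pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Written as log(1 + exp(-diff)), which is the same quantity.
    """
    diff = r_chosen - r_rejected
    return math.log1p(math.exp(-diff))


# Hypothetical reward scores for one ranked pair of replies.
agrees_with_human = preference_loss(2.0, 0.0)     # reward model ranks the pair like the human did
disagrees_with_human = preference_loss(0.0, 2.0)  # reward model ranks the pair the other way
undecided = preference_loss(0.0, 0.0)             # both replies scored equally
```

Minimizing this loss over many ranked pairs pushes the reward model to score "this reply beats that one" the same way the human raters did.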

[00:01:31]Then PPO, a gradient optimizer, runs thousands of steps pushing the weights toward higher-reward behavior.
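
Editorial aside: PPO (Proximal Policy Optimization) is a policy-gradient reinforcement learning method, and its core idea is a clipped objective that limits how far a single update can move the model. The sketch below shows that objective for one sample only. The function name `ppo_clipped_objective`, the clip range `eps=0.2`, and the numbers are assumptions for illustration, and practical RLHF setups usually add further terms (for example a penalty for drifting from the SFT model) that are omitted here.

```python
def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective for a single sample.

    ratio:     new_policy_prob / old_policy_prob for the sampled output.
    advantage: how much better than expected the output scored.
    eps:       clip range (0.2 is an assumed, commonly quoted default).
    """
    clipped_ratio = max(1 - eps, min(ratio, 1 + eps))
    # Taking the minimum keeps the update pessimistic, so a large
    # change in probability cannot earn extra credit.
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical values: the gain from a good output is capped at 1 + eps.
capped_gain = ppo_clipped_objective(ratio=1.5, advantage=1.0)
# A bad output whose probability already dropped is not pushed further.
capped_penalty = ppo_clipped_objective(ratio=0.5, advantage=-1.0)
# Inside the clip range the objective is just ratio * advantage.
unclipped = ppo_clipped_objective(ratio=1.0, advantage=2.0)
```

Training maximizes this quantity averaged over many sampled outputs, which is what "thousands of steps pushing the weights" amounts to in this simplified picture.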

[00:01:39]The model doesn't get a rule book, it gets shaped. Not a filter on top, baked into the parameters. Remember the hook? It learned to sound like it does.

[00:01:50]That's not an insult, that's the mechanism.

[00:01:53]Same architecture, same starting weights, just shaped by very specific human feedback. Now, how much can the trained model actually read at once?

[00:02:03]That's the context window and why it forgets. Subscribe, that's next.

#technology #ai #ai training #large language model #llms
