This roadmap offers a practical shift from passive consumption to active system design, focusing on the essential technical layers of modern AI. It correctly identifies that future literacy depends on building the infrastructure, not just using the interface.
Deep Dive
6 AI Projects Everyone Learning AI Should Build in 2026
So, there's a real difference between people who use AI and people who have actually built something with it. And in 2026 and beyond, that gap is going to define your career.
The people who can actually build are the ones who will get that promotion or get pulled into that strategy conversation. Not because they know every model but because they can actually pick something up and put it together. Now, here's the good news, right? Closing that gap is honestly not as hard as you think.
You don't need to be an engineer. You don't need to learn all of these new frameworks. You just need to pick the right AI topics and then build the right things.
And that's what this video is all about.
We'll talk about six different projects.
Whether you're trying to get hired into an AI role, level up in the one you already have, or you just don't want to be the person at work who only uses AI, these are the projects I would start with.
So, the first one is called skills. This is the one I would recommend starting with because it gives you the most leverage for the least amount of work.
So, what is a skill? A skill technically is a folder. Inside that folder, only one file is required. It's called SKILL.md. On top of that you can have optional subfolders for templates, scripts, anything else you want Claude or any other agent to have access to.
Now, SKILL.md itself has two parts. At the top, a few lines of YAML, which is just a structured way to write metadata.
Two fields really matter. Number one is name and the other one is description.
Below that is plain markdown, which is the actual instructions for the task that particular skill covers.
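As a rough illustration of the folder layout just described, here is a minimal sketch that scaffolds a skill on disk. The skill name, description text, and the `templates` subfolder are made-up examples, and the `SKILL.md` file name follows Anthropic's published skill format rather than anything spelled out in this video.

```python
from pathlib import Path
import tempfile

# YAML metadata on top, plain markdown instructions below.
SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

# {title}

{instructions}
"""


def scaffold_skill(root: Path, name: str, description: str, instructions: str) -> Path:
    """Create <root>/<name>/SKILL.md plus an optional templates/ subfolder."""
    skill_dir = root / name
    (skill_dir / "templates").mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        SKILL_TEMPLATE.format(
            name=name,
            description=description,
            title=name.replace("-", " ").title(),
            instructions=instructions,
        ),
        encoding="utf-8",
    )
    return skill_file


# Hypothetical example skill, written to a temporary directory.
root = Path(tempfile.mkdtemp())
path = scaffold_skill(
    root,
    name="weekly-status-update",
    description="Drafts a weekly status update from meeting notes. Use when the user asks for a status report.",
    instructions="1. Summarize progress.\n2. List blockers.\n3. Propose next steps.",
)
text = path.read_text(encoding="utf-8")
print(text)
```

In practice you would point `root` at wherever your agent looks for skills instead of a temporary directory.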
Here's the part that actually makes skills work. Claude does not read your whole skill every time you open a chat, right? If you have like 50 skills installed, that would absolutely blow up your context window before you even type anything. So, instead, Claude reads only the name and description from the top and then uses that to decide, is this skill relevant right now? If yes, then it loads the rest. If no, the rest never enters the context.
Same with reference files. They only load when the skill body explicitly tells Claude to read them.
Anthropic calls this progressive disclosure. Big phrase but a simple idea, right? Only load what you need and when you need it. Which is why the description field is the most important line in your entire skill.
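To make the idea concrete, here is a toy sketch of progressive disclosure. It is not how Claude is implemented: the two skills are invented, and the keyword-overlap check in `pick_skill` is a hypothetical stand-in for the model's own judgment about relevance. What it does show is the loading order, descriptions first and bodies only on demand.

```python
import re

# Two invented skills, stored as raw SKILL.md text.
SKILLS = {
    "weekly-status-update": (
        "---\n"
        "name: weekly-status-update\n"
        "description: Drafts weekly status updates from meeting notes.\n"
        "---\n"
        "Summarize progress, blockers, and next steps in three short sections."
    ),
    "vendor-review": (
        "---\n"
        "name: vendor-review\n"
        "description: Reviews vendor contracts against a procurement checklist.\n"
        "---\n"
        "Check pricing, renewal terms, and data handling clauses."
    ),
}


def split_skill(text):
    """Split SKILL.md text into (metadata dict, markdown body)."""
    _, front, body = text.split("---", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()


def keywords(text):
    """Lowercase words longer than four letters (a crude relevance signal)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 4}


def pick_skill(skills, request):
    """Look ONLY at descriptions and return the best-matching skill name, or None."""
    wanted = keywords(request)
    best_name, best_score = None, 0
    for name, text in skills.items():
        meta, _ = split_skill(text)
        score = len(wanted & keywords(meta["description"]))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def load_body(skills, name):
    """Second stage: the full instructions enter context only now."""
    return split_skill(skills[name])[1]


chosen = pick_skill(SKILLS, "Please write my weekly status update")
body = load_body(SKILLS, chosen) if chosen else None
print(chosen, "->", body)
```

Notice that a weak description means `pick_skill` never fires, which is the same reason the description line matters so much in a real skill.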
So, the big question, how do you build one? You know the structure, you know the format of the skill. So, do one thing. Pick something that you do almost every week. Let's say status updates or vendor reviews or meeting follow-ups, anything where you keep re-explaining the same context. Then, head to your favorite agent, let's say Claude Code or Antigravity or Gemini CLI, and ask it in plain English to build this skill with all the instructions that will automate your task. You can watch Antigravity or Claude Code build that skill live for you. If you want a full deep dive on building skills, please drop skills in the comments below and then I will work on creating that skills video for you. Okay, so that's the first one. Let's look at the second one.
Project two is RAG, which is retrieval augmented generation. It's how you build a search and retrieve layer over your own data. Let me walk you through how this actually works under the hood. So, let's say you have a pile of documents, your company docs, your team's notes, your product manuals, whatever. You split them into chunks. A chunk is usually a few paragraphs. Each chunk goes through something called an embedding model. What that does is turn that chunk into a vector, which is nothing but a list of numbers.
But, and this is the magical part, those numbers are not random. They're arranged so that similar concepts end up close together in this high-dimensional space.
So, let's say we have two phrases, "patient has hypertension" and "high blood pressure". Those phrases share zero words but they end up right next to each other in the vector space because they mean the same thing. You store all those vectors in a vector index. Then when someone actually asks a question, you embed the question the same way and you ask the index for the closest matches, usually the top five or 10 chunks. You hand those chunks to the LLM along with the original question and the LLM then writes the answer grounded in what you just retrieved. And that's RAG, which is retrieval augmented generation. You embed everything, store it, retrieve by meaning and not keywords, and generate from context.
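The embed, store, retrieve, generate loop can be sketched in a few lines. One big assumption to flag: `toy_embed` below is a bag-of-words counter standing in for a real embedding model, so it matches on shared words and would not place "hypertension" near "high blood pressure" the way a real model does. The chunks and the prompt wording are invented for illustration.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Embed every chunk once and keep (chunk, vector) pairs."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Embed the question the same way and return the k closest chunks."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Hand the retrieved chunks to the LLM along with the original question."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


# Invented example chunks.
chunks = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Support tickets are answered within one business day.",
]
index = build_index(chunks)
question = "How long do refunds take?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping `toy_embed` for a real embedding model and the list for a vector index is what turns this from a keyword matcher into retrieval by meaning.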
Foundation models, which are large language models, don't know your stuff, right? They never will on their own because that data was never in the training set.
Now, I know you might think, okay, we have NotebookLM, so why wouldn't we use that? NotebookLM is fantastic. I love the tool. But NotebookLM is a destination. What we're talking about here is RAG as a component, something you wire into an agent, into a voice app, into your team's internal tools.
Different beast entirely. If you want a deep dive on chunking, evaluation, all that production stuff, drop RAG in the comments.
Project three is build an MCP server.
This is where the first two projects start paying off in a real way.
MCP, which is model context protocol, is an open standard. It's how AI agents talk to external tools and data. Think of it as a universal adapter. So, the basic idea is you take a function in your code, you mark it as an MCP tool, and now any MCP-compatible AI client can call it. The Python SDK has a framework called FastMCP that handles all the plumbing for you. You write the function, FastMCP makes it callable.
So, you've got your RAG from project two. Right now, only you can talk to it. Wrap it in an MCP server and that changes completely. Now, suddenly Claude Desktop can use it, Cursor can use it. The agent you build later can also use it. So, that's the difference between a script and infrastructure.
The reason this matters now and not two years ago is MCP got super serious. Anthropic released it in late 2024. SDK downloads jumped roughly 970X in the 18 months after. Anthropic donated the protocol to the Linux Foundation in December 2025 and now it is an industry standard. ChatGPT speaks it, Cursor speaks it, Google's Gemini speaks it. Take the retrieve function from project two, wrap it in FastMCP, a few lines of code, that's the project. Now, if you want the deep dive on how to create and design and deploy, drop MCP in the comment section and I will work on that for you guys. All right, moving on.
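Here is roughly what "wrap it in FastMCP" can look like, as a sketch under some assumptions. The `retrieve` body is a keyword-overlap placeholder for project two's real vector search, `DOCS` and the server name `company-docs` are invented, and the import is kept inside `build_server` so the retrieval function still runs where the `mcp` package (the official Python SDK) is not installed.

```python
# Invented stand-in documents; project two's index would replace this.
DOCS = [
    "Refunds are issued within 14 days.",
    "The office is closed on public holidays.",
    "Support tickets are answered within one business day.",
]


def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k documents that best match the question."""
    # Placeholder ranking by shared words, not a real embedding search.
    words = set(question.lower().split())
    ranked = sorted(
        DOCS,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_server():
    """Expose retrieve() as an MCP tool. Requires `pip install mcp`."""
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("company-docs")  # hypothetical server name
    server.tool()(retrieve)  # same effect as decorating retrieve with @server.tool()
    return server


# The plain function still works on its own:
print(retrieve("refunds policy", k=1))
```

To actually serve it, call `build_server().run()` from your entry point and register that command in your MCP client's configuration. The type hints and docstring matter here, since FastMCP uses them to describe the tool to the client.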
Project four is voice agents. This is the project that genuinely changed in the last 12 months. For years, voice AI was four separate modules stitched together. Voice activity detection, speech-to-text, then the LLM, and then text-to-speech to play back.
Each step added latency. Total round trip was two to three seconds.
Voice was really unusable for production.
The whole stack is dead now.
Gemini Live API processes raw audio natively. Gemini 3.1 Flash Live launched in March 2026.
Fastest model out there for first token of audio. 90 plus languages already available. There's built-in barge-in so users can actually interrupt mid-sentence. And it scored over 90% on a benchmark for multi-step tool calling from audio alone. Meaning, a voice agent can now reason through complex multi-step logic without a text intermediary.
It's a really big deal. Now, here is where it gets really interesting. You wire voice on top of project three: you speak a question, Gemini Live calls your MCP server as a tool, the MCP server then queries your RAG, and the answer comes back spoken in under a second. Imagine driving home and asking your own company's documents a question. Couldn't do that two years ago, but you can do it tonight. Again, if you want a deeper dive on Gemini Live, drop voice in the comment section.
Now, with that, let's get into project five, which is running a model locally. Think of this one as the constrained exercise. Three things have to come together. Open weights, quantization, and a runtime.
Now, open weights means a model like Gemma or Llama or Qwen where you can actually download the weights and run them yourself. With frontier closed models like Gemini Pro or Claude Opus, you cannot.
Quantization, this is how you fit a model on a laptop. The original model uses 16-bit floats for every parameter.
You quantize it down to 4-bit integers.
Tiny quality loss, but then roughly a 3X to 4X memory reduction.
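The arithmetic behind that saving is easy to check. This is a back-of-the-envelope sketch for weight storage only: the 4 billion parameter count is just one of the sizes mentioned in this video, and the 4.5 bits per weight figure is an illustrative assumption for the scaling metadata that real 4-bit formats carry, not a number from any specific format.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 4e9  # a 4-billion-parameter model

fp16_gb = weight_memory_gb(n_params, 16)  # 16-bit floats
int4_gb = weight_memory_gb(n_params, 4)   # ideal 4-bit integers
# Assumed overhead: about 4.5 effective bits per weight once scales are stored.
practical_gb = weight_memory_gb(n_params, 4.5)

print(f"16-bit: {fp16_gb:.1f} GB")
print(f"4-bit (ideal): {int4_gb:.1f} GB, ratio {fp16_gb / int4_gb:.1f}x")
print(f"4-bit (with overhead): {practical_gb:.2f} GB, ratio {fp16_gb / practical_gb:.1f}x")
```

So the ideal ratio is 4X, and the real-world figure lands somewhat below that. Activations and the context cache also take memory at run time, which this sketch leaves out.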
The runtime, and that's Ollama. Think of Ollama as let's say Docker for AI models, right?
One command pulls a model, another command runs it. Ollama handles the quantization, manages the memory, and exposes a local API. The model itself, let's say Gemma 4, which is the current generation from Google, right? It comes with four sizes, 2 billion, 4 billion, 26 billion, and 31 billion parameters.
Smaller variants run fine on a regular laptop with around 8GB of RAM.
Larger ones need a GPU. So, in that case, install Ollama, pull Gemma 4, point it at the same RAG you built in project two. Same RAG, same MCP server, but now the model answering you is running on your laptop. You lose a little bit of quality, you lose some speed for sure, but then you gain privacy and most importantly, you gain offline. You gain zero per token cost.
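A minimal sketch of pointing a local model at retrieved context, using only the standard library. It assumes Ollama is running on its default port and talks to its `/api/generate` endpoint. The model tag is a placeholder, not a confirmed tag name, so check `ollama list` for what you actually pulled. The chunk and the prompt wording are invented.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "your-local-model-tag"  # placeholder: use a tag shown by `ollama list`


def build_payload(model, question, context_chunks):
    """Combine retrieved chunks and the question into one generate request."""
    context = "\n\n".join(context_chunks)
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    # stream=False asks Ollama for a single JSON object instead of a stream.
    return {"model": model, "prompt": prompt, "stream": False}


def ask_local_model(payload, url=OLLAMA_URL):
    """POST the payload to the local Ollama server and return the answer text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


# Invented chunk standing in for what project two's retrieve() would return.
payload = build_payload(
    MODEL_TAG,
    "How long do refunds take?",
    ["Refunds are issued within 14 days of a return request."],
)
print(json.dumps(payload, indent=2))
```

With Ollama running and a model pulled, `ask_local_model(payload)` returns the generated answer. Nothing leaves your machine, which is the whole point of this project.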
So, a working Ollama plus RAG setup is another one where, if you drop a comment, I'm happy to make a video on that as a follow-up. All right?
So, with that, let's get into the last one, which is our sixth project, which is fine-tuning. Now, I want to mention this very quickly. This is the move when you need a custom model, one that behaves a specific way for you or your business. Maybe it has to write in your company's voice, or use legal terminology, medical coding, or your internal product jargon. Here's the important part.
Fine-tuning does not make a model smarter. What fine-tuning does is behavior shaping. You're adjusting how the model responds, not what it knows.
Under the hood, you don't retrain the whole model. That's the old expensive way. The modern approach is called LoRA, which is low-rank adaptation. Your base model stays frozen. You train a small adapter on top, usually less than 1% of the model's parameters, and that adapter learns the new behavior. But honestly, most people watching this won't need fine-tuning. It's a deeper rabbit hole than the other five. And unless you're hitting a real wall, you're better off mastering the first five projects. If you want a dedicated deep dive on fine-tuning, when it's actually worth it, how LoRA works, the full Vertex tuning workflow, please drop fine-tuning in the comments, and I'll be very happy to make that as the next video.
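The "less than 1%" figure falls out of simple counting. LoRA leaves a weight matrix of shape d_out by d_in frozen and trains two thin matrices of shapes d_out by r and r by d_in. The sizes below (a 4096 by 4096 layer, rank 8) are illustrative assumptions, not numbers from this video.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (frozen weights, trainable adapter weights) for one layer."""
    full = d_in * d_out              # the frozen base matrix W
    adapter = rank * (d_in + d_out)  # B is d_out x rank, A is rank x d_in
    return full, adapter


# Illustrative sizes for a single square projection layer.
full, adapter = lora_param_counts(d_in=4096, d_out=4096, rank=8)
fraction = adapter / full

print(f"frozen: {full:,} parameters")
print(f"adapter: {adapter:,} parameters")
print(f"trainable share: {fraction:.2%}")
```

Under these assumptions the adapter is about 0.39% of the layer. Raising the rank buys more capacity at a proportionally higher parameter cost.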
All right. So, those were the six, right? Skills, RAG, MCP, voice, local models, and fine-tuning. You don't have to build all six, obviously. Pick the two that scare you the most, or pick the one closest to your day job. The whole point of being AI-enabled at work isn't to know every tool. It's to be the one person who actually built one of these end-to-end when the conversation comes up.
So, drop the name of the project you want me to go deeper on in the comment section, and let me know which one you're starting with. Thanks again for watching. Thanks for your time, and I will see you in the next one.