Vibe Coding is DEAD (Meet “Vibe Training”)
Added: 2026-05-06

4,020 views · 9 · 08:05 · KunalKushwaha · Originally published: 2026-04-29

This transition from "vibe coding" to "vibe training" represents the necessary professionalization of AI development for production-grade reliability. It smartly shifts the focus from massive model scale to the surgical precision of small, low-latency guardrails.

[00:00:00]Okay, there's this new term now in the AI space called vibe training. We have been vibe coding for a year now, ever since Andrej Karpathy coined the term in early 2025. You know, it feels like magic. You prompt, the AI builds, and suddenly you are a 10x developer. But the vibes don't always survive production. We are, you know, currently in the vibe crash, if you will.

[00:00:22]We've seen Air Canada's chatbot hallucinating its own refund policy in court.

[00:00:28]New York City's bot tells business owners to break labor laws, and agents are panicking and trying to wipe out entire databases. So, if you're building something serious, you need two things. You need evals to monitor quality and find weak spots, and you need guardrails to step in before a mistake hits your customer. Because without these, you are not really shipping a product, it's more of a liability.

[00:00:57]So, agents are fully autonomous now.

[00:00:58]They are not just chatting, they are hitting APIs, managing your Stripe account, and talking directly to your customers. The problem is that traditional testing is completely failing us. Most teams are stuck with unit tests that only check one single interaction, which doesn't really reflect how people are actually using your application, or they're relying on, if you've heard about it, LLM as a judge, which basically means you're paying GPT-5 to grade another agent.

[00:01:24]That is slow, expensive, and honestly, think about it: you're just using one inconsistent model to check another model. So, it leaves most of these agents dangerously under-tested.

[00:01:35]You're essentially flying blind, and while the engineers have the keys to the cockpit, the business and the product teams are just standing around, you know, crossing their fingers. So, now there is this new term called vibe training.

[00:01:49]Basically, you know, instead of just prompting an agent and praying that it works, you are training it to stay within the lines in real time. So, a company called PlurAi launched on ProductHunt today to make this happen, and their framework is consistently outperforming the frontier models. The best part is that you don't need a PhD in AI. If you can describe the behavior you want in plain language, you can go from a description to a calibrated high-accuracy model in just a few minutes. Okay, extremely easy to get started with. When we talk about data, you can upload your own sample data. You can have your own evals, your own labels, and the reasoning why that label exists.

[00:02:30]Let's say I want to classify whether there is health advice or not. So, if a user is chatting with me, I should know whether, you know, there's health advice involved or not. So, it will ask me some questions, like whether health and non-health are fine as labels, whether wellness information and over-the-counter drugs should be included, and whether mental health is included or not. And I'm just going to skip it for the sake of this demo, so you can get started easily. And here you can see it just generated some samples for me. Some of the examples are listed as health, some as non-health, and let's say I want to change it to Q&A style. So, I'm just going to again vibe train, sort of just ask it, "Hey, regenerate this data set into a Q&A session between a chatbot and a user." And now you can see it has updated that. Some of them are health advice, like, "Can I drink I'm drinking water?" Some are non-health because in this prompt it was like, "It's just personal. I'm not looking for health advice." And these are indeed boundary examples. It's also a data set that is evenly distributed between the different labels. So, now when the user is ready, they can choose to train a very cheap SLM with sub-100 ms latency. And of course, it's also a very cheap model, much cheaper than, you know, OpenAI. The other option is to optimize a generic LLM, and we choose the smallest configuration here.
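The data set described in the demo (a text, a label, and the reasoning behind the label, spread evenly across labels) can be sketched roughly as follows. This is only an illustration, not the product's actual schema: the `Example` class, the `label_shares` and `is_balanced` helpers, the 10% tolerance, and the sample texts are all assumptions made up for this sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    text: str       # the user message to classify
    label: str      # "health" or "non-health"
    reasoning: str  # why this label applies


def label_shares(examples):
    """Return each label's share of the data set as a fraction of 1."""
    counts = Counter(example.label for example in examples)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def is_balanced(examples, tolerance=0.1):
    """True when every label's share is within `tolerance` of an even split."""
    shares = label_shares(examples)
    if not shares:
        return False  # an empty data set is not considered balanced
    even_share = 1 / len(shares)
    return all(abs(share - even_share) <= tolerance for share in shares.values())


# Made-up samples for illustration only.
SAMPLES = [
    Example("Is it safe to take ibuprofen every day?", "health",
            "Asks about medication use."),
    Example("How much water should I drink when I have a fever?", "health",
            "Asks for guidance on a symptom."),
    Example("What time does the pharmacy open?", "non-health",
            "Logistics only, no advice requested."),
    Example("It's just personal, I'm not looking for health advice.", "non-health",
            "User explicitly declines advice."),
]

if __name__ == "__main__":
    print(label_shares(SAMPLES))
    print(is_balanced(SAMPLES))
```

Keeping the reasoning next to each label makes boundary examples easier to review when the data set is regenerated or enriched later.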

[00:03:57]We can do a prompt optimization with the smallest configuration that can still perform well on the task. So, we're going to continue and optimize this LLM. So, immediately the user is getting an endpoint, which they can easily use in their code. And of course, we are still doing the optimization in the background. It's just going to take a few minutes. You can see the progress of the optimization here on the left-hand side in the chat. So, the user has full visibility on the results. You also have comparisons to the baseline, which is GPT-4.1 with the initial prompt, and you can see the increase in performance. And of course, you can also see the results on the task. Now, if the user is not satisfied, they can also enrich the data set and iterate and optimize again. So, the endpoint is now ready, and the user can actually use it.
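The baseline comparison mentioned here boils down to scoring two classifiers on the same labeled examples. A minimal sketch, assuming each classifier is just a Python callable from text to label: the two stand-in classifiers and the evaluation set below are invented for illustration and do not call GPT-4.1 or any real endpoint.

```python
def accuracy(predict, labeled_examples):
    """Fraction of (text, label) pairs for which predict(text) == label."""
    if not labeled_examples:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, label in labeled_examples if predict(text) == label)
    return correct / len(labeled_examples)


def compare_to_baseline(baseline, candidate, labeled_examples):
    """Score both classifiers on the same data and report the difference."""
    base = accuracy(baseline, labeled_examples)
    cand = accuracy(candidate, labeled_examples)
    return {"baseline": base, "candidate": cand, "gain": cand - base}


# Hypothetical stand-ins; a real setup would wrap model calls here.
def always_non_health(text):
    return "non-health"


def keyword_classifier(text):
    lowered = text.lower()
    return "health" if any(k in lowered for k in ("ibuprofen", "fever")) else "non-health"


# Made-up evaluation set for illustration only.
EVAL_SET = [
    ("Is it safe to take ibuprofen every day?", "health"),
    ("What should I do about a fever?", "health"),
    ("What time does the pharmacy open?", "non-health"),
    ("Can you recommend a good movie?", "non-health"),
]

if __name__ == "__main__":
    print(compare_to_baseline(always_non_health, keyword_classifier, EVAL_SET))
```

Scoring both on one fixed evaluation set is what makes the gain comparable across iterations after the data set is enriched.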

[00:04:50]PlurAi uses small language models, right? And these are specifically trained for your policies. They address three major gaps that the massive models leave open. First is the inference cost, which is over eight times lower than GPT-5 Mini, for example. At that price, you don't have to sample your logs anymore. You can just afford to, you know, evaluate everything.
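To see why an eightfold price difference changes sampling, here is a rough arithmetic sketch. Only the 8x ratio comes from the video; the per-check price, daily budget, and message volume are placeholder numbers, and `coverage` is a hypothetical helper.

```python
def coverage(daily_budget, cost_per_check, daily_messages):
    """Fraction of daily messages that can be checked within the budget (capped at 1.0)."""
    if cost_per_check <= 0 or daily_messages <= 0:
        raise ValueError("cost_per_check and daily_messages must be positive")
    affordable_checks = daily_budget / cost_per_check
    return min(1.0, affordable_checks / daily_messages)


# Placeholder figures, not real pricing.
DAILY_BUDGET = 10.0                        # currency units per day
DAILY_MESSAGES = 100_000                   # messages to evaluate per day
LARGE_MODEL_COST = 0.0008                  # assumed cost per check with a large model
SMALL_MODEL_COST = LARGE_MODEL_COST / 8    # the 8x ratio quoted in the video

if __name__ == "__main__":
    print(f"large model coverage: {coverage(DAILY_BUDGET, LARGE_MODEL_COST, DAILY_MESSAGES):.1%}")
    print(f"small model coverage: {coverage(DAILY_BUDGET, SMALL_MODEL_COST, DAILY_MESSAGES):.1%}")
```

With these placeholder numbers the larger model covers roughly an eighth of the traffic on the same budget, while the cheaper one covers essentially all of it.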

[00:05:12]Second, it's more reliable. The failure rate is over 43% lower than using a general LLM as a judge. So, way fewer errors actually reach your customers.

[00:05:22]But the real magic is the speed. If you're using a huge model, let's say GPT-5 Mini, for your guardrails, you're looking at a three- or four-second lag. You can't make a user wait that long for a safety check on every message. That's not good user experience. PlurAi's vibe-trained SLMs run at sub-100 ms latency. That is the magic number for real-time guardrails because it's fast enough to actually block a bad response before it's even sent, so it's not just observation anymore. It's actual enforcement.
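The difference between observing and enforcing can be sketched as a check that runs on the draft response before it is sent. This is a minimal illustration under stated assumptions: `guarded_reply`, the fallback message, and the keyword-based stand-in check are hypothetical, and failing closed when the check exceeds the 100 ms budget is a design choice of this sketch, not something the video specifies. The function measures the check's latency but does not interrupt a slow check.

```python
import time

FALLBACK = "Sorry, I can't help with that."


def guarded_reply(draft, check, budget_ms=100.0, fallback=FALLBACK):
    """Run `check` on the draft before sending it.

    `check(draft)` returns True when the draft is allowed.
    Returns (text_to_send, elapsed_ms). The fallback is sent when the
    check rejects the draft or takes longer than `budget_ms`.
    """
    start = time.perf_counter()
    allowed = check(draft)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if not allowed or elapsed_ms > budget_ms:
        return fallback, elapsed_ms
    return draft, elapsed_ms


# Hypothetical stand-in for a trained guardrail model.
def no_refund_promises(text):
    return "refund" not in text.lower()


if __name__ == "__main__":
    for draft in ("Your order ships tomorrow.", "I have issued a refund of $50,000."):
        sent, elapsed = guarded_reply(draft, no_refund_promises)
        print(f"{sent!r} (check took {elapsed:.3f} ms)")
```

Because the check sits in the request path, its latency is added to every message, which is why the budget matters as much as the accuracy.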

[00:06:01]So, you get the full visibility and a production feedback loop without really blowing up your budget or killing your performance. So, there's a nice middle ground. So, who is this actually for?

[00:06:10]You have the vibe coders, the vibers.

[00:06:12]These are the builders, PMs, curious engineers who just want to move fast.

[00:06:16]And if you're, let's say, driven by the hype cycle and want to ship without waiting for a dev ticket, this is your playground. It's likely the first time you can actually shape how an agent behaves yourself. So, to you, this isn't just about a metric, it's about having a level of control that actually feels right. Then there are professional users, the tech leads, the data scientists.

[00:06:38]You don't care about the hype, right?

[00:06:40]You care about accuracy and reliability.

[00:06:42]For you, vibe training is the way to, you know, finally replace inconsistent LLM judges with something that actually scales. So, for you, it's about unlocking production-grade control and getting a continuous feedback loop that won't cost you a fortune. So, look, the vibe era has, you know, been a blast, but if we want AI agents to actually run our businesses, they have to be predictable. Current observability tools like LangSmith or Braintrust are pretty cool for postmortems, but they basically take a high-res screenshot of the disaster after your agent has already accidentally refunded, let's say, 50 grand to the wrong person. So, PlurAi is that layer that stops the transaction from happening in the first place.

[00:07:29]They're live on ProductHunt today. If you're building agents, go check them out. I'll leave the links in the description below. Get a free trial. Show them support on their launch by, you know, upvoting if you like the launch, and leave some feedback and your thoughts in the comments for them in their launch. And if you have any questions, feel free to just tag them on their ProductHunt launch, and they are happy to answer.

[00:07:51]And I will see you in the next one. Let me know if there's any other cool stuff I need to check out and make a video on.

[00:07:57]>> [music]

#Vibe Coding #Vibe Training #Andrej Karpathy #Plurai #AI Agents
