AI Dev 26 x SF | Ara Khan: Evals Are Broken Use Them Anyway
1,779 views | 40 likes | 24:36 | Deeplearningai | Original Release: 2026-05-22

AI evaluation systems (evals) are neither useless nor absolute truth; they require critical interpretation through specific heuristics:

(1) Never take model lab benchmark scores as absolute truth, as they are approximations.
(2) Stay current with new models but don't be the earliest adopter, as AI capabilities change rapidly.
(3) Always use problem-specific evals rather than generic benchmarks.
(4) Track multiple metrics, including turns, tool calls, tokens, and runtime, to understand trade-offs between performance and cost.
(5) Containerize evaluation environments to prevent interference between tasks.
(6) Understand that evals test three components simultaneously: the model, the agent harness, and the problem itself.
(7) Use iterative hill climbing to improve scores while avoiding overfitting to metrics.
(8) Always pass the 'vibe check' to ensure the agent makes sense and solves real problems.
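
As a rough illustration of heuristic (4), the sketch below records turns, tool calls, tokens, and runtime next to the pass/fail result of each run, then aggregates them so a gain in pass rate can be weighed against its cost. The `RunRecord` fields, the `summarize` helper, and the `tokens_per_pass` figure are hypothetical names chosen for this example; the talk summary does not prescribe a schema or a particular cost metric.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One agent attempt at one eval task (field names are illustrative)."""
    task_id: str
    passed: bool
    turns: int
    tool_calls: int
    tokens: int
    runtime_s: float


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Report pass rate together with cost metrics so trade-offs stay visible."""
    if not runs:
        raise ValueError("no runs to summarize")
    passes = sum(r.passed for r in runs)
    total_tokens = sum(r.tokens for r in runs)
    return {
        "pass_rate": passes / len(runs),
        "mean_turns": mean(r.turns for r in runs),
        "mean_tool_calls": mean(r.tool_calls for r in runs),
        "mean_tokens": mean(r.tokens for r in runs),
        "mean_runtime_s": mean(r.runtime_s for r in runs),
        # Assumed cost metric: total tokens spent per solved task.
        "tokens_per_pass": total_tokens / passes if passes else float("inf"),
    }


# Made-up records, for illustration only.
example_runs = [
    RunRecord("task-1", True, turns=4, tool_calls=6, tokens=1000, runtime_s=10.0),
    RunRecord("task-2", False, turns=8, tool_calls=12, tokens=3000, runtime_s=30.0),
]
example_summary = summarize(example_runs)
```

Comparing two such summaries side by side (for instance, before and after a harness change) shows whether a higher pass rate came with more turns, tokens, or wall-clock time.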
