Harness engineering is presented here as a shift from running one or two AI agents to running hundreds of agents in parallel on a complex task. The video's example is Moonshot's Kimi K2.6, described as sustaining a 12-hour coding session with 4,000+ tool calls at up to 200 tokens per second, with results claimed to be 20% better than humans at a cost 95% lower than Opus 4.6. In the demos, agents carry out specialized research tasks, organize data into structured outputs such as websites and spreadsheets, and execute multi-step workflows autonomously.
Deep Dive
2026 is Changing Engineering Era! Ft. Kimi K2.6
2026 is the year of harness engineering, and for that, Moonshot has just released its flagship Kimi K2.6 model, and I'm going to show you what magic it creates.
First of all, this model is showing us the future: 2026 is not just a year where you can run one agent or two agents. It's actually the year of harness engineering. That means you can imagine not two or three agents, but hundreds of agents working together and giving results. And let me tell you what is possible when hundreds of agents work for you in parallel. So, let's go to Kimi's blog. First, the benchmarks: on Humanity's Last Exam, Kimi is really shining, competing with Opus 4.7 and Codex. And beyond the coding benchmarks, I'm going to show you something really special where they have real innovation: long-horizon coding. They actually tested it by running it for over 12 hours of continuous execution.
So how did they run it? They tested it on an M3 Max. Just imagine: if you buy a very powerful chip from Apple, you can run a couple of models locally.
And how did that happen? Let me tell you how they did it. At the beginning of the graph, you'll see that the MacBook M3 Max was not that powerful. It was only generating 30 to 40 tokens per second. A token averages about three to four characters, or roughly three-quarters of an English word.
So, imagine about 30 words per second at most coming from Apple's M3 Max chip. But then Kimi did the magic.
They did a lot of innovation and progressed to around 5,000 tool calls. The chart on the right is cumulative tool calls, and the chart on the left is tokens per second. The tokens per second started out very low; now there are more tokens and more tool calls. Tool calling means calling functions: the model can call more APIs, so you can make it do much more. So, they reached 4,000+ tool calls over 12 hours of long-horizon code execution, going from 15 tokens per second up to 200 tokens per second, with results 20% faster and better than humans.
Then there's the cost difference, which is as massive as 95% compared to Opus 4.6.
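To put those figures side by side, here is a minimal arithmetic sketch. The four constants are the numbers quoted in the video; the function names and the baseline cost of 100 units are illustrative assumptions, not anything published by Moonshot.

```python
# Figures quoted in the video. The baseline cost below is a made-up unit
# used only to show what a 95% reduction means.
START_TOKENS_PER_SEC = 15
END_TOKENS_PER_SEC = 200
TOOL_CALLS = 4000
RUN_HOURS = 12


def speedup(before: float, after: float) -> float:
    """How many times faster the final throughput is than the initial one."""
    return after / before


def calls_per_minute(calls: int, hours: float) -> float:
    """Average number of tool calls per minute over the whole run."""
    return calls / (hours * 60)


def discounted_cost(baseline: float, saving: float) -> float:
    """Cost left after applying a fractional saving (0.95 means 95% cheaper)."""
    return baseline * (1 - saving)


if __name__ == "__main__":
    print(f"throughput speedup: {speedup(START_TOKENS_PER_SEC, END_TOKENS_PER_SEC):.1f}x")
    print(f"tool calls per minute: {calls_per_minute(TOOL_CALLS, RUN_HOURS):.1f}")
    print(f"cost of a 100-unit job after 95% saving: {discounted_cost(100.0, 0.95):.1f}")
```

Read this way, the run averages several tool calls every minute for 12 hours straight, which is the long-horizon part of the claim.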
That's the reason even Cursor's Composer was trained using Kimi. And when I say 20% better results than humans, I'm going to show you a bigger example. So, let's go to kimi.com and I'll show you where the magic is. Here on the left you'll see slides, websites, and docs, the typical things you see in many models. But here is their holy grail, the key feature that I haven't seen any competitor do this well: Agent Swarm. So, what is Agent Swarm? As I mentioned in the intro of the video, it's the idea that you can deploy not a few agents, but 20, 50, or hundreds of agents all working together for you.
Let me show you what I've been achieving with it. I have good software abilities, but I don't know much about the semiconductor industry, and I really wanted to learn about it. So, I went to Obsidian and I used this prompt for semiconductors, and then I went to Kimi. I said, let's create a prompt for semiconductors, because what I'm trying to say here is that the prompt does matter. I showed you the website. It can be created today with one skill, which I'll show you in a later video, but the point is that the prompt really matters.
For example, if I say, "Oh, this model killed it," it doesn't mean it killed something with a knife. "Killed it" means it nailed it. Models don't necessarily understand whether "killed it" means "nailed it" or something else. That's why writing the right words is important, and that's why you have to ask the model, "Hey, build me a prompt which is testable for learning semiconductors." And for semiconductors, I told it, "Hey, I want to learn why two of the three leading labs use TPUs for training." The three leading labs are Anthropic, Google, and OpenAI, and Anthropic used TPUs, not Nvidia, for training.
So, I was like, "Let's figure out the whole market." I asked AI to write this big prompt, and it wrote that I want to understand semiconductors: what a CPU, GPU, and TPU are, why two of the leading labs used TPUs, what moat they have, and what moat Nvidia has. All of that went into the prompt.
I gave it these two lines, and it gave me a beautiful, fleshed-out prompt which is testable. I said in the prompt, "Hey, you're going to be a mentor. You have to test that prompt and actually give results." And so, it divided the work into seven phases. Then I copied this prompt.
Okay, so I just paste my entire prompt.
Here's something you need to be careful about. You cannot just say, "Agent, just build me this," because, as I told you, the prompt is important. So, what I just pasted is a testable prompt.
So, there are seven phases, and it's going to kick off multiple agents. I'm going to fast forward through this. Here you'll see a couple of the agents it has kicked off. The first one is Mrs. Lim. Next is Shannon, who's going to research Nvidia customers. Number three is Feynman, who's going to research the foundry supply to understand semiconductors.
And then there is Manco, who's going to research the software ecosystem. Then there is Jasmine, who is going to research inference plus training.
Then there is Mrs. Lim, who is going to do AMD research. So, all of these agents are running. They're going to research why leading labs use TPUs versus Nvidia and what moat Nvidia has. Lots of these agents launch, and look at this: Agent Swarm is now assigning tasks, matching which responsibilities should be taken by which agent. It feels like magic, like actual employees are taking over the work, and all these agents run in the cloud.
If you look at the first agent, named Woo, it's doing a couple of things. It executes a first batch of five web searches, so it searches for the first five things on Google, then executes the next five searches, and then synthesizes the findings into structured markdown and saves it. So the first agent is just doing the search and nailing it. After the search, a couple of agents are creating Excel sheets of data, massive data with all the customers.
I'm going to fast forward through all the reasoning of these agents. By the way, you can click on an agent on the left or at the bottom and see the tasks completed by each agent. For example, on the left I see that Woo has finished, Shannon has finished, Stinger has finished, and Feynman has finished. So these four agents are done and the remaining agents are still in progress. All right, let's fast forward. But wait, hold on.
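The pattern described here (an orchestrator hands each agent a topic, and each agent runs its searches in batches of five before writing up markdown) can be sketched roughly as follows. This is not Kimi's Agent Swarm API, which the video does not show; `fake_search`, `run_agent`, `run_swarm`, and `AgentReport` are hypothetical names, and the search call is a stub that returns placeholder text.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class AgentReport:
    agent: str
    topic: str
    findings: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Synthesize the findings into a small structured markdown document."""
        lines = [f"# {self.topic} (researched by {self.agent})"]
        lines += [f"- {finding}" for finding in self.findings]
        return "\n".join(lines)


async def fake_search(query: str) -> str:
    """Stand-in for a real web search call; returns a placeholder string."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"result for '{query}'"


async def run_agent(name: str, topic: str, queries: list[str],
                    batch_size: int = 5) -> AgentReport:
    """Run the queries in batches of `batch_size`, then return a report."""
    report = AgentReport(agent=name, topic=topic)
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        report.findings += await asyncio.gather(*(fake_search(q) for q in batch))
    return report


async def run_swarm(assignments: dict[str, str]) -> list[AgentReport]:
    """Give every agent ten placeholder queries and run all agents concurrently."""
    tasks = [
        run_agent(name, topic, [f"{topic} question {i}" for i in range(1, 11)])
        for name, topic in assignments.items()
    ]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    # Agent names and topics as mentioned in the video.
    swarm = {
        "Shannon": "Nvidia customers",
        "Feynman": "foundry supply",
        "Manco": "software ecosystem",
        "Jasmine": "inference plus training",
    }
    for report in asyncio.run(run_swarm(swarm)):
        print(report.to_markdown().splitlines()[0])
```

The point of the sketch is the shape of the work: agents are independent, so they can run at the same time, and each one produces a file that a later step can merge into a report, spreadsheet, or website.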
So, I'm going to show you some of the work done by agent number two. This one did research and figured out what kinds of chips xAI, Meta, Anthropic, the cloud providers, and the neoclouds have and why, figuring out the why. So it's doing the nitty-gritty across hundreds of agents. Now I'm going to take you to the results. The result is not just an Excel sheet, not just a report, but an actual website which I can learn from. But let me show you one by one.
First of all, if I go to all files, there are hundreds of files by the way.
Let me show you, first of all, a Word document report. If you need a Word document, you can preview it right here. Let it open. So first we have a Word report. Number two, we have an Excel sheet with all the data: all the companies, chips, data centers, which data centers have which kind of chips, workloads, and cloud providers, all the homework done across different sections of the Excel sheet. Oh my god. Look at the number of chips we can see and the power they have, which data center has which kind of chips, and whether Nvidia is king or not. Figuring this out from an Excel sheet is not much fun, so I'm going to open the website, which it created all in one shot because I worked harder on the prompt. If you want to get the most out of Agent Swarm, write a good prompt, just like I showed you. All right, let's go to the report. This report is something you could sell as well. For example, SemiAnalysis is a hundred-million-dollar company doing this kind of research. Of course, you cannot do it with agents alone, but agents do help. I talked to the founder of SemiAnalysis.
I talked to him today, and he said they deploy a lot of agents in research as well.
And that's where the moat lies. So, now I can start learning, and this website does not just teach me the foundations of semiconductors. It has a beautiful animation. I can learn about what CPUs, GPUs, DPUs, and ASICs are, the basics.
But the data is important. As I showed you, it has an Excel sheet and a Word doc. So, let's go to architecture. You can learn how much power each chip has, including Nvidia's Ampere, Blackwell, Blackwell Ultra, and Rubin. You can see it in the report, and you can click on each report and go as deep as you want. But I'm going to go a step beyond. You can even see the software ecosystem: Nvidia has CUDA, AMD has ROCm, and Google has XLA. Every company is trying to create its own ecosystem. Next, strategy. You can learn which customer is using which kind of chips and whether Nvidia has the majority or not. You can go through all the customers and see the reports, and where I'm going to go is right here: AI accelerator market share. Nvidia is dropping. That is true, but it still has around 60% of the market. So, that's cool. Even though the leading labs are using TPUs, Nvidia still has the majority share in the whole cloud industry. So, you can learn and decide whether you want to invest in Nvidia or not, and the whole report is beautifully presented.
Some of what's in these reports you won't even see in some of the SemiAnalysis reports. So, that's why Agent Swarm is wonderful. But I'm going to show you the second coolest feature, one that I've used as a DevRel. I'm hosting a dinner for Series A+ startups only, and specifically for their founders and developer relations people. So, I created this prompt. I can show you the prompt, because the prompt is something I worked hard on. I said, "Go and find founders of startups at Series A and beyond. I need their email, Twitter, Gmail, what company they work for, how much revenue they have, and how much they have raised, and I want to know how that is related to the company I work for, which is HydraDB." So, I was like, "Let's do that, and I'll host it at a fancy restaurant in SF." I gave it this prompt, and the rest was all added by AI. I just asked Gemini, "Hey, organize this prompt so that it is testable." Every prompt you write should be testable, just like the code we write.
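One way to read "testable, just like code" is that the deliverable can be checked mechanically once the agents return. The sketch below assumes the agents hand back one dictionary per founder; the field names are taken from the prompt described above, while `REQUIRED_FIELDS`, `missing_fields`, `check_deliverable`, and the sample rows are hypothetical and purely illustrative.

```python
# Fields the prompt asks for. The key names are illustrative assumptions.
REQUIRED_FIELDS = (
    "founder", "company", "email", "twitter",
    "revenue", "funding_raised", "relevance",
)


def missing_fields(row: dict) -> list[str]:
    """Return the required fields that are absent or blank in one result row."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = row.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def check_deliverable(rows: list[dict], min_rows: int = 1) -> list[str]:
    """Return a list of problems; an empty list means the deliverable passes."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for index, row in enumerate(rows):
        missing = missing_fields(row)
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
    return problems


if __name__ == "__main__":
    # Entirely fictional sample data.
    complete = {
        "founder": "Example Founder", "company": "ExampleCo",
        "email": "founder@example.com", "twitter": "@example",
        "revenue": "unknown", "funding_raised": "Series A",
        "relevance": "builds a vector database",
    }
    incomplete = {**complete, "email": "", "revenue": None}
    print(check_deliverable([complete]))
    print(check_deliverable([incomplete]))
```

A check like this does not say whether the research is good, only whether the output has the shape the prompt demanded, which is the part a prompt can make testable up front.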
So, it wrote some rules to test it.
It added ranking criteria, which is very basic, but it expanded the prompt. Here are the important details it added, and the deliverable, which is what makes it testable. So, the short version is just one line, and then the prompt got expanded because I wanted it to be testable. Then it ran a couple of agents and created a website. Here's the cool website I want to show you, and how it stands out. All right, so here's the website it made, and the cool part is that it has semantic search. For example, I don't have to type Series A and search through all of these filters. I can just type in the search box and it'll figure out I'm asking for Series A. It has found all the founders. I can say Series B and it'll search that for me. I can type a kind of company. For example, I can enter memory caching, and it'll show me all the memory caching startups, and then vector DB, and it shows all the startups working on vector DBs. There is a list as well, as I showed you. Across all stages, it's a dense list.
Series A, Series A at 10 million, Series A at 11 million: it created a very dense list of kinds and categories of startups. That's why semantic search is useful, so that when you say Series A, it includes all 10 of these Series A+ categories.
Let me tell you what semantic search is, by the way. Semantic search means that when you type a word like Hydra, it doesn't match the exact text; it matches the meaning. So when you say food, it can automatically figure out that food includes fruits as well as vegetables. You don't have to match the word fruit, f-r-u-i-t, exactly. That's why semantic search is so useful. You can find the categories of startups just by typing in plain English. You can even say "give me," and because it's semantic, it doesn't care whether "give me" is written in the result. It will just show you.
So, this startup founders list is very useful for me. I can DM them one by one on Twitter or email them with agents running, and I can invite the top 10 to the dinner. That's why this feels like magic to me. You should think about whatever you're doing and ask: if you had hundreds of agents, what would you do differently? You can one-shot it. You can multitask. At SF dinners, I see people talking to their agents through their phones and getting stuff done while they're working or while they're out, and validating the results when they get back home. So, tell me what your favorite features are and what you're going to build with hundreds of AI agents running together for you. Check out Kimi with the link in the description, and thank you so much for watching.
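As a footnote to the semantic search explanation, here is a toy sketch of matching by meaning instead of by spelling. Real systems get their vectors from an embedding model; the two-dimensional vectors below are hand-made assumptions (first axis roughly "food", second roughly "software infrastructure"), and `TOY_EMBEDDINGS`, `cosine`, and `semantic_search` are hypothetical names, not how the generated website works.

```python
import math

# Hand-made 2-D "embeddings": (how food-like, how infrastructure-like).
# A real system would compute these with an embedding model.
TOY_EMBEDDINGS = {
    "food": (1.0, 0.0),
    "apple": (0.9, 0.1),
    "carrot": (0.8, 0.2),
    "vector database": (0.0, 1.0),
    "memory caching": (0.1, 0.9),
}

DOCUMENTS = ["apple", "carrot", "vector database", "memory caching"]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query: str, top_k: int = 2) -> list[str]:
    """Rank documents by similarity of meaning to the query, not by shared text."""
    query_vector = TOY_EMBEDDINGS[query]
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: cosine(query_vector, TOY_EMBEDDINGS[doc]),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # "food" shares no letters-as-a-word with "apple" or "carrot",
    # yet they rank first because their vectors point the same way.
    print(semantic_search("food"))
```

A keyword search for "food" would return nothing from this list, because none of the documents contain that word; the vector comparison is what lets "apple" and "carrot" come back.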











