Prioritizing HTML over Markdown correctly identifies the shift from information retrieval to functional tool-building in AI workflows. While it sacrifices speed, the gain in interactivity and information density is essential for the next generation of agentic interfaces.
Deep Dive
HTML vs. Markdown: Pushing AI Agent Responses to the Next Level
So, some of you may have seen this article on X the other day. It's from Thariq at Anthropic, and he talks about the effectiveness of HTML versus Markdown, which I'm sure we're all used to. Agents use Markdown a lot, for reports and for configuration files. Markdown has been the go-to default for agents because it's so easy to read and write, but he argues in this X article that HTML is actually the better format. It's a pretty interesting article, and it sparked a bit of a debate in the AI community.
I thought it was interesting, especially because if you watch my videos you see a lot of my slides, and they're all HTML files. So I'm pretty familiar with using HTML in Claude Code and how effective that can be, and I wanted to do a video about it. In this video I'm going to break down what Thariq said in his article, plus a tweet from Karpathy today backing it up, as well as some of the counterarguments against using HTML over Markdown. Then we're going to take a real look at it. I'm going to give Claude Code a somewhat complicated task, writing a report for me, and we'll see how it looks in Markdown and how it looks in HTML, so you can see the difference visually.
And if you like this video, please subscribe to my free weekly newsletter, where I give my honest thoughts about the week's AI news that I can't share here, as well as interesting research and papers I've found and the projects I've been building behind the scenes. There's a link in the description, or go to onchain.com.
Every word is written by me, mistakes and all. So if you're sick of AI slop articles or all hype with no substance, subscribe and give AI Garage Weekly a shot. Now, back to the video. The main thesis from Thariq is pretty simple.
He's saying that as models get more capable, the format we ask them to render in is becoming the bottleneck.
But first, let me give you a breakdown of what Markdown is. It became the default output format for genuinely good reasons. The syntax is plain text, so it round-trips through any tool, any pipe, any terminal. You can use it in any kind of editor. It supports headers, bullets, code blocks, tables, and links.
It has enough rich-text capability to handle 80% of agent output. It's spell-checkable, diff-friendly in git, and trivial for humans to hand-edit. I'm sure most of us have edited some Markdown files in VS Code or elsewhere.
So there's nothing here that's actually wrong. Markdown is perfectly fine. The argument he's making isn't that Markdown is bad. It's that we've outgrown it and there's a better option.
The framing of the whole piece is that Markdown started as an enabling format and has now become a restricting one. You can see the quote he leads with: "I find it difficult to read a markdown file of more than 100 lines. I want richer visualizations, colors and diagrams. I want to be able to share them easily." So this is a practical problem. The models we have now will happily generate 500-line specs very quickly.
But as humans, it's not so easy to read that much text. So argument one is information density. HTML's vocabulary is just enormously wider than Markdown's.
You get tables, design, illustrations, code, interactions, spatial layouts, images, video, embedded scripts. Thariq's strong claim is that there's almost no information Claude can read that it can't represent in HTML. Markdown caps out somewhere around bullet lists and fenced code, and when you constrain a capable model to a thin format, it improvises around the constraint. So Claude has gotten genuinely good at ASCII diagrams, at building little tree structures with box-drawing characters, even at approximating colors with shaded Unicode block characters.
That's an impressive workaround for a problem we don't need to have. And when you're using HTML, you can use these different elements here: inputs, CSS, image tags, positioning. There are a lot of different ways to format and display the same data.
Argument two is visual clarity, and this is the readability argument. As the model writes longer plans, specs, and analyses, a wall of Markdown stops being scannable. Thariq is candid that he himself doesn't read past 100 lines. I'll admit I don't usually read Markdown spec files beyond skimming them, and he says he definitely can't get his teammates to do it either. An artifact you can't get anyone to open is worth less than a shorter one they will.
HTML lets Claude organize structure visually. It can even be mobile responsive, so the same document reads differently on a phone than on a desktop monitor. The fix is structure that Markdown just can't express: tabs to compartmentalize sections, sidebars and tables of contents, collapsible details, illustrations next to the prose they describe, hyperlinks that go somewhere useful, mobile-responsive layout.
The same 500 lines of content that, in a spec file, you just scroll through mindlessly become a navigable format that the reader can actually finish.
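To make the structural point concrete, here is a minimal Python sketch of a page with a linked table of contents and one collapsible `<details>` block per section. The function name `render_report` and the `section-N` anchor ids are made up for this illustration; nothing like this appears in the article, and an agent would normally write the HTML directly.

```python
from html import escape


def render_report(title, sections):
    """Render (heading, body) pairs as one HTML page with a linked table
    of contents and a collapsible <details> block per section."""
    toc_items = []
    blocks = []
    for index, (heading, body) in enumerate(sections):
        anchor = f"section-{index}"  # hypothetical id scheme for this sketch
        toc_items.append(f'<li><a href="#{anchor}">{escape(heading)}</a></li>')
        blocks.append(
            f'<details id="{anchor}" open>'
            f"<summary>{escape(heading)}</summary>"
            f"<p>{escape(body)}</p>"
            "</details>"
        )
    return (
        "<!doctype html>\n"
        '<html lang="en"><head><meta charset="utf-8">'
        # The viewport tag is what lets the page adapt to phone screens.
        '<meta name="viewport" content="width=device-width, initial-scale=1">'
        f"<title>{escape(title)}</title></head><body>"
        f"<h1>{escape(title)}</h1>"
        f"<nav><ol>{''.join(toc_items)}</ol></nav>"
        f"{''.join(blocks)}"
        "</body></html>"
    )
```

Calling `render_report("Spec", [("Background", "..."), ("Plan", "...")])` returns a single string that can be saved as an `.html` file; the reader can jump to a section from the contents list or collapse the parts they don't need.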
So argument three he makes is that it's just a lot easier to share. And this one is kind of mundane, but it's very real.
Browsers don't render Markdown natively.
If you want to share a Markdown file with a non-technical colleague, you're either copy-pasting it into a Google Doc, taking a screenshot of it, or asking them to install something, which nobody actually wants to do. With HTML, you can drop the file on any kind of static host, send a link, and you're done. Anyone can open it in a browser, and it looks exactly the way you wrote it. So the chance of someone actually reading your spec is much higher if it's in HTML.
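As a rough sketch of the "just open it in a browser" step, the Python below saves an HTML string to disk and returns a `file://` URI that any browser can load. The helper name `save_report` and the default file name are assumptions for this example; hosting the file so you can send a link is a separate step and is not shown.

```python
import webbrowser
from pathlib import Path


def save_report(html_text, path="report.html"):
    """Write the HTML to disk and return a file:// URI for it."""
    target = Path(path).resolve()  # as_uri() needs an absolute path
    target.write_text(html_text, encoding="utf-8")
    return target.as_uri()


def open_report(html_text, path="report.html"):
    """Save the report, then ask the default browser to show it."""
    return webbrowser.open(save_report(html_text, path))
```

`open_report("<!doctype html><title>Demo</title><p>Hello</p>")` would write `report.html` in the current directory and try to open it in the default browser.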
Argument four is two-way interaction, and this is the one that's hard to do in any other format. You can ask Claude to render a design with sliders, dropdowns, toggles, actual controls that you can manipulate. Then you can add something like a "copy as prompt" button down here that captures your tweaks and feeds them back to Claude Code as the next instruction. So the artifact actually becomes the interface.
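Here is a small sketch of what such an artifact could look like, written as an HTML page held in a Python string so it can be saved to a file. The element ids, slider ranges, and the wording of the generated prompt are all invented for this example and are not taken from the article. The button writes the prompt into a text box and also tries the browser's clipboard API, which may be unavailable depending on how the page is opened.

```python
# A self-contained page: two sliders restyle a preview button, and
# "Copy as prompt" turns the current values into an instruction.
CONTROL_PAGE = """<!doctype html>
<html lang="en">
<head><meta charset="utf-8"><title>Button tuner</title></head>
<body>
  <button id="preview-button">Preview</button>
  <p><label>Corner radius (px)
    <input id="radius-input" type="range" min="0" max="24" value="8"></label></p>
  <p><label>Shadow blur (px)
    <input id="shadow-input" type="range" min="0" max="32" value="12"></label></p>
  <button id="copy-button">Copy as prompt</button>
  <textarea id="prompt-box" rows="3" cols="60" readonly></textarea>
  <script>
    const radiusInput = document.getElementById("radius-input");
    const shadowInput = document.getElementById("shadow-input");
    const previewButton = document.getElementById("preview-button");
    const promptBox = document.getElementById("prompt-box");

    // Re-style the preview button whenever a slider moves.
    function applyStyle() {
      previewButton.style.borderRadius = radiusInput.value + "px";
      previewButton.style.boxShadow =
        "0 2px " + shadowInput.value + "px rgba(0, 0, 0, 0.4)";
    }
    radiusInput.addEventListener("input", applyStyle);
    shadowInput.addEventListener("input", applyStyle);
    applyStyle();

    // Turn the current slider values into an instruction for the agent.
    document.getElementById("copy-button").addEventListener("click", () => {
      promptBox.value = "Update the button: border-radius " +
        radiusInput.value + "px, box-shadow blur " + shadowInput.value + "px.";
      promptBox.select();
      if (navigator.clipboard) {
        navigator.clipboard.writeText(promptBox.value).catch(() => {});
      }
    });
  </script>
</body>
</html>
"""


def write_control_page(path):
    """Save the page so it can be opened in a browser."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(CONTROL_PAGE)
```

The text box is there as a fallback: even if the clipboard call is blocked, the generated instruction is visible and selected, ready to paste back into the agent.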
So, say you're adjusting this button here. You can tweak it, change the shadow as well, make it less, make it more.
You can change the different styles, then hit "copy as prompt" and paste that back into Claude Code to tell it the exact type of button you want.
Argument five is data ingestion, and specifically why Claude Code is the right tool for this. Obviously Thariq is a little biased because he works on Claude Code, but he says the agent in your terminal can read your file system, hit MCP servers like Slack or Linear, walk git history, or even drive a browser. That context turns "make an HTML page" from a generic task into one that's grounded in your actual data, your actual code, and your actual conversations.
Now the counterarguments, and Thariq engages these directly. The real objections are that HTML costs more tokens to generate and takes meaningfully longer to produce. Viewing can be a hassle: you have to open it somewhere, whereas Markdown files open in any editor. Version control: it makes git diffs noisy, because tags and attributes can change a lot. These are certainly legitimate arguments.
Editability as well: with Markdown, you can quickly open the file and edit it however you like. With HTML, you have to get the agent to change the file, unless you want to go in and edit the code directly yourself.
But he does have rebuttals for these arguments. On tokens: with million-token context windows now standard for frontier models, a few thousand extra tokens for tags is just noise. On time: yes, it is slower. He says it's around two to four times slower to generate, but the resulting artifact is dramatically more useful, and usually you only need to generate it once.
On viewing, he says you can open it locally or push it to S3 for a shareable URL, so neither is especially hard. The diff noise is the real cost, and he concedes this one: HTML diffs are noisy, and that is a real cost.
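The token and time rebuttals come down to simple arithmetic, sketched below. The 3,000-token figure is only a stand-in for "a few thousand", and the 60-second Markdown baseline is made up for illustration; the two-to-four-times range is the one quoted above.

```python
def overhead_share(extra_tokens, context_window):
    """Fraction of the context window used up by the extra markup."""
    return extra_tokens / context_window


def html_time_range(markdown_seconds, low=2.0, high=4.0):
    """Estimated HTML generation time, given the 2x to 4x slowdown."""
    return markdown_seconds * low, markdown_seconds * high


# Assumed numbers: 3,000 extra tokens against a 1,000,000-token window.
print(f"{overhead_share(3_000, 1_000_000):.2%}")  # prints 0.30%

# Assumed baseline: a Markdown report that takes 60 seconds to generate.
print(html_time_range(60))  # prints (120.0, 240.0)
```

Under these assumptions the markup takes well under one percent of the window, while the extra wait is a matter of minutes, which is why the time cost is the more noticeable of the two.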
And then today, Andrej Karpathy had a tweet agreeing with this.
You can see the whole thing here. He agreed that structuring responses as HTML gets a better result because it's much more visual, and he goes into a lot more depth about what the future holds for AI responses.
He says this works really well: at the end of your query, ask your LLM to structure its response as HTML, then view the generated file in your browser. He's also had some success asking the LLM to present its output as slides. Like I said, he then makes the argument a lot bigger, sketching a progression of output formats in increasing order of human friendliness.
Step one was raw text, which obviously takes effort to parse. Step two is Markdown, the current default: much easier on the eyes, much more formatted and clean.
Step three is HTML, and this is what we're discussing here: more flexibility in terms of graphics, layout, and interactivity. It's the format he thinks is forming a new default as we speak. Then he talks about further steps, four, five, and six, eventually heading toward interactive generated video. He calls it interactive neural video, diffusion generated. It's an interesting, futuristic topic, and he spends the end of the tweet on it.
So why does vision win as an output channel? Karpathy points out that roughly a third of the human brain is a massively parallel processor dedicated specifically to visual input. He describes it as the highest-bandwidth input into the brain we have. If you're optimizing for humans taking in information, you should be optimizing for a channel they can actually absorb at speed. And walls of text just don't do that.
The practical takeaway from all this is that you don't need a special skill, a plugin, or a new tool to do any of this today. Just ask Claude, or whatever agent you're using, to render the output as an HTML file instead of Markdown. The interesting work isn't just asking for that. It's being specific about what you want to do with the artifact: what it should show, what should be interactive, who's going to read it. Treat the output format as a design decision, not a default. The big payoff is the feeling that you're back in the loop with Claude again.
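One way to stay specific is to spell out the three questions (what it should show, what should be interactive, who will read it) every time you ask. The helper below is a hypothetical sketch of that habit; the function name, parameters, and prompt wording are assumptions for this example, not a required format for any agent.

```python
def build_html_prompt(task, audience, must_show, interactive=()):
    """Assemble a request that treats the output format as a design
    decision: what to show, what to make interactive, who reads it."""
    lines = [
        task.strip(),
        "",
        "Present the result as a single self-contained HTML file.",
        f"Audience: {audience}.",
        "It should show: " + "; ".join(must_show) + ".",
    ]
    if interactive:
        lines.append("Interactive elements: " + "; ".join(interactive) + ".")
    return "\n".join(lines)


# Example request with made-up details.
prompt_text = build_html_prompt(
    task="Compare the two approaches and write a report.",
    audience="teammates who will skim on a phone",
    must_show=["a side-by-side comparison table", "key takeaways"],
    interactive=["a table of contents with links", "collapsible sections"],
)
```

The resulting text can be pasted into the agent as-is. The point is less the helper itself than the checklist it forces you to fill in.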
Instead of just skimming a plan you'll never read, you're actually looking at it and understanding a lot more.
And I think that should be one of the major goals for a lot of these AI tools: to enhance your own understanding.
Not just getting to the end of whatever project or task you're working on, but actually enhancing your understanding as you go. So lastly, I'm going to show you this by giving Claude the same task twice, having it produce one report as a Markdown file and one as HTML, so you can quickly see the difference and why Thariq is suggesting you use HTML this way.
You can see I have two instances of Claude Code here, and I'm going to give them the prompt: "Research the difference between MTP and DFlash. Write a comprehensive report about these differences and present me the findings in a markdown file." That's going to be the same prompt I give both.
We're going to do one as a Markdown file and one as an HTML file.
Okay, so the Markdown one is done, and it was faster than the HTML one. We'll see how long the HTML one takes.
This is the Markdown file, MTP versus DFlash, and this is for an actual experiment I'm looking to run. It's a comparative report, and you can see the report here.
I opened it in VS Code, which is what I usually do.
You can see it's comprehensive, but you can already see the issues with a Markdown file. This is supposed to be a chart, right? MTP, DFlash, and then what each one is. But it's almost impossible to read. It's broken up with these little symbols, and it's very difficult to read anything because nothing is properly formatted.
It's not a proper chart, and I think anyone who's used Markdown has faced this issue. The report itself is good, but it's very difficult to read when you have issues like that. And it's quite comprehensive: it goes into the architecture, the MTP loss.
So the advantage of Markdown is that it does have these headers and you can format it a little, which is why it's better than just raw text.
But it doesn't have as many capabilities as HTML. We're now getting to the 100-line mark, and I'm probably not going to read all of this.
This is one of those things I would just skim. Okay, it has enough. So, another head-to-head comparison with the same kind of broken chart that's impossible to read, and a conceptual diagram.
This stuff as well. You can see some of these arrows are broken.
It's not a great visual at all.
I think this is supposed to go here, but it's pointing to MTP.
Maybe it's supposed to go there. I don't know. The report itself is good. Like I said, it's got sources at the bottom here. It's just the formatting, and it's just long. If I sent this to somebody and asked them to read it, they're probably not going to read it. But let's see what the HTML version looks like.
Okay, so it took a little bit longer.
Thariq admits it takes something like two to four times longer. But it did produce this. I told it just to format it as an HTML file, so it's not a slideshow, it's just one file. But you can already see it has a table of contents here with links to specific parts. If you click "Background", it goes to the background, stuff like that. Small stuff, but it's nice to have. It's still a lot of text as well. If I told it to format this as a slideshow, there would probably be less text and it would be easier to take in visually. But it has stuff like this that's a lot clearer to see: "MTP architecture, simplified" gives you a very clear layout here.
And then "What is DFlash?" Similar thing, this little simplified diagram. What makes it different, a side-by-side comparison.
And this, I think, is where you can see a big difference, with this chart right here. Very easy to read. It actually used different colors.
I didn't tell it anything formatting-wise. I just said write it as an HTML file. But it used different colors here to differentiate, which makes it a lot easier to read, and it's actually readable, unlike what we saw in the Markdown file, which was just impossible. And then you have stuff like this, these two separate cards, which are much easier to understand: when to choose which. If I wanted somebody to actually read this report, this is going to be much easier for them to take in. Then there are some code snippets at the bottom, which are much easier to read, the key takeaways, and the sources as well. So I think it's pretty obvious that this is the better option for a report. Much easier to read and much clearer.
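To see why the same comparison reads so differently in the two formats, here is a sketch that renders identical rows once as a Markdown pipe table and once as an HTML table with a background color per column. It assumes the chart in the demo is a plain comparison table; the function names, colors, and placeholder rows are invented for this example and are not the report's actual content.

```python
from html import escape


def to_markdown_table(headers, rows):
    """Pipe-table source: readable once rendered, cluttered as raw text."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def to_html_table(headers, rows, column_colors=("#e8f0fe", "#fef3e8")):
    """Same data as an HTML table, shading each value column differently."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body_rows = []
    for row in rows:
        label, *values = row  # first cell is the row label
        cells = [f'<th scope="row">{escape(label)}</th>']
        for position, value in enumerate(values):
            color = column_colors[position % len(column_colors)]
            cells.append(f'<td style="background:{color}">{escape(value)}</td>')
        body_rows.append("<tr>" + "".join(cells) + "</tr>")
    return (
        "<table><thead><tr>" + head + "</tr></thead>"
        + "<tbody>" + "".join(body_rows) + "</tbody></table>"
    )


# Placeholder data, not taken from the report in the video.
HEADERS = ["Aspect", "Option A", "Option B"]
ROWS = [["Speed", "fast", "slow"], ["Setup", "simple", "involved"]]
```

`to_markdown_table(HEADERS, ROWS)` produces the pipe-and-dash text you see when the file is opened unrendered in an editor, while `to_html_table(HEADERS, ROWS)` produces markup a browser lays out as a real grid with the two options visually separated.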
Everything's formatted properly.
So I think this is a much better report. And you can continue to format it. The thing about the Markdown file is that it's pretty much set. You could probably tell it to fix some of the formatting, but there's a limit to what you can do with it. With the HTML file, I can tell it to make this a slideshow or add different interactions.
If you've watched any of my presentations, that's often what I do.
All my presentations are HTML files, usually built in Claude Code.
So you can iterate on this as much as you want to make it more visual.
I'm in my agent right now, which has a lot of skills, and I'm going to convert what you just saw into a slideshow and then use image generation. I'm using GPT 5.5 here to add images that explain the concepts even further. And while it's not yet the neural interactive video that Karpathy imagines we might have in the distant future, you can actually convert these HTML files pretty easily into videos of different formats.
I'm going to be using hyperframes, which is an agent skill in my Hermes agent, but there's also stuff like manom video, and I'm sure there are other video generation skills and tools out there.
So this is just one example of taking it to the next level: what we might see, if it becomes easy enough to do, once we evolve beyond just HTML.
You can see what we got here. It did take a while, about 15 minutes, but this is the slideshow we got. We got some nice images here, made with GPT Images 2.0. Some of this you'd have to fix, but it's much better and a lot clearer. This is one way you can upgrade a report if you want to present it, especially if you're giving it to other people and you want them to actually read through it and understand it. You can make it look really nice like this. It's the exact same data, the exact same report. I just told it to make a slideshow and use images.
And that's what we get. So the last step to push this to the next level is to make it a video.
Okay, so this is the video I made using hyperframes, in a vertical short style, to give you a different look from the slides we had earlier. You can see it's the same concept, the MTP versus DFlash image, and this is just a video. Pretty good visualization, though.
I know a lot of people are used to these short vertical videos.
If you wanted to explain something to people on your team, or to a larger audience, you could do something like this. I think what Karpathy is talking about is something more interactive, where you could click on certain terms and have definitions pop up, stuff like that. But just to give you a picture of what's out there today in terms of visual tools for your model's responses, you could do something like this and make it a video.
It's certainly a lot more attractive, and it's going to keep your attention more than a Markdown file. So it's not necessarily practical right now for everyday use, but it gives you a picture of the direction we're headed.
There you go. That's going to be it. I thought this was an interesting topic.
There's been some conversation on X, and a lot of people have been talking about this. I know there are other, smaller formats that people have been recommending, but I think the benefit of HTML is that everyone knows HTML. It's accessible anywhere.
You don't need to download any special tools or editors or anything like that.
Everyone has a browser you can open HTML in.
And hey, maybe down the road we get this interactive neural video, which would be pretty cool. But this is where we are right now with HTML and Markdown.
That's going to be it for this video. I hope you enjoyed it. Please leave a comment and let me know your thoughts on this topic. If you liked this video, please subscribe, leave a like, and I'll see you in the next video.