A sharp reminder that modern software bloat is a choice, not a necessity, caused by our growing disconnect from the underlying hardware. It masterfully argues that true engineering excellence requires us to stop hiding behind abstractions and start respecting the machine again.
Deep dive
Mechanical Sympathy: The Skill Faster Computers Made You Forget
In 1999, you could run a chat server that held a few hundred people in 64 kilobytes of memory. Today, a single browser tab, idle, doing nothing, can sit on the order of 10 to 20 megabytes.
That is roughly 200 times more memory to do less. The machine got faster, much faster. So, this is not a story about the machine getting worse. It is a story about what we forgot how to do once we no longer had to. There is a name for the thing we forgot. It came from racing. Jackie Stewart, three-time Formula 1 champion, said you don't have to be an engineer to drive a car fast, but you have to have mechanical sympathy, a feel for what the machine wants to do and what it hates to do.
Drive against the machine and it punishes you. Around 2011, an engineer named Martin Thompson borrowed that phrase for software. He'd built trading systems that processed millions of messages a second on a single thread, while everyone else was throwing hardware at the problem. His point was simple, and it should make you uncomfortable. Most slow software isn't slow because the problem is hard. It's slow because the code is fighting the machine, and the author never noticed.
So, here is the question this whole essay is built on. That browser tab using 200 times the memory of a 1999 chat server, is that the machine's fault, or did we just stop listening to it? And by the end, I want to convince you of something that sounds backwards in 2026. Understanding the hardware makes you a better engineer even if you write Python all day and never touch a register in your life, especially then.
There are two famous quotes in this corner of programming. Both of them are used as a license to stop thinking. So, let's read them properly.
Donald Knuth, 1974: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." Every engineer has quoted the part you just read. They quote it to win arguments. They quote it to avoid profiling. Here is the rest of the passage, the half nobody prints on a sticker: "Yet we should not pass up our opportunities in that critical 3%." The famous half kills your curiosity. The forgotten half is where engineering actually lives. Knuth wasn't saying don't understand the machine. He was saying know which 3% matters, and you cannot know that if you have no model of what the machine is doing at all.
The second quote is darker. It's called Wirth's law: software is getting slower more rapidly than hardware is becoming faster. Niklaus Wirth, 1995, in an essay called "A Plea for Lean Software". And here's a detail I love.
Wirth himself credited the line to another engineer, Martin Reiser. The law got stuck to the man who made it famous, not the man who said it. Three decades later, look at that browser tab. Wirth won the argument. Hardware sprinted and software spent every gain and then some.
Which brings us back to the 64 kilobytes. The machine didn't get 200 times worse. We got 200 times more abstraction and we stopped checking the bill. Now, I'm not going to tell you to go write a chat server in 64 kilobytes.
That would be nostalgia and nostalgia is useless. I'm going to show you four things the machine cares about. Four questions. And how each one bleeds into the code you already write. Every performance question the hardware actually cares about reduces to four.
Not a hundred. Four. Where it lives, how it's packed, what arithmetic you're doing, and whether the CPU can guess what you'll do next. Hold those four.
Every chapter from here answers exactly one of them. And none of them require you to write assembly. The hardware doesn't read your language. It reads your access pattern.
Question one, locality. This is the big one. The one that explains more slow code than everything else combined. So, I have to show you a number that engineers pretend they know and almost never internalize.
These are approximate. They vary by chip. But the ratio is the lesson.
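The figures shown on screen at this point are not part of the transcript. As a rough stand-in, the sketch below uses commonly quoted ballpark latencies. The specific numbers are assumptions chosen for illustration, not measurements, and they vary by chip; only the ratio between main memory and L1 is taken from the talk.

```python
# Ballpark access latencies in nanoseconds.
# Illustrative assumptions only: real values vary by chip and generation.
APPROX_LATENCY_NS = {
    "L1 cache": 1,
    "L2 cache": 4,
    "L3 cache": 30,
    "main memory": 100,
}


def cost_relative_to_l1(latencies):
    """Express each latency as a multiple of one L1 hit."""
    l1 = latencies["L1 cache"]
    return {level: ns / l1 for level, ns in latencies.items()}


for level, ratio in cost_relative_to_l1(APPROX_LATENCY_NS).items():
    print(f"{level:12s} ~{ratio:.0f}x an L1 hit")
```

Whatever the exact values on a given machine, the point of the table is the last row: a trip to main memory is around two orders of magnitude more expensive than an L1 hit.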
Reaching into main memory can cost on the order of a hundred times what a hit in L1 costs. Your CPU is not slow. Your CPU spends most of its life waiting for memory. Read that again. The fastest part of your computer mostly sits idle waiting for data to arrive. And the thing that decides how long it waits is not your algorithm's big O. It's whether the next byte you need is already nearby. The CPU doesn't fetch one byte.
It fetches a whole cache line, typically 64 bytes, betting you'll want the neighbors. Walk your data in order and every fetch pays for the next 63 accesses. Jump around and you pay full price every single time. Same matrix.
Same total work, same number of additions. The only difference is the order of two loops. On a large matrix, the second version can be several times slower, and a profiler will just tell you the loop is slow. It won't tell you the loop is slow because you're insulting the cache. Python. Same machine, same cache, same bill. And before you say, "I write Python, I don't have loops like this," you do. Every time you iterate a NumPy array the wrong way, every time you access a pandas DataFrame column by column instead of row-wise or vice versa, you're making this exact choice. You just can't see the cache line. The machine still can.
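The two loop orders discussed here were shown on screen and are not in the transcript. As a minimal sketch of why the order matters, the toy model below counts cache-line fetches for a full pass over a row-major matrix. It assumes 8-byte elements and a cache that holds exactly one line, which is far simpler than real hardware; the function and parameter names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # typical line size, as stated in the talk
ELEMENT_BYTES = 8      # assumption: 8-byte floats stored row by row


def line_fetches(n, walk_along_rows):
    """Count cache-line fetches for one pass over an n x n row-major matrix.

    Toy model: the cache holds a single line, so any access that lands
    on a different line than the previous access counts as a fetch.
    """
    fetches = 0
    current_line = None
    for outer in range(n):
        for inner in range(n):
            # Along rows: the inner loop touches adjacent addresses.
            # Down columns: the inner loop jumps n elements each step.
            i, j = (outer, inner) if walk_along_rows else (inner, outer)
            address = (i * n + j) * ELEMENT_BYTES
            line = address // CACHE_LINE_BYTES
            if line != current_line:
                fetches += 1
                current_line = line
    return fetches


N = 256
in_order = line_fetches(N, walk_along_rows=True)
strided = line_fetches(N, walk_along_rows=False)
print(in_order, strided, strided // in_order)  # 8192 65536 8
```

Both walks perform the same 65,536 accesses. In this model the in-order walk gets eight elements out of every fetched line, while the strided walk pays for a new line on every access. Real caches hold many lines, so the gap only appears once the matrix is too large for a column's worth of lines to stay resident.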
Question two, density. If locality is about order, density is about how much you can fit in one fetch. Pack tighter and more useful data rides in on every cache line. This is the heart of something called data-oriented design, and it's a quiet war against the way most of us were taught to model the world. We were taught, "An object is a thing. A particle has a position, a velocity, a color, a flag." So, you make a struct with all of those, and you make an array of them. Array of structs. It reads beautifully. It is also often the slowest possible layout. When you update only the positions, which a physics loop does every frame, the array of structs version drags the color and the flag and everything else into cache along for the ride. You fetched 64 bytes and used 12.
The struct of arrays version brings only the floats you're touching. Same data, restacked, often several times faster, and the source barely changed. This is why game engines, databases, and high-throughput systems quietly abandoned the textbook object model years ago. Not because objects are wrong, because the machine charges rent on every byte you drag in and don't use.
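The two layouts can be sketched in Python as follows. This is an illustration of the shape of the data, under loud assumptions: the `Particle` fields and the `ParticlesSoA` class are hypothetical, and a Python list of objects is really a list of pointers, so it does not reproduce a C struct's memory layout. The `array` module is used for the struct-of-arrays side because it stores 4-byte floats contiguously.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    """Array-of-structs element: every field travels together."""
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float
    color: int
    alive: bool


def step_aos(particles, dt):
    """Update positions; each particle object carries color and alive too."""
    for p in particles:
        p.x += p.vx * dt
        p.y += p.vy * dt
        p.z += p.vz * dt


class ParticlesSoA:
    """Struct-of-arrays: one contiguous array per field."""

    def __init__(self, n):
        self.x = array("f", [0.0] * n)
        self.y = array("f", [0.0] * n)
        self.z = array("f", [0.0] * n)
        self.vx = array("f", [0.0] * n)
        self.vy = array("f", [0.0] * n)
        self.vz = array("f", [0.0] * n)
        self.color = array("I", [0] * n)
        self.alive = bytearray(n)

    def step(self, dt):
        """Update positions; color and alive are never touched."""
        pairs = ((self.x, self.vx), (self.y, self.vy), (self.z, self.vz))
        for pos, vel in pairs:
            for i in range(len(pos)):
                pos[i] += vel[i] * dt
```

In the struct-of-arrays version the position update walks only the position and velocity arrays, so a language with flat memory would pull in nothing but the floats being used. The "12 bytes" in the talk corresponds to three 4-byte position floats.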
There's a second move in the same family, and it's older than all of us: packing flags. Suppose you have four Boolean states for a thing. The obvious way is four Booleans, four bytes or more, depending on the language. The machine's way is four bits in one byte. Four states, one byte instead of four. Multiply that across a million records and you've shrunk your working set fourfold, which means four times as much of it fits in cache, which loops you straight back to locality.
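A minimal sketch of flag packing, assuming four hypothetical state names and one `bytearray` slot per record. The unpacked layout is modeled as one byte per Boolean, which is the best case for the "obvious way"; many languages spend more.

```python
# Hypothetical flag names, one bit each.
ALIVE = 1 << 0
VISIBLE = 1 << 1
DIRTY = 1 << 2
SELECTED = 1 << 3

N = 1_000_000

# Unpacked: one byte per Boolean, four Booleans per record.
unpacked = [bytearray(N) for _ in range(4)]

# Packed: one byte per record, four bits in use.
packed = bytearray(N)


def set_flag(buf, i, flag):
    """Turn one bit on without disturbing the others."""
    buf[i] |= flag


def clear_flag(buf, i, flag):
    """Turn one bit off; the mask keeps the result within a byte."""
    buf[i] &= ~flag & 0xFF


def has_flag(buf, i, flag):
    """Test a single bit."""
    return bool(buf[i] & flag)


set_flag(packed, 5, ALIVE)
set_flag(packed, 5, DIRTY)
print(has_flag(packed, 5, ALIVE), has_flag(packed, 5, VISIBLE))  # True False
print(sum(len(b) for b in unpacked) // len(packed))              # 4
```

The cost is a mask and a shift's worth of arithmetic per access, which is cheap next to a trip to main memory.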
Density and locality are the same fight wearing two coats. The cache doesn't reward clever code, it rewards code that wastes nothing.
Question three is the one I expect the most pushback on, so I'm going to be honest about it, including where the old advice is now wrong. The old story goes, floating point math is slow, integer math is fast, so represent money and physics as fixed point integers. In 1999, that was true and it mattered. In 2026, on any CPU with a dedicated floating point unit, which is every device you're likely deploying to, that speed argument is mostly dead. Modern floating point units are fast. If someone tells you to switch money to integer cents to make it faster, they're optimizing a problem you don't have. I want to be clear about that because half the internet would tell you otherwise and they'd be wrong.
But fixed point didn't die, it just stopped being about speed. It's about two things the machine still cannot fake for you, correctness and determinism.
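The correctness half can be seen in a few lines. This is a minimal sketch, assuming a hypothetical ledger of 10,000 charges of 10 cents each; the helper `format_cents` is an illustrative name, not from the talk.

```python
# 0.1 and 0.2 have no exact binary representation, so the sum is slightly off.
print(0.1 + 0.2 == 0.3)  # False

CHARGES = 10_000  # hypothetical ledger size

# Dollars as floats: each addition can round, and the error accumulates.
float_total = 0.0
for _ in range(CHARGES):
    float_total += 0.10

# Integer cents: every addition is exact.
cents_total = 0
for _ in range(CHARGES):
    cents_total += 10


def format_cents(cents):
    """Render an integer number of cents as dollars, with no float involved."""
    return f"{cents // 100}.{cents % 100:02d}"


print(float_total - 1000.0)       # the accumulated drift, if any
print(format_cents(cents_total))  # 1000.00
```

The integer version is a simple form of fixed point: the unit is one cent, and the decimal point only appears when the value is formatted.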
0.1 + 0.2 does not equal 0.3 in floating point, it never has. On a single sum, who cares? Across 10,000 transactions in a ledger, those tiny errors compound and now your books don't balance and an auditor is asking questions. Money is never a float. That's not an optimization, that's correctness.
And determinism is the other half. In lockstep multiplayer games, in financial settlement, in anything where two machines must compute the bit-identical answer, floating point betrays you because the same expression can round differently across compilers and hardware. Fixed point gives you the same answer everywhere, forever. So the lesson isn't "integers are faster." The lesson is that the type you choose is a claim about what the machine guarantees. Speed is the weakest reason to choose it, correctness is the strongest. And you only see that distinction if you understand what the hardware is actually doing under the float.
Question four, prediction. This is the strangest one because it means your CPU is gambling on your behalf and you can make it win or lose. Modern CPUs don't execute one instruction at a time and wait. They run a pipeline dozens of instructions deep, doing future work speculatively. When they hit an if, they don't stop and wait for the answer. They guess which branch you'll take and start running it. Guess right, free speed. Guess wrong, the pipeline flushes and you eat the cost, typically on the order of 15 to 20 cycles, gone. Here's the most famous demonstration in all of performance engineering and it looks like it can't possibly be real. Read that carefully.
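The on-screen code is not in the transcript, and it was presumably written in a compiled language. The sketch below shows the same loop shape in Python, with hypothetical function names. In CPython the interpreter's own overhead dominates, so this illustrates the structure of the experiment and is not expected to reproduce the timing gap; the second function is branchless only at the source level.

```python
import random


def sum_large(values, threshold=128):
    """Branchy version: one data-dependent `if` per element."""
    total = 0
    for v in values:
        # On sorted data this test is False for a long run, then True
        # for a long run. On shuffled data it is close to a coin flip.
        if v >= threshold:
            total += v
    return total


def sum_large_branchless(values, threshold=128):
    """Same result with the comparison folded into arithmetic."""
    total = 0
    for v in values:
        total += v * (v >= threshold)  # a Python bool is 0 or 1
    return total


random.seed(0)
data = [random.randrange(256) for _ in range(100_000)]

# Sorting changes the order the branch sees, not the answer.
print(sum_large(data) == sum_large(sorted(data)))  # True
```

To see the effect described in the talk, the same loop would need to run as compiled machine code with optimizations low enough that the compiler keeps the branch.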
Compile this without optimizations turned up and sorting the data first, adding work, makes the loop run several times faster. Not because sorting is magic, but because sorted data lets the CPU's guess come true almost every time, and a guess that comes true is free. At higher optimization levels, modern compilers may generate branchless code and eliminate the branch entirely, and that's actually the point. The machine scheduler is smarter than it used to be, but it is not omniscient. Write enough unpredictable branches and you find the ones it still misses in a profiler, not in a textbook. You will not write this loop, but you will write code with predictable branches and unpredictable ones, and the machine quietly charges you for the unpredictable ones whether you know it's happening or not.
Now I owe you the other side because if I don't say it, the most experienced person watching is already typing it. All of that is true. If your function spends 99% of its time waiting on a database 3,000 km away, restacking your structs saves you nothing. A network round trip can cost on the order of 10,000 cache misses. Optimizing the cache there is malpractice. Knuth was right. Most of your code is the 97%. So, here is the turn and it's the whole point.
Mechanical sympathy is not about hand optimizing everything. Understanding the hardware is not the same as fighting it on every line. You don't tune the 97%.
You recognize the 3% on sight, and you can't recognize what you can't model.
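The arithmetic behind the database example can be sketched with Amdahl's law. The talk does not name it; it is brought in here as a convenient way to put numbers on the argument, and the fractions and the 4x local speedup below are hypothetical.

```python
def overall_speedup(fraction_improved, local_speedup):
    """Amdahl's law: whole-program speedup when only part of the runtime
    gets faster.

    fraction_improved: share of total runtime spent in the tuned code (0..1)
    local_speedup: how many times faster that code becomes
    """
    untouched = 1.0 - fraction_improved
    return 1.0 / (untouched + fraction_improved / local_speedup)


# 99% of the time waiting on the database, 1% in the loop you restacked.
io_bound = overall_speedup(0.01, 4.0)

# 90% of the time in the loop itself.
cpu_bound = overall_speedup(0.90, 4.0)

print(f"{io_bound:.4f}x")   # 1.0076x
print(f"{cpu_bound:.2f}x")  # 3.08x
```

The same 4x improvement is worth less than 1% in the first case and roughly 3x in the second, which is why knowing where the time goes has to come before any tuning.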
The engineer who understands the machine doesn't micro-optimize more. They micro-optimize less because they know exactly where it's worth it and they stop guessing everywhere else. The model is what lets you tell Knuth's 97% from his critical 3%. Without it, you're just superstitious, sprinkling cache and async like salt and hoping. I'll tell you why I think we stopped feeling the machine.
It wasn't laziness. The abstractions got good. Garbage collectors, JITs, managed runtimes, they're genuine engineering and they let us build things the 1999 version of us couldn't. I'm not nostalgic. I would not go back. But the cost of a good abstraction is that it whispers a lie that the thing underneath stopped mattering.
It didn't. The cache line is still 64 bytes. Main memory is still on the order of 100 times slower than L1. The CPU is still gambling on your branches. All of it is still down there, running every line you write, whether you've ever looked at it or not. That browser tab using 200 times the memory of a 1999 chat server? The machine didn't get worse. We just stopped checking the bill because the abstraction told us we didn't have to.
We can. We forgot we could. The machine never went away. It just got polite enough that we stopped listening, and that's the part that should stay with you in 2026. There's a lot of anxiety right now about what's left for an engineer when the tools can write the code. Here's one answer. A model can autocomplete the loop. It cannot feel the cache miss. A model can write the loop. It cannot feel why this version stalls and that version flies, because that feeling lives in a model of the machine, not in the text of the code.
That friction, that judgment about where the 3% hides, that's the part that doesn't compress into a prompt. If you want the longer version of that argument, the decisions that stay human no matter how good the tools get, there's an essay on this channel called "Four Architectural Decisions an AI Will Never Suggest". Start there.