The video sharply exposes the gap between corporate performance marketing and actual algorithmic rigor, particularly regarding GitHub's questionable O(1) claims. It serves as a necessary reminder that over-engineering React components is often a poor substitute for fundamental architectural choices like virtualization.
Deep dive
Git Performance Improvements:
You guys ready for this one? The Git perf improvements. I have a story that I want to tell about this, but we first need to understand what GitHub is doing and then I will tell my story. So, GitHub made this post celebrating lapping lapping the old celebration because they said, "You know how you can render 10,000 line diff without melting the browser? By focusing on simplicity."
Now, I do want to throw this out here.
This has been a multi-year issue at this point: we haven't been able to look at a PR that's larger than, you know, a thousand lines of code. So this problem has been going on for an exceptionally long time, and the fact that they're just addressing it now, though I'm very happy, does kind of make me upset, because they're just acting like... Dude, it's pretty much impossible to do a large PR without melting your browser. Okay? I just want to let you know that it's like impossible. So, the fact that we're doing that like you're an idiot. And the part that really tickles me is, here's the breakdown.
Here's the big three right here that they did. They reduced React components from eight per line to two. Eight React components per line to two. Now, I get that it's very easy in in the React world to make way too many components.
Wait, okay, so this is pretty ridiculous statement and having two React components per line, I guess that's impressive, right? Okay, cool. Cool.
Because remember every single if you go to GitHub for those that don't know, you got I think what they consider as part of a line. There we go like this. Do you see how there's like on this side there's like a little down thing right here that allows this little menu to come up? And then they also have this little plus button over here. So, you know, every line comes down that way. By the way, how they do I I was actually really curious how they did the syntax highlighting cuz I was like, "Oh, actually that's pretty impressive how they do syntax highlighting." And I went and looked at this and I discovered it's not as impressive as you might think it is, right? So, if we just go like this test query. Hey, there we go. So, this is how they do it.
They send it down from Where are you?
Where are you? No, that's that's the wrong one. I'm looking for the Why aren't you showing up? Why aren't you showing up? There's a big data blob in here that actually show Look at this.
Okay, there's a big data blob in here that actually shows the data. I thought it was Yeah, cuz look, this is the one I'm looking for is this line, but it's going to the wrong line. Well, that's annoying. Okay, well, it's going to the wrong line for whatever reason in here.
But inside of here, somewhere in here, it actually shows \u003cspan, which is an escaped <span. There you go. So, this is how they do it right here. Every single token on here is effectively already pre-parsed, and it's sent right away. So, here's the text or Claude or Claude or Claude sonnet 45. And so it does this little highlighting for you already. And so they just have to hydrate the little... They just have to recreate those little spans.
So, there's no client-side It's not client-side at all. Yeah. Why? Cuz client-side syntax highlighting would require you to bring down something like tree-sitter or some way to understand syntax of every single language and then from there, you have to Like, I actually think it makes perfect sense. You generate this one like if I were to solve this problem, I think this is actually a pretty good solve. You parse the PR once, you generate the exact HTML needed for each line, and then you just you just walk away, right? To me, that actually makes perfect sense. Cuz or else you have to have something that can tokenize and understand every single language. Well, I mean, it's not a lot of compute. It's really not that much compute. Client-side highlighting would require you to parse the entire thing.
You need a full tree for accurate highlighting. If you pull down part of the source code, you need to pre-hydrate it. Yes. Right? Cuz you got to remember that they I mean, I think tree-sitter could do a decently good job of doing it correct even on like little partial trees. But, you got to remember that when you look at this, like notice that where's the rest of that code? You don't see that code, right? And so, you have to like load the code, which that clearly went I assume that went off and did a server request cuz when I press this, it's not instant.
Pressed. Ding.
That that has to be from the server or this is just the slowest thing in the universe. I just I just I refuse to believe that it's actually slower than going to a server and back. And so, this makes sense. They just do all the parsing once on the server. Everything's there.
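The "parse once on the server, ship ready-made spans" idea described above can be sketched roughly as follows. This is a minimal illustration, not GitHub's actual pipeline: the `Token` shape, the `cssClass` names, the cache key, and the `tokenize` callback are all hypothetical, since the video only shows that escaped span markup arrives in the payload.

```typescript
// Hypothetical token shape; the real payload format is not shown in the video.
interface Token {
  text: string;
  cssClass?: string; // e.g. "kw" for a keyword; undefined means plain text
}

// Escape characters that are significant in HTML so source code renders literally.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Server side: turn one already-tokenized line into an HTML string.
function renderLine(tokens: Token[]): string {
  return tokens
    .map((t) =>
      t.cssClass
        ? `<span class="${escapeHtml(t.cssClass)}">${escapeHtml(t.text)}</span>`
        : escapeHtml(t.text)
    )
    .join("");
}

// Cache keyed by a hypothetical identifier (for example "commit:path"),
// so the expensive tokenize step runs once per diff.
const highlightCache = new Map<string, string[]>();

function getHighlightedLines(key: string, tokenize: () => Token[][]): string[] {
  let lines = highlightCache.get(key);
  if (lines === undefined) {
    lines = tokenize().map(renderLine);
    highlightCache.set(key, lines);
  }
  return lines;
}
```

With something like this, the client never needs a grammar for any language; it only places the strings it receives into the line elements.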
Uh they have infinite storage. It's not a big day. Let's see. How to render 10K diff in browser efficiently. You generate the diff and render the HTML template. You store the cache and serve it from cache. We did in the 2000 in PHP. These kids with their SSRs and their SSRIs.
>> [applause] >> It's not the it's the SSRIs that are really they're screwing our performance in our browser, okay? All right. How dare you?
That clonazepam out there is just screwing everything up. All right. So, now that we got that kind of out of the way. So, here's the big thing that they say they do. O of one data access with JS maps. I'm just going to throw this out here.
I somehow doubt. Here, let me just open up. Let's just pretend for a second that this is Do I have a JavaScript file in here? I think I do.
TS, I bet you have a TS in here somewhere. Right, TS. There we go. These are almost exclusively AI generated files in here. So, don't look at this.
It's disgusting stuff. All right.
I somehow doubt that the data access of this thing is that much worse than that thing. And what I mean by that is Oh my gosh. Oh, by the way, they took out this feature and I wanted to try it. Which makes me sad.
So, let's go B.
So, I somehow doubt you using an object is somehow so much performance problems of data access that you had to use a map to speed it up. Like I actually don't I just don't believe it at all. I'm going to be real with you. I I I I actually refuse to believe it. Right? Like I I literally refuse to believe that you're the problem the problem with Git or the problem with GitHub was that they did it slow. Okay, I just have to talk about this cuz this makes me nuts. And remember, I have the story to talk about. So, I don't believe this. So, if you actually go in here and look at this, if you actually go and look at their article, I believe they say uh n.
Yes. In version one, we gradually accumulated a lot of O of N lookups across shared data stores and component states. Okay, so honestly, I don't know how you do O of N lookups. I genuinely can't even figure out how you would do that. I keep thinking about it. I've sat down over and over again and been like, how would one do this? So the story should not be O of one data access with JS maps.
That's not what the story should be. The story should be, hey fellas, we [ __ ] up so bad we accidentally created O of N lookups on a line diff in which we know every line before it starts. No, I mean I understand how O of N works. I just don't understand how you would even accidentally do this. Like, hey bro, I got to look up this line in this file.
Oh, okay, just put use an array. How did how how else would you look it up? A map, but all the keys of the hash are the word function.
Right? Like that's what I'm confused about. I'm just confused as to how this is done. Even lazy prototype. Like you know, someone said, okay, well lazy prototype went into production. Even a lazy prototype, the lazy thing to do would be to put it into an object. You know what I mean? Like, oh, okay. Um here. Here's the file name. Here's the line number, right? Like I don't know how you don't arrive to this conclusion.
Like the most lazy ass version of this would be this. By the way, did you guys just see that? I just hand ripped out that straight line in a frustrated manner. That deserves an award of some sort. So I don't even know how you don't just settle on an object with this interface. Like genuinely, I have no freaking clue how you do that. Anyways, I just don't know how you don't get here. And so if you used an object, I believe it actually has log N lookup. So technically, it's not O of 1, because it uses like a B-tree or some sort of red-black tree or some sort of tree underneath the hood to look up the keys, if you didn't know that. I'm pretty sure that's how JS objects work, generally speaking: they have a statically allocated memory region in which they lay out all the keys and values, and remember, everything in JavaScript land is JS values, which are pointers to other values. And so they have this all laid out in nice memory, and then they have to reallocate this memory to make more room, if I'm not mistaken. I could be mistaken, but I'm pretty sure that is generally how it works. And so they have to have a fast way to be able to jump into this memory region and point to something.
And so they use something like a tree with offsets. All right, so let's keep on, let's just keep on looking at this.
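To make the "file name, then line number" lookup concrete, here is a small sketch contrasting a scan over a flat list with a keyed index. The `LineComment` type and all function names are hypothetical; nothing here is taken from GitHub's code. On the complexity side, the ECMAScript specification only requires that `Map` provide access times that are, on average, sublinear in the number of elements, so treat "O of 1" as the typical hash-table behavior rather than a guarantee.

```typescript
// Hypothetical comment record; field names are illustrative only.
interface LineComment {
  path: string;
  line: number;
  body: string;
}

// Linear version: every lookup scans the whole list, so cost grows with comment count.
function findCommentsLinear(all: LineComment[], path: string, line: number): LineComment[] {
  return all.filter((c) => c.path === path && c.line === line);
}

// Indexed version: build once, then each lookup is two Map.get calls.
type CommentIndex = Map<string, Map<number, LineComment[]>>;

function buildIndex(all: LineComment[]): CommentIndex {
  const index: CommentIndex = new Map<string, Map<number, LineComment[]>>();
  for (const c of all) {
    let byLine = index.get(c.path);
    if (byLine === undefined) {
      byLine = new Map<number, LineComment[]>();
      index.set(c.path, byLine);
    }
    const bucket = byLine.get(c.line);
    if (bucket === undefined) {
      byLine.set(c.line, [c]);
    } else {
      bucket.push(c);
    }
  }
  return index;
}

function findCommentsIndexed(index: CommentIndex, path: string, line: number): LineComment[] {
  return index.get(path)?.get(line) ?? [];
}
```

Both lookups return the same comments; the difference is only in how much work each call does as the number of comments grows.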
So let's look at what they did. With our team's goal of improving pull request performance, we had three main objectives: reduce memory and JavaScript heap size, reduce DOM node count, reduce average INP and significantly improve P95 and P99 measurements. So for those that don't... Let's see. There we go. Interaction to Next Paint, so it's from when you click your mouse to when the next thing happens, to when the thing changes.
Okay, for most users before optimization, the experience was fast and responsive. But when viewing large pull requests, performance would noticeably decline. For example, we observed that in extreme cases, the JavaScript heap could exceed 1 GB and DOM node count surpassed 400,000. Why don't they just virtually scroll? It turns out that's kind of like their final boss. So all the way down here, they decide O of one data access. By the way, just stating that you're doing this makes you look like such a dweeb.
Like bro, there you go. Look, they see exactly what I mean. Comment map looks something like this. Exactly. This is exactly what you would expect it to be, but to have an entire section being like, "Next, we redesigned our global and diff state machines to utilize O of one constant time lookups by employing JavaScript map." Dude, you sound like a dweeb. Don't write this. Oh my gosh, just don't even write it. It's painful. Like, you got to tell me: what were you doing?
I got to know. I got I have to know. How did you design something with O of N lookups? I don't even like I genuinely don't know how it's possible. All the science to do stuff file number line count. I know like I don't know how to design it any other way, right? Like I don't know how to design a data structure to look the stuff up other than this. That's it. I could not come up with something better off the top of my head. We optimize our nail delivery system by utilizing a hammer. Yes, this is what's happening. No, use effects is the only way to accept use effect in a code base. Yes. Wait, they're using react for GitHub. Yes, they So here here's what I actually propose for them.
And so they did this on so did it work?
They did let's find out the other thing they did. So they did O of one access moving complex state to conditionally rendered child components. There you go.
So reduce reduce the amount of components simplifying the component tree. So they did some simple simplification of the component tree.
Now it looks like this as opposed to that. Getting better they use tables TR TD based based did it work look virtualization for our largest pull request. Honestly, here's the thing that kind of drives me nuts is that in a pull request, let's just pretend for a second that I I know very little about the system. One thing I can tell you is that any one of these files in here every single line is of the same height. The only time a line changes is when there's a comment. But this is actually an element in between, but you actually can just not consider these elements in between. You don't actually have to think about the elements in between. You could have a virtualized scroll list completely available where it just simply uses line height of in the worst case situation for how many elements you create. And then you just recycle as you scroll. And that's that like that's what I assume they did in the end. And so it absolutely shocks me that this is a thing. Why are the numbers on the left duplicated?
That means they're in both diffs on the same line. this line only exists on this diff. That's why they're not duplicated, they're different, right? Cuz that's 338, that's 358. And instead of scrolling the whole page, yeah. And then you virtualization is mandatory with large lists and components in React, yes. And then not only that, you just have to override what control F does.
And when you control F, instead of find in page is a browser thing, you just have like a little special find right here, which you could make it much nicer anyways. You could be like, "Hey, do a quick regex look up. Make it much easier." You could have like a nice little auto find in here that's really nice. Instead they don't do that.
Instead they just have find in page.
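The fixed-line-height virtualization described above comes down to a little arithmetic. The sketch below assumes every row has the same pixel height and ignores the comment elements between lines, as the speaker suggests; the parameter names and the `overscan` default are hypothetical.

```typescript
interface VisibleRange {
  first: number;   // index of the first line to render
  last: number;    // index of the last line to render (inclusive)
  offsetY: number; // pixel offset at which to position the first rendered line
}

// overscan renders a few extra lines above and below the viewport
// so fast scrolling does not briefly show blank gaps.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  lineHeight: number,
  lineCount: number,
  overscan = 5
): VisibleRange {
  if (lineCount === 0) return { first: 0, last: -1, offsetY: 0 };
  const first = Math.min(
    lineCount - 1,
    Math.max(0, Math.floor(scrollTop / lineHeight) - overscan)
  );
  const last = Math.min(
    lineCount - 1,
    Math.ceil((scrollTop + viewportHeight) / lineHeight) - 1 + overscan
  );
  return { first, last, offsetY: first * lineHeight };
}
```

For a 10,000-line diff in a 600 px viewport with 20 px lines, `visibleRange(2000, 600, 20, 10000)` covers lines 95 through 134, so only 40 rows need to exist in the DOM at once and they can be recycled as the user scrolls.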
It's just kind of crappy. I can let's see, I cannot rely on line height as the CSS is done by a team in Montreal and they keep pushing in rubbish. That's crazy. That's crazy. Anyways, to me this is very, very funny problem that they have. And I'm sure there's plenty of other ways in which you could solve this, but I always just wondered, why didn't they just make this into a video game? Hear me out here. Why isn't this just a canvas? Why not just use a canvas? None of these interactions are hard. One person over the course of a couple weeks could probably get a pretty good, pretty well working canvas version of this. But my SEO with canvas, text selection, it's I mean text selection is not that crazy hard. I've done it. It's not that crazy hard. I had to do it. It took me like a week to get it perfectly working. We had our own text kerning, we did our own layout and everything. So that's why I wanted to talk about this cuz I actually have some experience with it. I had to do all of my everything on on a canvas. Everything we rendered was canvas-based. We did all of our own text highlighting via canvases. We did all of our own kerning. We did everything. And so when I look at this stuff, it just it does make it does tickle me a little bit because you don't need to do your own kerning because you can just rely on diffs looking slightly different between computers and browsers. You just can use a you know, like you can just use a whatever font they're using. But it just seems so silly that they didn't do that.
Okay, if they didn't do that, why not just use plain HTML? Like why even use React considering you know React is crazy. So why not just use plain HTML?
Like why React? Like, this one component could just be a web component. Each one of these could be some sort of self-sufficient web component, and you just keep it tight. I don't get that, either. Like, why even use any of that crap? So, anyways, they did all that, which to me just seems crazy.
But when I had to do this, when we first started programming and building out our own canvas tools for this exact experience right here, one of the very first things you run into is that when we were doing some sort of search, or scrolling, we had to place where the scroll was. For whatever reason, my boss, who at the time wrote the first one, would take each line and calculate its offset and do a loop all the way through to the bottom every time you scrolled, to find out where the next line is being placed and where in the scroll port you are. Because this is long before we even had a lot of the utilities you guys do. This was, you know, 15 or 16 years ago. And so we would calculate our own offset into this as you scrolled. And that was an O of N operation. And within like five documents, or five pages, it was immediately obvious that this was not going to scale on an iPad 2. Like, there's just no way that this was going to work on an iPad 2. And even if it wasn't, there was no way we would allow this to go out for customers to use ever.
And so, for me, it was just so shocking to hear that GitHub released what had to be the single worst implementation. Like, you actually have to try hard to write an implementation that bad. I am shocked. I am shocked that you could go out with that bad of an implementation and not go, "Oh, we should fix that immediately." That should be the first thing we fix, absolutely immediately, and it was not fixed immediately, like at all. They let it sit for multiple years, or however long it was that you couldn't... It's been over a year.
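The scroll-offset loop from the story above, where every line is walked on each scroll event, can be contrasted with one common alternative: precompute the top edge of every line once, then binary search on each scroll. This is an illustration of the general idea, not the speaker's actual fix, which is not described in detail. The function names are hypothetical, and the sketch assumes at least one line and non-negative heights.

```typescript
// Naive version: walk the lines on every scroll event until the running
// offset passes scrollTop. Cost is proportional to the number of lines.
function firstVisibleLineNaive(heights: number[], scrollTop: number): number {
  let y = 0;
  for (let i = 0; i < heights.length; i++) {
    if (y + heights[i] > scrollTop) return i;
    y += heights[i];
  }
  return heights.length - 1;
}

// Precompute once: offsets[i] is the top edge of line i; offsets[n] is the total height.
function buildOffsets(heights: number[]): number[] {
  const offsets = new Array<number>(heights.length + 1);
  offsets[0] = 0;
  for (let i = 0; i < heights.length; i++) {
    offsets[i + 1] = offsets[i] + heights[i];
  }
  return offsets;
}

// Per scroll: binary search for the last line whose top edge is <= scrollTop.
function firstVisibleLine(offsets: number[], scrollTop: number): number {
  let lo = 0;
  let hi = offsets.length - 2; // last valid line index
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (offsets[mid] <= scrollTop) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}
```

If every line has the same height, even the binary search is unnecessary, because the index is just `Math.floor(scrollTop / lineHeight)`.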
Let's see. It was Higher Agency. Dude, it just blows It blows me away how little responsibility they had, because I always got constantly hounded for every single decision I made. You know what I mean?
Like every single decision I made, I had to be so on top of why did I write this kind of stuff this way? Because we're dealing with large text documents and they take a lot of memory if you don't do it correctly. And so even whenever you would click your finger onto the canvas, how did I know what letter you were highlighting? Well, on the back end, I had to develop a data structure that encompassed every single character, every single offset into the document, and then I had to get these relative positions within here and map everything out to be able to do highlights and all that. And all of it had to be done with effectively O of 1 lookup. We used just a series of arrays to look things up.
And so it took like six array lookups to be able to figure out where you were.
And array lookups are O of 1. O of 1 means that it doesn't matter how much data there is to search through. It's always the same amount of time. Yes. O of 1. So for those that don't know, I forget that people don't know these things these days. But it's very, very simple. Complexity is always big O and then some sort of letter.
Meaning N, it could be log N, oopsie, I was going to do that. Log N, it can be N log N. These are all pretty much very consistent ones or N squared.
Like these are all pretty much the consistent ones you normally see.
There's also constant. And what this means is: based on the input coming in, how does your algorithm's running time or space grow? So O of N means that, in general, as you increase your input, the amount of time it takes to run will linearly increase. That's generally how you can look at this if you're looking at N. So O of 1 means that no matter what the size of your input is, it's always a constant amount. Now, the constant-time version could be doing a lot of operations, to where it's better to do linear for some small amount of input than it is to do a constant time lookup. A good example of this is with JavaScript objects: to look up a key on a JavaScript object or on a map is slower than if you have a few items inside of an array and you linearly search through the array. Sometimes it's better to use linear searching than it is to use constant time because that constant could take a while to compute, right? Just because something's constant does not mean it's instantaneous, if that makes sense. It just simply means it doesn't grow. "After learning about hash tables, I have decided that O notation doesn't make sense." I think it does. Generally speaking, it makes sense. "It's also often not just O of 1. It's like O of 5." Yes. That's what I mean. Well, O of 1 is always O of 1. What it really means, the way you're supposed to specify it, is that there is a C in front of all of these, which is some constant multiplier, but that's irrelevant because you're only looking at growth when it comes to big O. You don't care that this thing's 5n versus n when it comes to big O. Now, practically speaking, you do care. You care a lot.
That's why using arrays is better than using something else.
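The "how does it grow" framing above can be made concrete by counting steps instead of measuring time. The sketch below counts how many elements a linear search and a binary search inspect on a sorted array. It illustrates O of N versus O of log N growth only and says nothing about real-world constants, which, as noted above, can favor the linear scan on small inputs. All names are hypothetical.

```typescript
// Count how many elements each strategy inspects before finding `target`
// in a sorted array. The counts, not wall-clock time, are what big O describes.
function linearSteps(sorted: number[], target: number): number {
  let steps = 0;
  for (const v of sorted) {
    steps++;
    if (v === target) break;
  }
  return steps;
}

function binarySteps(sorted: number[], target: number): number {
  let steps = 0;
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    steps++;
    const mid = (lo + hi) >> 1;
    if (sorted[mid] === target) break;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return steps;
}

// Worst case for the linear search: the target is the last element.
function compare(n: number): { n: number; linear: number; binary: number } {
  const data = Array.from({ length: n }, (_, i) => i);
  return { n, linear: linearSteps(data, n - 1), binary: binarySteps(data, n - 1) };
}
```

Going from 8 elements to 1,024 multiplies the linear count by 128 (8 to 1,024) while the binary count only goes from 4 to 11, which is the growth difference the notation is meant to capture.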