Decoupling the diffing algorithm from the event log is an architectural choice that prioritizes long-term interoperability over rigid data formats. By relying on Git's native delta compression, the design lets every event carry a complete snapshot of the issue without sacrificing storage efficiency.
HAXY git forge devlog - simplifying the in-repo issues idea even more
Hey guys, it's your boy Zach.
Um, this is a new devlog about Haxy, and I'm wearing my Nintendo 64 shirt, which I got from Tar.
Some of you have been incorrectly pronouncing it as Target your whole life. So, this is your chance to finally correct yourself. It's Tar. Okay.
This is a devlog about the core feature of Haxy, the git forge I'm working on, which is the fact that it stores issues and other metadata in the repo so that you can push all of your project metadata to different servers and they all will receive it.
So this is the most important feature of Haxy, and I've been refining it as the days have gone by and really engaging in a pretty hardcore design process here. And I know this is cliche, but it's often said that great design is about taking things away until there's nothing left to take away.
And that's what I've been doing. I've been taking things away. I've been making it simpler and simpler. And I found a way to actually make it even simpler. So, first let me explain how this worked before, and then I'm going to explain my new refinement of this idea, which makes it even more elegant, I think. You're going to think the new idea is actually crazy, and I'm going to explain why it's not. I only know it's not because I've been obsessing about git internals for the last three years. If I hadn't been, I would not have come up with this idea.
So let's talk about how the idea was before today. The idea was, and remains, that we're going to use git as essentially an event store. We're going to have a special branch in your repo.
And this branch will probably not even have any files in it. It will be an empty branch because we're storing the events in the git commits themselves. If you went to this special branch and ran git log, you would see something like this. You would see these JSON blobs in the commit messages themselves.
The reason it's doing this is that the events need to form a directed acyclic graph, just like git commits, so we may as well reuse the machinery of git commits to form it. These events are going to do things like creating an issue or editing an issue. They're going to have a bunch of data, obviously, and when you push this special branch to a Haxy server, it's just going to process them.
It's going to go through these events one by one. It's going to parse the JSON. If it's create issue, it's going to create a new issue. If it's edit issue, it's going to find the existing issue with that ID and it's going to make the changes that are specified here.
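To make the original design concrete, here is a minimal sketch of that replay loop. The event kinds (`create-issue`, `edit-issue`), the field names, and the short ID `9a` are illustrative assumptions, not the actual Haxy format, and a plain dict stands in for the server's database.

```python
import json

# Hypothetical commit messages from the special branch, oldest first.
# Kinds and field names are assumptions made for illustration only.
commit_messages = [
    json.dumps({"kind": "create-issue", "id": "9a", "data": {
        "title": "Crash on startup", "priority": "high", "status": "open"}}),
    json.dumps({"kind": "edit-issue", "id": "9a", "changes": {
        "priority": "medium", "status": "triaged"}}),
]


def process_events(messages):
    """Replay events in order and build the queryable state (a dict here)."""
    issues = {}
    for message in messages:
        event = json.loads(message)
        if event["kind"] == "create-issue":
            issues[event["id"]] = dict(event["data"])
        elif event["kind"] == "edit-issue":
            # Apply only the fields named in the map of changes.
            issues[event["id"]].update(event["changes"])
    return issues


issues = process_events(commit_messages)
print(issues["9a"])
```

The git log stays the canonical record; the dict is a derived view that can be rebuilt at any time by replaying the messages.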
Um, so this was already a pretty solid idea and I know how to build this. What we're essentially talking about here is event sourcing. We're using Git as an event store, not a database. We're using Git as an event store. This is the canonical representation of your data.
We push it to the server. The server then processes these events and puts them in a form that can be quickly and efficiently queried through a UI. So the database is on the server, the event store is in your git repo.
That's the core of the idea, and that part has not changed. But I made a further refinement because I realized this can be even more elegant. First of all, we have this edit issue event, and what it contains is just a map of changes. The server is going to find that issue and essentially update the part of the database that has priority, status, etc. Now, this is fine. You can do this. What this is, essentially, is a data diff.
It's not the kind of diff you're used to. It's not a line-based diff, right? Because we're not diffing text. We're diffing a data structure. This is not a new idea, of course. There are many libraries that can make data diffs.
And I was probably just going to write my own, but, you know, there are many libraries. This is a common one in the Clojure world called Editscript.
Which I actually made a small contribution to. You can see right there. It can diff Clojure data structures and create essentially a type of patch that you can apply to that data structure to change it. Right? This is not a new idea. You can find stuff like this for every language.
So my plan for edit events was to do something similar. I was going to create a data diff of what changed and put it right here. And then my code on the server would read this edit issue event and it would essentially apply the diff.
It would look at the changes and it would, you know, update the part of the database that it needed. And this is completely doable. This is totally doable, but it's not the best idea, I don't think. So, let's talk about my new idea.
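As a rough illustration of what a data diff is, the sketch below diffs two flat dicts and applies the resulting patch. The `diff_data` and `apply_patch` helpers and the patch shape are hypothetical; this is not Editscript's algorithm or patch format, and it does not handle nested structures.

```python
_MISSING = object()  # sentinel so that a stored None is not mistaken for "absent"


def diff_data(old, new):
    """Return a patch describing how to turn `old` into `new` (flat dicts only)."""
    patch = {"set": {}, "remove": []}
    for key, value in new.items():
        if old.get(key, _MISSING) != value:
            patch["set"][key] = value
    for key in old:
        if key not in new:
            patch["remove"].append(key)
    return patch


def apply_patch(data, patch):
    """Apply a patch produced by diff_data, returning a new dict."""
    result = dict(data)
    result.update(patch["set"])
    for key in patch["remove"]:
        result.pop(key, None)
    return result


old = {"title": "Crash on startup", "priority": "high", "status": "open"}
new = {"title": "Crash on startup", "priority": "medium", "status": "triaged"}
patch = diff_data(old, new)
print(patch)
```

In the old design, something like `patch` is what would have been written into the edit event, which fixes the patch format at the moment the event is created.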
So, here's the new idea. What if we eliminate the distinction between creating and editing events?
You're going to think this is a little nuts, but what if there was just one event kind called issue, and you use the same event kind whether you're creating a new issue or editing an old one? So this event right here is creating an issue with an ID of 9A whatever. And if it's the first time the server has ever seen this ID, then this data will just be used to create a new issue.
Now, right above it is another issue event with the same ID.
Some things are the same and other things are different, right? But it contains the entire text, the entire JSON of the issue. It does not contain a diff.
So if you look at it here, previously the priority was high and the status was open. In the next event here, priority is medium and status is triaged, but other things are the same. The title is the same, the description is the same. So instead of editing an issue by sending a separate event that has a diff, it's literally just duplicating the issue, and it contains all of the data for that issue.
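A minimal sketch of the single-kind design, assuming a hypothetical event shape where every `issue` event carries the whole issue. The first event with a given ID creates the issue and any later event with the same ID replaces it.

```python
import json

# Two hypothetical full-snapshot events for the same issue, oldest first.
# Field names and the ID "9a" are illustrative assumptions.
events = [
    json.dumps({"kind": "issue", "id": "9a", "title": "Crash on startup",
                "description": "App exits right after launch.",
                "priority": "high", "status": "open"}),
    json.dumps({"kind": "issue", "id": "9a", "title": "Crash on startup",
                "description": "App exits right after launch.",
                "priority": "medium", "status": "triaged"}),
]


def latest_issues(messages):
    """Fold full-snapshot events into the current state, keyed by issue ID."""
    issues = {}
    for message in messages:
        event = json.loads(message)
        if event["kind"] == "issue":
            # No create/edit distinction: a new ID creates, a known ID replaces.
            issues[event["id"]] = {k: v for k, v in event.items() if k != "kind"}
    return issues


current = latest_issues(events)
print(current["9a"]["priority"], current["9a"]["status"])
```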
Now you're probably thinking this is crazy, right? Um yeah, I mean this is not that much data, but obviously in the real world issues can get really large.
You can have a lot of text. The description may not be one sentence like this. It might be a freaking novel. And so every time you edit that issue, you're making a git commit with the entire uh data. Again, you probably think that's crazy.
Um, it's not. And here's why.
Git treats commit objects just like it treats blob objects, which are the objects that are created when you add a file.
So let's say you have a repo with a gigantic text file, a megabyte in size, and you edit a single line in that file.
Initially git will just store both versions as separate objects internally with zlib compression, but when you go to push both of those commits up to the server, it's not going to send those two objects separately. It's going to create a pack file with this special format.
And this pack file is able to efficiently create deltas between two objects that look similar. It has two different object types that allow it to create deltas. And this is why it can efficiently push a lot of changes to a file without having to send that file separately every time. Now, the interesting thing here is that this delta format works for all object types.
What that means is that it works for commit objects too.
Let's say you have two commit objects that both have a gigantic commit message, but one of them changed a single byte in that commit message.
Git doesn't have to send both of those commit messages separately in full.
It can apply the exact same deltification that it applies to files that you added to the repo. Isn't that cool? This is actually one of the things about git that's so elegant: it created this deltification and then just applied it to everything.
So this is my point. You might think it's crazy to just send the entire issue again every time, but it's not, because it will benefit from the same deltification that files benefit from. Just like it's not crazy to have a big file and then edit one line and make a commit, and then edit a different line and make a commit.
We do that all the time and it's efficient to do that. And it's equally efficient to create two commits that have giant commit messages and only a few things are different. Git will deltify them in the exact same way.
So um this is not actually a crazy idea.
It's uh it's perfectly efficient to have extremely duplicative commit messages.
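The effect can be approximated without touching git at all. The sketch below is not git's pack delta format; as a stand-in, it uses zlib's preset-dictionary feature (`zdict`) to compress the second version of an issue event against the first, so the repeated content becomes back-references. The event fields are hypothetical, and the description is random hex so that it compresses poorly on its own.

```python
import json
import random
import zlib

rng = random.Random(0)
# A long, poorly compressible description stands in for a novel-length issue body.
description = "".join(rng.choice("0123456789abcdef") for _ in range(8000))

v1 = json.dumps({"kind": "issue", "id": "9a", "description": description,
                 "priority": "high", "status": "open"}).encode()
v2 = json.dumps({"kind": "issue", "id": "9a", "description": description,
                 "priority": "medium", "status": "triaged"}).encode()


def compressed_size(data, base=None):
    """Size of `data` after zlib compression, optionally primed with `base`."""
    compressor = zlib.compressobj() if base is None else zlib.compressobj(zdict=base)
    return len(compressor.compress(data) + compressor.flush())


full_size = compressed_size(v2)            # second version sent on its own
delta_size = compressed_size(v2, base=v1)  # second version relative to the first
print(full_size, delta_size)
```

The second number should come out far smaller than the first, which is the intuition behind duplicating the whole issue in every event: the cost of the repeated text is paid roughly once, not once per edit.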
But that doesn't really answer the question of why we should do this.
Why not just have the separate event types here and let the edit events be a lot smaller?
The point here is not to reduce the number of event types. That's not really a big deal. There's a much more interesting reason why I don't want to include any diffing in the events themselves: it's not that the diffing is not being done, it's that it's being delayed.
So imagine my Haxy server receiving the first event with this ID, 9A whatever. It's never seen it before, so it creates an entry in the database and puts all this data in it. Next it receives this event, which has the same ID, but now some of the data is different. What my server is going to do at that point is make a diff. It's going to look at the data that's already in the database, diff it with what it sees in this event, and then surgically update priority, status, and whatever else is different.
Okay. So the difference between the old idea and the new idea is not that one is doing diffing and the other isn't. It's when the diffing happens. With the old idea, the diffing happens when you create the event, and it's hardcoded right there in the event.
In the new idea, the diffing happens later on. It's happening when my server receives the event.
And that's the only difference here.
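Here is a minimal sketch of diffing at receive time, assuming a dict as the database and a hypothetical `receive_issue_event` helper. The comparison used here is a simple field-by-field check; the point of the design is that a server is free to swap in any other strategy without changing the events.

```python
def receive_issue_event(db, event):
    """Create the issue if its ID is new; otherwise diff it against the stored
    copy and update only what changed. Returns the fields that were written."""
    issue_id = event["id"]
    stored = db.get(issue_id)
    if stored is None:
        db[issue_id] = dict(event)
        return dict(event)  # first sighting: everything is new
    # The diff is computed here, on the server, not inside the event.
    changed = {k: v for k, v in event.items()
               if k not in stored or stored[k] != v}
    removed = [k for k in stored if k not in event]
    stored.update(changed)
    for key in removed:
        del stored[key]
    return changed


db = {}
first = receive_issue_event(db, {"id": "9a", "title": "Crash on startup",
                                 "priority": "high", "status": "open"})
second = receive_issue_event(db, {"id": "9a", "title": "Crash on startup",
                                  "priority": "medium", "status": "triaged"})
print(second)
```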
Now, why is it better to delay diffing until later instead of doing it when you create the event? The simple reason is that then I can continuously improve the diffing mechanism in my server, and your events will benefit from it immediately.
I mean, there's more than one way to diff something. There are different algorithms, right?
If I include a special event that provides the diff, then it's hardcoded and I can't change it ever again.
So that's the real difference: when you delay the diffing, you can let the server diff it however it wants to.
That's a very powerful idea.
And it's also a huge simplification for anyone else who wants to make a competing server that is compatible with this event store, because that's something I really care about. I eventually want what you're looking at right now to become a standard.
I want this particular idea of storing events in this particular format in git to become a standard.
And I want you to be able to push this branch to Haxy but also push it to other git forges, and they all will understand it.
And one way, I think, to make that possible is to put as few assumptions as possible in the events themselves. So an event should always contain the entirety of the particular thing that it's modifying. It should not ever contain a diff. The diffing should be done by the server, and the server should be free to do it however it wishes.
That's the new idea. So now there's just one event type for now. Of course, there will be others because we need discussions. We need uh pull requests. I don't know if those will be different event kinds or um slight variations of the same event kind. I don't know. But right now there's only one event kind.
It's just issue and you just always send the entire freaking issue every time.
Okay. So um that's the idea. We benefit from Git's um built-in deltification which it performs every time you do a git push and also sometimes on disk as well.
I'm all too aware of this because I had to implement it myself in my version control system, xit. This was probably the most difficult thing I had to do in the entire project: implementing a reader and writer for git pack files.
And it's relatively compact. It's uh you know one and a half thousand lines of zig. Beautiful zig I must say.
Uh but it was insanely difficult. Not because the format is complicated. I mean, it's not like a PDF file. I mean, that's a complicated format, but it was very finicky, especially trying to get it to be efficient.
So, um, yeah, it was one of the most difficult things I did in this project, but it it made me all too well aware of, uh, how the format works.
And so, I know that this new idea will be efficient, even though it doesn't seem like it will be.
All right.
So now, if you compare the old idea to the new idea, some of the more astute ones out there might realize that this is a perfect analogy to the difference between the two different kinds of version control systems, snapshot-based and patch-based.
If you're not aware of the difference between the two, well, you're in luck, because I wrote a document explaining it. I'll link to it in the description.
I had to understand it for my version control system.
But I'll give you a quick description now of the difference between them, because this is really interesting and it's a perfect analogy to the old idea and the new idea for my event store.
So what you need to know is that snapshot-based and patch-based version control systems are basically two sides of the same coin.
They are essentially the only two kinds of VCSs that can exist, because they are just looking at tracking changes in slightly different ways: tracking the states between changes or tracking the changes themselves.
Those are really the only two things you can do. A good analogy here would be making a schedule for your day. How do you do it? Well, 9:00 a.m., wake up. 10:00 a.m., eat breakfast. And maybe at 12:00 p.m., go to the store. I don't know. That's the snapshot-based way of tracking time.
What you could do instead is track the changes between the times. You could say: 9 hours after midnight, wake up. An hour after that, eat breakfast. Two hours after that, go to the store.
Now, nobody does that, but you could actually do that, right? You could track the difference between the times rather than the absolute times. That's how you need to think about snapshot-based and patch-based version control. Snapshot-based version control is like tracking the absolute times. Git, of course, is snapshot-based, so it never stores the diffs between two commits. It stores the objects of each commit themselves.
And then, after the fact, it uses the pack format that I just mentioned to deltify between the objects. But that's purely an optimization. It's never actually storing a diff between two commits. So, for example, in a different repo here, if you run git show, this will show you a diff of the most recent commit.
It is not storing this diff anywhere. It actually generated this right when I typed git show.
Okay. So, it's never storing the diff anywhere. It's literally just storing all of the files as they existed in this commit and in the previous commits. And when you run git show, it diffs them right then and there.
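The same behavior can be sketched in a few lines: keep only full snapshots and compute the diff when someone asks for it. Python's `difflib` is used here purely as a stand-in for git's own diff machinery, and the `show` helper and snapshot contents are hypothetical.

```python
import difflib

# Only full snapshots are stored; no diff is kept anywhere.
snapshots = [
    "line one\nline two\nline three\n",
    "line one\nline 2\nline three\n",
]


def show(snapshots, index):
    """Compute a unified diff between a snapshot and its predecessor on demand."""
    old = snapshots[index - 1].splitlines(keepends=True) if index > 0 else []
    new = snapshots[index].splitlines(keepends=True)
    return "".join(difflib.unified_diff(old, new, "previous", "current"))


diff_text = show(snapshots, 1)
print(diff_text)
```

Because the diff is derived rather than stored, the algorithm that produces it can be replaced later without rewriting any history.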
Now, if you read this document, you'll see that neither is better than the other.
There are advantages to tracking the changes between states, like Darcs and Pijul do. But there are also advantages to doing what git does, tracking the states between the changes.
That's why I realized the ideal system is actually a combination of the two. But here's where the analogy comes in. It's not even an analogy. It's almost the exact same thing. When you create the equivalent of a commit in a patch-based system, it generates a patch in order to track the changes between the states. It generates a diff, and that's what you push to the server. But that means the particular diff algorithm is hardcoded right there in the commit.
You can't change it after you make it unless you throw it away and make a new one. Whereas git never hardcodes the diffing algorithm, so it can change it later if it wants to.
So this is an interesting difference between the two. With patch-based systems, because you're pushing these changes to other people, they have to use whatever diff you created. They don't get to generate their own.
Um, and so that's sort of a downside.
The upside is that patch-based systems are generally better at merging than git is. They have fewer conflicts, but I'm not going to get into that now. You can read about it in this document if you wish.
There are advantages to both. And I found that if you try to combine the two, you can actually get the advantages of both. But I'm not trying to sell you on my VCS right now. I'm just telling you this is an interesting parallel to what I just realized about these two ideas.
If I create a separate edit issue event, I'm now hard-coding the details of the diffing into the event. Whereas what we really should do is make this behave more like git does and just send the entire issue every single time you make even a tiny change, like changing the priority, and then let the server generate the diff as it wishes and use that to update the state of the database. I believe this is a superior idea.
It definitely simplifies the concept of events, because now there's no distinction between creating and editing.
The diffing happens a little bit later, but it would happen either way. It's just about when the diffing happens, and I think this will actually simplify creating alternative implementations of this idea, because you now have control over how the diffing works.
And certainly, if you want to create your own client-side tool for editing issues and creating events, it's much easier for you, because now you don't have to care about diffing at all. When someone makes a change to the issue, you just generate a new event with all of the content, and my server will take care of diffing it with the original issue.
So, I think this is a significantly better idea and simpler idea, but it's one that sounds crazy until you dive into VCS internals and realize this actually will be efficient, even though it doesn't seem like it. So, that's my idea. Um, I haven't really implemented it yet, of course, because I just came up with it. I just realized it today.
But um let me know what you think or if you found any holes in my reasoning here. But that's my new idea.
Um events will just contain all of the data every single time. Uh even when you're just changing one part of it. So I hope you like it. If not, screw