Software and digital code are inherently fragile because they depend on external infrastructure that can fail or change. When a platform shuts down, as Google Code did and as GitHub eventually might, the URLs pointing to it become inaccessible all at once, so it is essential to archive code repositories and to cite Internet Archive or Software Heritage references instead of bare URLs.
Deep Dive
Code Is Incredibly Fragile
Data is very fragile. Software is fragile, and a lot of things are not backed up anywhere. People have this idea that if it's posted online, it never disappears. It's a good rule to live by, because you don't want to post things you shouldn't be posting, but it's not actually true.
Data can just disappear.
Yeah, so one of the jokes we had in the topic of our chat channels at the very beginning of Software Heritage was from XKCD, about digital preservation. It was something like: digital objects last forever or 5 years, whichever occurs first, right? But the point you're touching on is that the half-life of URL references is affected very differently depending on whether we're in an internet that is fully distributed or one that is very much centralized.
If it is fully distributed, the problem is that you don't buy resources, right? You just rent them. You rent a domain. You rent a server where you have a virtual machine. At some point your rent expires, for whatever reason, and whoever was referencing your URL can no longer resolve it. So references are created and at some point disappear in an organic way, and the losses follow a smooth distribution.
If you are in a centralized internet, what happens is that your references stay stable for a long time, okay? Here I'm assuming that there is no way on the platform itself to invalidate those references, which there is, okay? It's our conversation from before about tags and changing the history of a post. Let's forget about that for a second. But then at some point the entire platform can disappear. When I was young, who would have thought that Google Code would disappear? It's a forge. It's operated by Google, one of the tech giants. It will never disappear.
And then at some point they decided that they didn't have a business case anymore to operate it, and they shut it down.
So, GitHub today: nobody imagines that it's going to disappear, but it might. I would say that it will disappear. And when that happens, you will have a big blip in the half-life of your URLs. Essentially, a gazillion URLs will disappear at the same time when the platform is shut down. So, depending on the shape and structure of the internet, we get very different effects on the half-life of URLs, and we need to think about that. There is no magic solution, okay? But we need to be aware of the problem and take measures to minimize it. That's why one of my missions with my colleagues in academia is this: every time, as a reviewer, I see a GitHub URL in a paper, I say, "Okay, but that's not stable. Archive it somewhere." And the same is true for URLs on the web, okay? If your paper has a URL on the web, use an Internet Archive reference, not a bare URL. And do the same with Software Heritage for GitHub URLs.
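[Editor's sketch] The advice to cite an Internet Archive reference rather than a bare URL can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: it relies on the publicly visible Wayback Machine address pattern (`https://web.archive.org/web/<timestamp>/<original URL>`), the function name `wayback_reference` and the example URL are made up for illustration, and building the string does not by itself guarantee that a snapshot exists for that date.

```python
from datetime import datetime, timezone

# Prefix used by Wayback Machine snapshot addresses.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_reference(url: str, when: datetime) -> str:
    """Build a Wayback Machine style reference for `url` near time `when`.

    Hypothetical helper: it only formats a string and makes no network
    request, so it cannot tell whether a capture actually exists.
    """
    # Wayback timestamps are written as YYYYMMDDhhmmss in UTC.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


if __name__ == "__main__":
    # Example: the date on which a paper cited the page (illustrative values).
    cited_on = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
    print(wayback_reference("https://example.org/paper-data", cited_on))
```

A reference built this way keeps the original URL readable inside the archived one, so a reader can still see what was cited even after the original host is gone.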
Mhm. Mhm.
I know you mentioned that running a mirror obviously is not viable for the average person, but if somebody does want to help out in some way, what can they do? Obviously, they can promote the existence of it, but if there is anything else, I would love to hear it.
Yeah, so maybe three things. One, we do accept donations. We have not run big donation campaigns yet, but this is a way to support the project, and something that we might want to scale up in the future as an additional diversification of our sources of funding. The second point is that one of our founding principles for Software Heritage is that the software we write ourselves to run the archive is free and open source software. We are really hardcore about that. So, if you go to gitlab.softwareheritage.org, our own code is all there, okay?
And people can help by contributing, as they usually do with any piece of free and open source software, okay? It's a bit more challenging on average than hacking on your, you know, random web framework, because testing something on a small scale is very different from testing something at the scale of the archive, but that's one way they can help. And finally, talking about it is indeed a very important aspect. So, advocacy. That can be done in a completely informal way, as you're doing, and it's great. By the way, thanks a lot for this. I know we've spoken about the Software Heritage Archive in the past already and you're doing it again today, so that's great. But then there is also some additional engagement: people can become ambassadors of Software Heritage and go around giving talks and presentations about Software Heritage wherever their communities are.
So, there are three very concrete ways.
Donations, technical contributions, and promoting the project. And maybe one last thing: if you care about the preservation of a specific piece of code, we do have functions in the web UI to ask for the archival of a specific repository that maybe we have not archived yet, or that has not been archived recently, or even, as we discussed before, a specific forge that we have not archived yet. Those are all super useful contributions that people can make to our initiative.
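[Editor's sketch] The speaker describes requesting archival of a specific repository through the web UI. For readers who prefer scripting, the archive also documents a web API; the sketch below only builds the address that a save request would be sent to. Treat the details as assumptions to check against the current Software Heritage API documentation: the endpoint pattern `/api/1/origin/save/<visit type>/url/<origin URL>/` is quoted from memory of that documentation, and the function name `save_request_endpoint` and the example repository are hypothetical.

```python
# Root of the Software Heritage web API (assumed; verify before use).
SWH_API_ROOT = "https://archive.softwareheritage.org/api/1"


def save_request_endpoint(origin_url: str, visit_type: str = "git") -> str:
    """Return the address a save request for `origin_url` would target.

    Hypothetical helper: it formats the endpoint only. Actually submitting
    the request (an HTTP POST to this address) is deliberately left out,
    so running this code has no effect on the archive.
    """
    return f"{SWH_API_ROOT}/origin/save/{visit_type}/url/{origin_url}/"


if __name__ == "__main__":
    # Illustrative repository URL, not a real project.
    print(save_request_endpoint("https://github.com/example/repo"))
```

Keeping the request itself out of the sketch is a deliberate choice: rate limits and request status handling vary, and the web UI the speaker mentions remains the simplest route for a one-off request.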