A clever metadata layer that brings modern attribute-based querying to the minimalist Plan 9 world. It’s a solid piece of engineering, even if it tackles a problem that has remained "to be determined" for decades.
Deep Dive
Still Working on the To Be Determined File System (TBDFS)
So I presented the To Be Determined File System at the recent International Workshop on Plan 9. My current version of it is different from what I showed previously and from what I laid out in the IWP9 paper. The big change was removing the library category. The original intent of that category was that if I had two sets of files that both had an attribute with the same name but used it differently, it would be a way to sort them out.
However, when it came time to sit down and start thinking about how I wanted to string together multiple queries, the extra parsing for the library category didn't really seem worth it. So now the metadata is laid out like this: I just have an attribute, a type, and the actual key for it.
Before, I would have a library, then the attribute, and then all the rest. In the end, I think the extra thought needed to lay out which attributes and types you have is outweighed by the added flexibility of this.
So, when it comes to actually browsing the stuff (oops, ls /n/tbd), if I just browse through the index, it does make things simpler because you jump straight to the attribute. There's one less level of tree to go through, and the same works for the queries. So if I want to do, let's see, an ls on /n/tbd query, and we can do author pike, there we go: we get all the papers with an author of pike.
Now, I do need to finish up some sort of query language for this. It would be nice to have something like a not function to narrow down results. Talking with others, I do have some other ideas I may pursue.
For one, I ultimately want to do this as an on-disk file system, and I'm mostly aiming at long-term archival storage for all these files I have. But there is a lot of interest in how this runs right now, as something that reads and sorts files that are actually stored on some other on-disk file system. Right now all my files are actually stored on disk using GEFS, and this just runs on top of it. Another idea was around how to expose the metadata in some format that's useful for another query language. The example I got was, right now if I read the metadata on something, let's just do meta for this one here.
So right now it's laid out in my format, where I have an attribute, slash, type, slash, and the key.
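As a rough illustration of that attribute/type/key layout, here is a minimal Python sketch of splitting one metadata line into its three fields. It is not the project's code (the prototype is a Plan 9 file server); the names `MetaEntry` and `parse_meta_line` and the sample line are made up for the example, and the exact type names are an assumption.

```python
from typing import NamedTuple


class MetaEntry(NamedTuple):
    attribute: str
    type: str
    key: str


def parse_meta_line(line: str) -> MetaEntry:
    """Split one 'attribute/type/key' line into its three fields."""
    # Split on the first two slashes only, so a key may itself contain '/'.
    attribute, type_, key = line.rstrip("\n").split("/", 2)
    return MetaEntry(attribute, type_, key)


# Hypothetical sample line; the real files may use different type names.
entry = parse_meta_line("author/string/pike")
print(entry.attribute, entry.type, entry.key)
```

Splitting at most twice is a deliberate choice here: it lets a key hold a path or any other text with slashes in it without confusing the parser.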
The specific suggestion I got was to do it in NDB format, where you have the tuples, you know, something equals this, something equals that, and then you could use NDB or the new SDB functions to parse through it yourself. Another one, which actually came up at IWP9, was to have something where I don't point to a whole file but to something else.
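To make the "something equals this, something equals that" idea concrete, here is a small Python sketch that renders attribute/type/key triples as attribute=value pairs. It is only a guess at what such an export could look like: the function name `to_ndb_line`, the sample entries, the decision to drop the type, and the double-quoting of values with spaces are all assumptions, not something taken from the project or checked against the NDB tools.

```python
def to_ndb_line(entries):
    """Render (attribute, type, key) triples as space-separated attr=value pairs."""
    pairs = []
    for attribute, _type, key in entries:
        # Assumption: values containing whitespace are wrapped in double quotes.
        value = f'"{key}"' if any(ch.isspace() for ch in key) else key
        pairs.append(f"{attribute}={value}")
    return " ".join(pairs)


# Made-up sample metadata for one file.
entries = [
    ("title", "string", "Plan 9 from Bell Labs"),
    ("author", "string", "pike"),
    ("year", "integer", "1995"),
]
print(to_ndb_line(entries))
```

The type field is dropped in this sketch, which loses information; a real export would need to decide whether to carry it along, for example as part of the attribute name.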
I've had a few other people ask for it.
So, say you're working on C programming files. This is the example that came up at IWP9: you could have something like a C function attribute, with a type of text, and it might be for something like malloc. But this would then somehow point to and spit out the equivalent of something plumbable, like main.c at line 1234. So not only could you say there's a particular function in a file that you could call up and search through, but you might be able to point to a specific location in a file. I did leave some open-ended stuff in there to have different types of things you point to rather than just files. But yeah, that'll be an interesting idea to work out. Oh, and as I demonstrated there, I don't currently have a really sophisticated query system, but you can do something with bind.
Let's see what I have here first. Let's make a new directory in home, mkdir bar. I could do something like bind from /n/tbd: I could do a query directly for author Ori and put that in with bind after.
Then I run another query for author Moody and put it in the same place.
And now if I read it out, I get a combination of all the files that have author Moody and all the files that have author Ori.
So some of these will have two of them. If I go and, actually, this will be kind of a big one here. This is going to be a whole IWP9 proceedings here.
And so somewhere in here is probably going to be papers by both. So yeah, here we go.
Here's a paper for Ori here.
Oh, he actually had two of them in here.
And then yeah, here's one for Moody.
So, yep, I can do some basic stuff like that right now.
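The bind trick above amounts to taking the union of two query results. As a sketch of what a small query language might eventually offer, here are union, intersection, and a "not" style exclusion over plain lists of file names in Python. The function names and the file names are invented for illustration, and dropping duplicate names in `union` is a choice made in this sketch.

```python
def union(*listings):
    """Merge listings in order, keeping the first occurrence of each name."""
    seen, merged = set(), []
    for listing in listings:
        for name in listing:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


def intersect(a, b):
    """Names that appear in both listings, in the order of the first."""
    wanted = set(b)
    return [name for name in a if name in wanted]


def exclude(a, b):
    """Names in the first listing that do not appear in the second (a 'not')."""
    unwanted = set(b)
    return [name for name in a if name not in unwanted]


# Hypothetical results of two author queries.
by_ori = ["gefs.pdf", "proceedings.pdf"]
by_moody = ["proceedings.pdf", "other.pdf"]

print(union(by_ori, by_moody))
print(intersect(by_ori, by_moody))
print(exclude(by_ori, by_moody))
```

With these three operations, "papers by either author", "papers by both", and "papers by one but not the other" all become one-line queries.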
So, I do have this up on GitHub. I'll put a link in the description so you can go fetch it. It currently has quite a few bugs. Again, this is basically a prototype I threw together to have something in time for IWP9.
So, it works like a lot of other file systems. I've got my main function here parsing through some standard stuff. The thing you give it at the beginning is this -b flag, which says what base I'm currently pulling from.
Again, I'm not reading a disk. I'm reading a directory on an already existing file store.
You have two options, actually: -b and -s. The -s is for start. If you just have a directory full of, say, PDFs or MP3s and you haven't used this yet, you can do -s and then put the path to the directory. What that'll do is make the meta directory. It'll make copies.
I think that should be up here.
Yeah, it's in this whole mess here. It'll make the base, it'll make the meta directory, and then go through and make, well, not copies of the files, but a bunch of files named after each file with a meta on the end. In there, I start off by adding the name and length. Actually, it'd be this one here. I've had some issues trying to add mtime. That's a bug I currently have somewhere in how I build the trees; it chokes on it for some reason, so I have that commented out currently. But I do want to add all the standard stuff that shows up in stat, your regular metadata: your mtime, atime, user ID, all that sort of stuff. Eventually that's all going to be in there. Then it goes down and runs the file system by doing postmountsrv here.
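The start step described above, making a meta directory and seeding one metadata file per source file with its name and length, could look roughly like this in Python. This is a sketch under assumptions, not the prototype's code: the `.meta` suffix, the `string` and `integer` type names, and the function name `seed_metadata` are all hypothetical.

```python
import tempfile
from pathlib import Path


def seed_metadata(source_dir: Path, meta_dir: Path) -> list[Path]:
    """Create one '<name>.meta' file per regular file, seeded with name and length."""
    meta_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for item in sorted(source_dir.iterdir()):
        if not item.is_file():
            continue  # this sketch ignores subdirectories
        meta_path = meta_dir / (item.name + ".meta")
        size = item.stat().st_size
        # Assumed type names; one attribute/type/key entry per line.
        meta_path.write_text(f"name/string/{item.name}\nlength/integer/{size}\n")
        created.append(meta_path)
    return created


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "base"
    base.mkdir()
    (base / "paper.pdf").write_bytes(b"12345")
    made = seed_metadata(base, Path(tmp) / "meta")
    print(made[0].name)
    print(made[0].read_text(), end="")
```

Other stat fields such as mtime, atime, and user ID would be added the same way, one line per attribute.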
So that's going to be my next step. Let's get these out of the way. It'll be in here, and I should be using start for that.
I think I call it tbdstart, if I remember right, or fsstart.
It's either start or attach that I'm using.
Oh, here we go. Yep, start. You can set a start function for any of these standard file system servers, and it will run as soon as you start it. This here basically sets up how the file system currently works. Right now I set up a hash table, and the hash table is for matching a unique ID for every file to a location, currently a location on the existing file storage system.
So it basically takes a unique number and returns a path to where the file is actually stored on the hard drive.
In the future, it would take a unique ID and actually point to something like an address on the hard drive to go find the file.
I went with this originally, but I might end up changing it to make this part a tree as well. We'll see. Right now I'm actually trying to build table sizes using primes and all sorts of stuff to prevent collisions and make it work smoothly. But in the end, it might be easier to just scrap that and use a B+ tree for it too.
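For readers who have not built one by hand, here is a minimal Python sketch of an ID-to-path hash table whose bucket count is rounded up to a prime, which is one common way to spread keys more evenly. It is an illustration only: the class name `IdTable`, the sizing rule, the chaining strategy, and the sample path are assumptions, not the prototype's actual design.

```python
def next_prime(n):
    """Return the smallest prime greater than or equal to n."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    while not is_prime(n):
        n += 1
    return n


class IdTable:
    """Chained hash table mapping an integer file ID to a path string."""

    def __init__(self, expected_files):
        # Assumed sizing rule: about twice the expected entries, rounded up to a prime.
        self.size = next_prime(2 * expected_files + 1)
        self.buckets = [[] for _ in range(self.size)]

    def put(self, file_id, path):
        bucket = self.buckets[file_id % self.size]
        for i, (existing_id, _) in enumerate(bucket):
            if existing_id == file_id:
                bucket[i] = (file_id, path)  # replace an existing entry
                return
        bucket.append((file_id, path))

    def get(self, file_id):
        for existing_id, path in self.buckets[file_id % self.size]:
            if existing_id == file_id:
                return path
        return None


table = IdTable(expected_files=3)
table.put(42, "/usr/glenda/papers/paper.pdf")  # hypothetical path
print(table.size, table.get(42))
```

Collisions still happen with a prime-sized table; chaining just keeps them correct. That bookkeeping is part of why swapping this for a tree, as mentioned above, can end up simpler.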
So, we'll see. That's something I might change. The next part is basically going through and setting up the table here.
I build the table fresh every time it runs. It's not actually stored anywhere. The only thing that's currently stored is the raw metadata. That's stored on disk, and it's the only thing that currently is. Everything else gets built in memory every time it fires up.
So the first thing I do is I fill out the index. It'll be in here.
And this basically builds what you walk through in this index thing here. It's a very, very basic tree. I go through and parse the metadata.
Right now I try to set up everything to be 4K blocks, because that's the Intel page size and also what you'd be dealing with for on-disk blocks, for the most part. SSDs can get a little weird with that, but most stuff still holds to the 4K standard.
So it's going to parse the metadata out into individual lines, and then it's going to build a node with that.
So this one here will go through and say, okay, I found an attribute called title, and it'll make a title node. Then it'll start adding stuff to it. I break each line out into attribute, type, and key, and start building. First I see if the node already exists.
If it does, I can just add to the listing on the existing node. If it doesn't, it'll make a new one.
And this is a trick I do so I don't have to do a lot of regex, you know, regular expression stuff on strings. I take every string that comes in. So something like this one here, which has quite the subject, I just copy-pasted something in. It will go through and split out all the individual words.
They're separated by spaces.
And then it just recursively goes back through the whole thing and inserts those as individual key values for the given attribute. So that'd all be, say, subject, type string, and then for the key, first it will put in the entire string, and then it'll add plan and nine and from and bell all individually, so you can search for individual words without having to load up every string and go through it some other way.
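The word-splitting trick can be sketched in a few lines of Python: index the whole string as one key, then each space-separated word as its own key. This version uses a plain loop rather than recursion, and the names `index_keys` and `add_to_index`, the nested-dictionary index, and the sample subjects are all invented for the example.

```python
from collections import defaultdict


def index_keys(value):
    """Return the full string followed by each distinct word in it."""
    keys = [value]
    for word in value.split():
        if word not in keys:
            keys.append(word)
    return keys


def add_to_index(index, attribute, value, file_id):
    """Record file_id under the full value and under every individual word."""
    for key in index_keys(value):
        ids = index[attribute][key]
        if file_id not in ids:
            ids.append(file_id)


# index[attribute][key] -> list of file IDs, in insertion order
index = defaultdict(lambda: defaultdict(list))
add_to_index(index, "subject", "Plan 9 from Bell Labs", 1)
add_to_index(index, "subject", "Bell Labs history", 2)

print(index["subject"]["Bell"])
print(index["subject"]["Plan 9 from Bell Labs"])
```

The cost is a larger index, since every word becomes a key, but a single-word search turns into a direct lookup instead of a scan over every stored string.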
Yeah, this just goes through and sees if there's an existing node. If there isn't, it'll go through the stuff to add a node. This runs through and builds just a basic index. But the index isn't really a quickly searchable thing. Something like the node for title is just going to have everything that has a title in it, all lined up in whatever order they happened to be loaded in. The actual queries use the B+ tree, which is a little faster.
And whoops, wrong one. Closed too many.
So after building the index, I can then parse the index to build the tree.
So, and that'll be in here.
So I build the tree. I do have a root node; it used to be the libraries under that, and now it's the attributes under that. So all the attributes are listed in the root node. It just starts going through them, goes through each attribute's tree, and keeps going. Something I don't really like is the way I'm currently handling the stuff in the trees. I do want to have unique types, and the reason for that is that strings get handled one way and integers another, and I should be able to do things like search for papers that are later than the year 2005.
That makes sense with numbers, but running greater-than operations on strings will get you some strange results. Not that it isn't possible. In the end, all this stuff is just stored as ones and zeros on the computer, and you can treat it all like math. But in practical terms, it doesn't make sense. So I do have to treat a tree made out of integers and decimals slightly differently than one made out of strings. That leads to a big chunk of code where I first have to check what the type is and then do unique tree-building code for each one, even though they mostly do the same thing. At some point I've got to sit down and think about how to clean this up and try to use just one function. And then, yeah, there's a bunch of stuff in here for doing the typical tree stuff. Am I splitting a leaf? Am I splitting a node?
At what point does it get too big, so that I have to do that? I spent a lot of time on this, because if you get one little thing wrong, it makes for some really wonky, unbalanced trees.
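One possible way to get down to a single tree-building function, as wished for above, is to convert each raw key according to its declared type once, and let the ordinary comparison operators do the rest. The Python sketch below shows the idea with a sorted list standing in for a B+ tree; `typed_key`, `SortedIndex`, and the type names `integer` and `decimal` are assumptions made for the example, not the project's design.

```python
import bisect


def typed_key(type_name, raw):
    """Convert a raw key string into a value that compares correctly for its type."""
    if type_name == "integer":
        return int(raw)
    if type_name == "decimal":
        return float(raw)
    return raw  # strings compare as strings


class SortedIndex:
    """Sorted key list standing in for one attribute's tree."""

    def __init__(self, type_name):
        self.type_name = type_name
        self.keys = []
        self.ids = []

    def insert(self, raw, file_id):
        key = typed_key(self.type_name, raw)
        pos = bisect.bisect_right(self.keys, key)
        self.keys.insert(pos, key)
        self.ids.insert(pos, file_id)

    def greater_than(self, raw):
        key = typed_key(self.type_name, raw)
        pos = bisect.bisect_right(self.keys, key)
        return self.ids[pos:]


years = SortedIndex("integer")
for raw, file_id in [("2010", 3), ("1995", 1), ("2007", 2), ("999", 4)]:
    years.insert(raw, file_id)
print(years.greater_than("2005"))

# The same data compared as strings gives the "strange results":
as_text = SortedIndex("string")
for raw, file_id in [("2010", 3), ("1995", 1), ("2007", 2), ("999", 4)]:
    as_text.insert(raw, file_id)
print(as_text.greater_than("2005"))
```

In the string version, "999" sorts after "2005" because comparison goes character by character, which is exactly why numeric attributes need their own handling.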
So, yep. It builds the index. It then builds a separate pool of B trees, one for every key, and after that it runs, and then you're basically just in the file system code.
So I have stuff for if you do a read. Currently there is no writing. This here is for creating directories for the root, for if you're looking at the index or the tree, and different methods for generating, not tables, lists. This one uses the hash table to just generate a flat list for pool and meta, so I'm just pulling from there.
Each one of these has to have its own. Let's actually look at this one here. This is what gets run if you look in index or if you run a query and you want to get back a directory listing.
Right now I'm doing some stuff to go pull the original data from the files to actually fill in user ID and stuff like that. Again, eventually I want to move that into the actual stored metadata.
So the interesting part really is the whole walk process, because every time you go through and just check a directory out. Whoops, wrong one.
Yeah, this is the whole walking process here. If you walk through root, this is what you get. In case index, this is what you've got to do to actually generate some output for the index, the tree, all that.
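The "am I splitting a leaf?" question from the tree-building code can be shown in miniature. In a B+ tree, when a leaf overflows it is cut in half and the first key of the new right-hand leaf is copied up to the parent as a separator. The Python sketch below covers only that one step; the capacity `MAX_KEYS` and the function name `insert_into_leaf` are hypothetical, and a full tree would also need internal-node splits and parent updates.

```python
import bisect

MAX_KEYS = 4  # hypothetical leaf capacity


def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf; split the leaf if it overflows.

    Returns (left, right, separator). When no split is needed,
    right and separator are both None.
    """
    keys = list(leaf)  # work on a copy so the caller's list is untouched
    bisect.insort(keys, key)
    if len(keys) <= MAX_KEYS:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # In a B+ tree the separator is a copy of the right leaf's first key.
    return left, right, right[0]


print(insert_into_leaf([10, 20, 30], 25))
print(insert_into_leaf([10, 20, 30, 40], 25))
```

Getting the midpoint or the separator off by one is the kind of small error that leaves one side of the tree consistently fuller than the other.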
So, yeah, it's kind of a mess right now.
But again, this is just a prototype, and as I talk to more people, there are definitely some ideas getting kicked around about what would be cool to add.
Oh, another one I was thinking of doing: right now, to make it easy on me, everything just has to sit inside one directory. But what if you could have something like, I'm using the double-struck P to mean internal kinds of things.
But if you could have something like path, text, and then have the actual path to the file be in there, you could build this using files that are already scattered across your hard drive. Another one I got was, wouldn't it be nice to have something like this to sort your emails?
But you definitely wouldn't want to, say, move your emails around the hard drive or put them in some sort of special place. Just leave them where they currently are on your hard drive, whatever your email program does with them, and then use this to point to wherever those emails happen to be stored, and include the metadata to make it easy to search your emails.
So, anyway, yeah, I'll put a link to this down below if you want to play with it. And, as always, have fun.