Blum masterfully distills complex matrix theory into actionable engineering, making high-level model optimization accessible without losing technical rigor. It is a rare, no-nonsense bridge between academic research and practical implementation.
LoRA Fine Tuning AI Model
Hello. Welcome on in, everyone. Happy Friday. We're going to be building some AI today: low-rank adaptation (LoRA) for fine-tuning. This is where we use PyTorch, import a model, and then fine-tune it. We can fine-tune it to make it better or to make it follow our new training data, like giving it more information. The neat part about this is that it sounds complicated, but it's actually really simple. I was looking this over just this morning and I'm realizing... hey, how's it going there, Sigma Edits? Welcome on in.
Good to see you. Happy Friday. We're going to be doing some low-rank adaptation today. Have you ever done that? I haven't. We're going to try it out today.
Let me get the link shared here for our Discord so everyone knows. And we're starting to stream in a few minutes.
We're starting right now. You have You haven't either. Okay. Well, today is going to be the day. All right. So, let's go paste at everyone.
Here we go. All right. I'm excited. I'm really excited for this because there's It seems really powerful.
It seems extremely powerful.
There's also other things. Hey, Crafty anime. Good to see you. Nothing much.
What are you doing? What do you mean nothing? Good to have you here. Happy Friday. We made it to the weekend. All right, let's see here. So, I've got an example that I pulled up here. And hey, Nea. Me sitting down to do some side project, Stephen making a LoRA stream.
Yeah, that's what we're doing. Yep.
Yeah, I got a I got an example here that I think we can follow and it will be really easy. In my head, I've got it.
It's pretty straightforward. You just add some layers onto the output, essentially a couple of extra parameter matrices, right? The first matrix needs to match the outputs from the previous model, and then you pipe them into a dimension that has a lower rank, right? A lower rank. And then on the output side you need to restore the original output dimension. Hey Bonzupi, good to see you. Welcome on in.
Happy Friday. We made it to the weekend.
Crafty. Uh oh, you're so tired. Oh, hey, no worries. Hey, can I ask a question?
Absolutely. Sigma Edits, what do you got? What's your question? How do you advertise for a product or solution? I find it expensive, and paying someone to self-promote seems bad. So, what should I do? How did you advertise your company at first? Yes. Okay. So, it's called OPA. This is neat. OPA. It stands for other people's audiences, >> which is... Oh, your internet went bad.
Oh, it did. Oh, crap. Okay. OPA is an approach where you go where other people already exist, other communities, and you integrate into them. Right. So, for example, let's go to PubNub. We have got a PubNub React Native example here. This will get you started using React Native and PubNub.
Then we integrate into that community and then we work with the community to make it better so that way it perfectly fits into their community. That's what we did. Can't even watch live properly.
Oh, Crafty. Oh no. I hope everything works out. Hey, Bonzupi. Howdy. Good to see you. Welcome on in. Happy Friday.
I've been awake since yesterday. And my internet sucks, too. Oh, does it you guys? Hey, dreamer. Welcome on in. Good to see you. We made it to the weekend.
Wait, what? You are starting today. Yep.
Hold on. Let me get my other PC. I'm going to code with you. Ah, Michael, that sounds great. Yeah, we're going to do some LoRA fine-tuning today. It's actually pretty straightforward.
>> My company is using tags to see percent of PRs done by AI. Are they really?
Uhoh.
>> Oh, wait.
>> Some other music was starting there.
Let's see. Is that going to start? We're good. Are we good? Are we good now?
Okay, we are. Okay. I'm very sleepy. Oh, really? Hey, Alexander. Good to see you.
Welcome on in. We made it to the weekend. Hi, dude. Good to have you here.
Let's see here. Um, see. All right. So, let's get into it.
So we've got an existing model here, and I want to LoRA fine-tune it. I think we can open the model, see, model.py.
Here we go. So the idea is we've got this output layer here. This is the output layer with LoRA. You're going to create additional parameters with no bias, so essentially a linear layer without the bias. self.lora_a, which is going to be the input side. I'll say lora_in equals torch.nn.Parameter, I think. And then it needs to match the output side, which is number of words, maybe, I think.
Yeah. Is it number of words, 32? Yes.
Number of words. And then you have a rank.
So here's the LoRA side of it. Low-rank adaptation.
And then you have to freeze these layers. So basically we need a fine-tuning mode.
Hey dreamer first like. Hey nice. That's great to hear. Loading again and again.
All right. I hope it works. I hope it works.
I'm lazy Bonzupi. Wait what? Oh. Oh.
Yeah. How do you mean done by AI? Like I hate you don't like typing a lot of projects. I let AI type it for me. Yeah.
That's what we're doing now, isn't it?
That's what we're doing these days, isn't it? Is that AI doing a thing for me or not? I don't know. Hey, Mark Lemon. Good to see you. Welcome on in.
Happy Friday. We made it to the weekend and you're here. You made it. Happy Friday, Mark Lemon. Thank you for clicking the like button. Uh, no, Stephen, that's wrong. It's a transformation of the weights. Really? Are you sure?
The entire point of LoRA is that you specifically don't add layers. Uh, so... Do you have an example? Because the one I read through specifically was about... Okay, so I know you have to freeze the layers on a model when you're doing fine-tuning, and then you have some new weights. You have to have something new there, because if the model's frozen, it won't be able to learn anything.
So there's got to be something new.
The entire point of LoRA is that you specifically don't add layers. Which Linux distro currently? Uh, Crunchley.
What do you mean currently? Well, for me, I'm running on uh Darwin. So, you name a There you go, Darwin.
Let's see. My company is targeting 80% of all PRs co-authored or reviewed by Claude Pilot. If you don't hit that target, we let you go. What? That's crazy. Like, at least reading the paper in front of me, you replace a linear layer with a LoRA layer.
You replace it. You replace it. Okay.
So, you basically make a new one. You train it and then you disregard the old one. Is that what that is? That doesn't seem like fine-tuning though, right? I'm thinking, like, LoRA fine-tuning,
if I'm going to do it with LoRA, where I completely replace the model?
All right. Hey Sergio, good to see you Sergio. Epic Blox. Good to see you.
Happy Friday. We made it to the weekend, you guys. You've been using NixOS. What is that? For the past few months, and for a few years before that, Fedora. Before that you were using Arch Linux and before that Debian. You did, Bonzupi. You're using them all over the place.
There is a bias in there. AB. So AB would start at zero. Yes, I understand that part. So for the parameters we go, uh, torch.tensor... wait, torch.zeros, something like this, right?
Uh, LoRA, I think something like that, right? And you're having bias though in yours.
You're adding bias. The point of the W is that it's the old weight. Okay.
So, hold on. Let's see here. (W + A * B) times x, plus b. Okay, let me write that down. We'll try it. W, which is going to be linear, too.
Or I suppose, I assume, plus A * B, times x, which is the input, and plus a bias.
Now, is the bias the original bias? And A and B are going to be the new LoRA layers. There's new LoRA layers. I knew there had to be.
So instead of normal fine-tuning where W is learned, you're adding a change with significantly fewer parameters. So it's more data efficient. Yes, I got that part. I knew that part.
So we're going to have a lower rank. So let's see. Let's say our rank is eight.
Use Arch Linux with Hyprland. All right, that's a good one. That's like if you want the most powerful use of your hardware and you want the most efficient running system, and make it look nice too with Hyprland.
What's the difference between Java and JavaScript? Hey, Epic Blox. I remember you mentioned that, right? Okay, so Java and JavaScript are completely separate languages. They're completely independent, right? Java came out first and was an object-oriented programming language that compiled to bytecode that ran in a Java virtual machine. That is its own complete separate thing. JavaScript is a just-in-time compiled language that was originally called Mocha by Brendan Eich out of Mozilla.
Uh, for marketing reasons. I believe when Oracle, maybe Oracle did this, maybe there's some Oracle businessy things that happened, I forgot, they renamed it to JavaScript. So Mocha was the original name of JavaScript. They renamed it to JavaScript for marketing purposes. It has nothing to do with the actual Java.
Slack word. Oh yeah. Hey, how's it going there? Uh what is the stream about?
We're going to do some LoRA fine-tuning and Nea is giving us some pointers. Essentially, the point is that you use a small, uh, perturbation to the existing weights. Yeah. So, we've got our existing weights here, right? And then we're going to add this little small one. And I'm calling it LoRA in, LoRA out because I think that makes more sense to me.
And then I believe this is going to be rank. And I think these are rand. I think it's rand, rank. And then the output will be the number of words. So it scales it in and out, right?
In and out. All right. So I think that's A and B, right? This is A and this is B. And it looks like this.
Yes. b is the original bias. Oh, b is the original bias.
Yes. Okay. Got it. Got it. We're figuring it out as we go, you guys.
Uh, NixOS is a Linux distribution. Oh, got it. Okay, understood. I never heard about it. There's so many Linux distributions, you guys. Let's see here. I'm catching up with the chat. Stephen, can you please share your code that you are starting for LoRA today? Absolutely. Absolutely. So essentially normal fine-tuning can be thought of as (W + new) times x, plus b. Okay. For LoRA, new is a product of two very low-rank matrices. When you say low rank, does that just mean smaller?
Is that what that means? Does it just mean smaller? When we say low rank, why can't we just say smaller matrices?
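On the "why rank, not just smaller" question, a small sketch may help. It assumes illustrative sizes (64 x 64, rank 8) that are not from the stream, and it follows the paper's ordering, where the update is written B times A (the chat writes it as AB). The two factors are indeed smaller, but their product is a full-size matrix whose rank is capped by the inner dimension, which is what "low rank" refers to.

```python
import torch

torch.manual_seed(0)

d, k, r = 64, 64, 8  # illustrative sizes, not taken from the stream

# Two small factors (float64 keeps the numerical rank check stable)
B = torch.randn(d, r, dtype=torch.float64)  # tall and thin: d x r
A = torch.randn(r, k, dtype=torch.float64)  # short and wide: r x k

delta_w = B @ A  # same shape as the full d x k weight matrix

full_params = d * k          # parameters in a full-size update
lora_params = d * r + r * k  # parameters actually stored and trained
rank = int(torch.linalg.matrix_rank(delta_w))

print(tuple(delta_w.shape), rank, full_params, lora_params)
```

So the update is the same size as W when it is applied, but it can never have rank above r, and only the two thin factors need to be trained.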
Void is the only one that list I haven't tried yet. Oh, Bonsupi. Okay, so many options available, right? There's so many options.
Uh, Nan Arc. Wait, I recommend opening up the paper for reference. All right, let's actually do that. Let's do that.
Let's open up the paper, you guys. So, let's search LoRA fine-tuning arXiv.
Here we go.
All right. Is there going to be a nice pretty? Let's do a PDF here. Okay.
Let's take a look. Let's take a look.
Pre-trained weights.
You've got A and B with a low rank.
These are input data. Oh, you pass the input data identical to both sides. Oh, so it's got to match the input D. Oh, okay. Wait, wait, wait, wait, wait. Oh, and then we add them together and it's the new output.
Oh, I get it now. All right, I'm getting it. I get it. I get it. I get it. Thank you. Like and subscribe, Mr. Bumpy Pickle. Good to see you. Happy Friday.
Welcome in. See, I was right. Root.
Yeah, Michael. I thought you would be using different data specific to your use case.
Different. It's this. So here's the part that I don't think makes sense to me. All right. If we're fine-tuning the model, then we need to keep both things together, because the original model is going to hold all the previous data that we've trained into it. The new model that's on the side, which is the LoRA, right? The low-rank, smaller matrices, right?
It's just less data, just a smaller number of parameters, you guys. It lives over here on the right side. They sit side by side. The input goes to either side. Rather than just bolting a layer onto the bottom or top, it sits on the side and works in tandem.
So, there is extra. There is extra.
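The side-by-side picture described here can be sketched in PyTorch roughly as follows. This is a minimal sketch, not the code written on stream: the class name `LoRALinear`, the attribute names `lora_a` and `lora_b`, and the 0.01 init scale are assumptions, and the paper's alpha/r scaling factor is left out for brevity. It follows the paper's convention of a Gaussian-initialized A and a zero-initialized B.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen nn.Linear plus a trainable low-rank side branch (hypothetical name)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the original W and b
        # A gets a small Gaussian init, B starts at zero, so B @ A is zero at first
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The same input feeds both branches and the two outputs are added
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T


torch.manual_seed(0)
layer = LoRALinear(nn.Linear(32, 32), rank=8)
x = torch.randn(4, 32)
out = layer(x)

# Only the adapter matrices should be trainable
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
print(trainable, tuple(out.shape))
```

Because B starts at zero, the wrapped layer initially returns exactly what the original layer returns, which matches the "zeros all the way through" check mentioned later in the stream.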
Page four of the p uh paper is the equation. Page four. Page four. All right. Let's go into page four here.
It's gonna be some fancy math here, isn't it? Some sort of math. Is this it right here?
Okay. AB X. Okay. Oh, that's easy. Yeah, that's easy. Oh, isn't that great? Look how simple that is. Oh, that's so easy, you guys. It's like multiple hundreds of years ago. Wait, uh why why we say rank?
I don't know. I I guess he named it the thing. Hey, right. Exactly.
Rename all these things some fancy name.
It seems really awesome. It seems amazing.
At inference you can take the AB matrix and just add it to the weights. You just add it. You just do some addition.
I'm sorry. It just seems so simple and it has such advanced, complex terminology.
Low-rank parameterized update matrices. What is this? What are we doing with words, you guys? What are we doing here? Just add it together. Just add.
Has anyone, uh, tried Aider CLI against your local LLMs? I haven't, Crow engineer.
Wonder if anyone else has though. But you won't have that original data that was trained. Yep. So you have Exactly.
You just have to use it with the base model, and you have to include your LoRA layers as well, together. You don't get to just throw away the original model, and you don't have to retrain your low-rank matrices with all the original data either, which is one of the nice things about this approach.
So easy. Too Exactly. Too easy now.
Crafty. Exactly.
Hello. Just discovered your channel.
Hey. All right. Welcome on in. Good to have you here. Happy Friday. We made it to the weekend. You joined the right channel for software engineering. We're going to be doing some low rank adaptation today. We're going to try it out after watching this stream.
I'm trying to be present, but chat, I can't stop myself from coding. Hey, it's all good. That's what we're here for.
That's what we're here for. We're here for coding. So, the output of LoRA is just new weights for the same model architecture. So, it's really nice for inference people. It's really nice for inference people. Yeah.
Good old inference people. It adds just a little bit extra. You can't say it's free because you're doing extra steps, right? There's an extra matrix multiplication. Well, actually there's three matrix multiplications here, right? Your input with W, then A and B, and then it's addition. So that counts as multiple operations to me. No, it is literally free. How?
No, wait. What do you mean?
Hey, Sequence K. Good to see you.
Welcome on in. Good to have you here.
I'm second year CS focus on artificial intelligence, machine learning. Your kind is interesting. I'm looking forward to learn. We're going to learn together because you're mutating W.
It's supposed to be frozen though.
It's frozen. You can't mutate W, right? It's frozen. The gradients are disabled.
Like you make a new weight, call it n equals w + ab.
What's n? I'm missing something here.
Ah, welcome to the fine-tuning universe of madness, Stephen. Hey, Mr. Dunle, good to see you. Welcome on in. Happy Friday. We're trying to figure this out.
My understanding is that it just a way to create your own custom LLM without creating and training one from scratch.
Yeah. Yeah. Yes, it is. It's it's one of the fine-tuning methods, right? So, there are several approaches. One is you can just bolt a layer on the output so that way you add more and you add more weights into the model.
Basically, you start with all ones. So, it's a path through initially.
So, that's one option. The other option is low rank where you have smaller matrices and then you just add them on to the the end, right?
This is what I'm understanding. And so, uh, then h equals W times x plus AB times x, yes, and it can be written as h = N x. And that N is now the exact same size as W.
I'm taking your word for it. I don't get it yet. It's not clicking for me. In my mind right now I see extra steps, or is it the same? So see, delta W times x is the same as this.
So it hasn't fully clicked for me yet, but it's not free. See, that's what I don't get exactly. It seems like it's not free.
Yes. You're seeing training which is not free. Oh, right.
Okay. So, let's go. Let's go through the inference is free.
Uh, let me let's let's Oh, no. That that didn't like that. Reload. Okay. Here.
So, during So, you've got your original model right here on the blue side.
Here's the original model. All right.
And then over here on the other side, this is our LoRA, our low-rank adaptation, which is going to be a smaller set of matrices that takes the same input shape and produces the same output shape.
And the resolution here is the rank, which is the lower part, right? So it's smaller. Because in your inference you change W instead of doing the sum. Basically W, and then you run that. It looks like you're still doing extra steps. It's an extra step. So it's W x, right, versus W x plus AB x, right.
Am I seeing it wrong? I feel like I see you're you're saving new weights.
So only this right here.
Oh, call it n plus. Okay.
N is the new output. Here's the problem though.
If you do that, with the pre-trained weights, don't you still have to pass all of the data that's ever been trained on the original model through? Because there's only weights here.
There's no extra weights anywhere else.
You're getting input and output. So N can't be No, no, no. N is the new weight. Where's N here? I Well, I see N right here.
I do. I see N here, which is A. And B is a vector or is a a matrix with zeros. all zeros.
You do not pass the all the OG data.
Right. Exactly. That would be the point of fine-tuning, right? You wouldn't need to pass in all the original data.
No, that N is just the normal distribution.
You chose a bad name.
Okay.
Well, I feel like this makes sense to me. I feel like I understand it. I just don't... Do you need the original model and the LoRA weights? Like, call it a new weight T? Okay.
So, what what do you keep here? I feel like you I feel like you need both cuz you can't just reintroduce the entire data set to train this model over here.
You need both because you're fine-tuning, right?
If you have both, you're fine-tuning. So you can't not throw this away and you cannot throw this away. You need both of them together or otherwise it won't work.
For inference, you need just T. You can compute T. T= W + AB.
Uh, and then for any input x you do T x. I get that. That makes sense to me. You still have W, and then you have A and B, and you need those in order to make T, right? I know you have T x, which is now your new function. So the point is you just save d x d. You save a... what? Uh, you can't get W out of T.
Well, you wouldn't, right?
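The "inference is free" point from chat can be checked numerically. The sketch below assumes small illustrative sizes and random stand-ins for trained adapter matrices, and writes the update as B @ A in the paper's ordering (T = W + AB in the chat's notation). The merged matrix T has the same shape as W, so applying it costs one matmul, and it gives the same output as running the two branches and adding them.

```python
import torch

torch.manual_seed(0)
d_out, d_in, r = 16, 16, 4  # illustrative sizes

W = torch.randn(d_out, d_in, dtype=torch.float64)  # frozen pretrained weight
b = torch.randn(d_out, dtype=torch.float64)        # original bias
A = torch.randn(r, d_in, dtype=torch.float64)      # stand-in for a trained A
B = torch.randn(d_out, r, dtype=torch.float64)     # stand-in for a trained B
x = torch.randn(d_in, dtype=torch.float64)

# Training-time view: base branch plus low-rank side branch
two_branch = W @ x + B @ (A @ x) + b

# Inference-time view: fold the update into the weight once, offline
T = W + B @ A
merged = T @ x + b  # a single matmul, same as the original layer

same_shape = T.shape == W.shape
matches = bool(torch.allclose(two_branch, merged))
print(same_shape, matches)
```

The extra matmuls only exist during training, or whenever the adapter is deliberately kept separate from the base weights.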
R is rank. R is rank. It's just a lower dimensionality. It's to say that the matrices are going to be smaller. Hey, good to see you. Welcome on in. Happy Friday. Good to have you here. We're currently debating some LoRA stuff here, uh, in a way that helps me understand what's going on. I feel like I fully get it. There's just some architectural pieces that are not all the way clicked yet. In our reparameterization we only train A and B. Got that? Understood. These are frozen weights.
Math final boss. Exactly. Exactly.
It's like, basically at inference you have a function that does X. Yeah. So we get our x input, which is our features or input data. You pass that over into the original model. And then on the side you have this low-rank model. You pass the features in here. Initially, you'll get zeros out because B is a bunch of zeros. For p in model parameters, p.w plus-equals LoRA weights.
And you save a new model. That new model is the LoRA model.
And you only save do you still need both? You still need both models, right?
You don't save the model. Actually, I think you save the adapters. Yes, exactly. Exactly. Only save the adapter.
Only the orange boxes over here. We only save this side. We don't need to save this cuz we already have that. You need neither?
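Saving "only the orange boxes" could look something like the sketch below. It assumes a hypothetical `TinyLoRA` module whose adapter parameters are named with a `lora_` prefix, and it uses an in-memory buffer in place of a file on disk. This is one way to do it under those assumptions, not the approach confirmed on stream.

```python
import io

import torch
import torch.nn as nn


class TinyLoRA(nn.Module):
    """Hypothetical stand-in: one frozen linear layer plus LoRA matrices."""

    def __init__(self, d: int = 32, r: int = 4):
        super().__init__()
        self.base = nn.Linear(d, d)
        self.base.requires_grad_(False)  # the original weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, d) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(d, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T


model = TinyLoRA()

# Keep only the adapter tensors; the base weights already live in the original checkpoint
adapter_state = {k: v for k, v in model.state_dict().items() if k.startswith("lora_")}

buffer = io.BytesIO()  # stands in for a file on disk
torch.save(adapter_state, buffer)
buffer.seek(0)

fresh = TinyLoRA()  # in practice the base weights would be loaded from the original model
result = fresh.load_state_dict(torch.load(buffer), strict=False)
print(sorted(adapter_state), sorted(result.missing_keys))
```

With `strict=False`, the keys reported as missing are the base weights, which is expected because they are meant to come from the untouched original model.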
Really? Okay. Wait. Uh, wait. Um, hold on.
You don't save A or A or B or W. Okay, I know you don't save W, but you don't save A or B.
Uh, N(0, sigma squared) is a Gaussian distribution, is what I saw here. I saw that right here. I'm trying to zoom in on this PDF. So you get your Gaussian distribution here.
And T is just their sum. Okay, you save a new thing that's called T. LoRA helps avoid catastrophic forgetting, the biggest issue in fine-tuning for LLMs.
Okay, got it. Then how do you use T in this situation to do a feed forward? Where does that come in?
Cuz I get the forward. I understand the forward part. It's just that is there going to be a new kind of forward afterward?
I feel like I mostly got it. I'm just not understanding this final bit here.
You just feed forward normally. Look at page four of the paper for W times and W here. No additional inference latency.
All right, let's let's look at this part here. Zoom in zero.
No additional inference latency. Okay, let's read this let's read this little paragraph right here real quick.
So mean is zero. Yes, I got that. Yes, that makes sense. Initially, it's zero.
All right, let's read this. All right.
When deployed in production, we can explicitly compute and store... let me zoom out a little bit here so we can see a little bit more on the screen. Okay. W = W0 + B * A, and perform inference as usual.
This looks like what I was thinking it was. I don't see how it's different.
Note that both W0 and BA are in R, d by k. When we need to switch to another downstream task, we can recover W0 by subtracting BA and then adding a different B prime A prime, a quick operation with very little memory overhead.
Critically, this guarantees that we do not introduce any additional latency during inference compared to a fine-tuned model by construction. Okay, so we've got that in the paper.
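The task-switching step in that paragraph can be sketched with plain tensors. The sizes and the two adapter pairs below are made up for illustration; the point is only that subtracting one low-rank product and adding another gets from the task 1 weights to the task 2 weights without touching a second copy of the base model.

```python
import torch

torch.manual_seed(0)
d, k, r = 16, 16, 4  # illustrative sizes

W0 = torch.randn(d, k, dtype=torch.float64)  # original pretrained weight

# Two hypothetical adapters, one per downstream task
B1, A1 = torch.randn(d, r, dtype=torch.float64), torch.randn(r, k, dtype=torch.float64)
B2, A2 = torch.randn(d, r, dtype=torch.float64), torch.randn(r, k, dtype=torch.float64)

W = W0 + B1 @ A1  # deployed weights for task 1

W = W - B1 @ A1   # subtract the task 1 update to recover W0
recovered = bool(torch.allclose(W, W0))

W = W + B2 @ A2   # add the task 2 update
switched = bool(torch.allclose(W, W0 + B2 @ A2))
print(recovered, switched)
```

The recovery is exact only up to floating-point rounding, which is why the check uses `allclose` rather than strict equality.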
Didn't It's not It's not clicking yet.
There's a piece missing. I'm missing a specific part.
The original is still there. Okay, so we still have the original model. We prevent fine-tuning on the original weights, right? So, we freeze the original model weights.
We fine-tune a smaller weight through low rank weight. So, we add on A and B.
Add and add that smaller weight to the original weight.
I Okay. So, uh yes on the output on the output. No, you store that new W and keep that W forever instead of the old one. So, here's the problem. Here's the problem.
Because the promise of LoRA says that you do not need to duplicate the original model, because that would be very expensive, right? So, say you've got a big, ginormous large language model that takes up, like, 500 GB. If you want to make, you know, a million varieties of those fine-tuned models, you don't want to store a million copies of that 500 GB model. You only want to store a million copies of the little tiny adjustments that you made in LoRA, right? That's one of the promises and advantages of fine-tuning with this approach: you can keep the original model and you don't modify it at all, but you still keep it and run it. And then you adapt it, like you had an adapter on the side.
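The storage argument is easy to put into numbers. The 500 GB base model and the million variants come from the example above; the 50 MB adapter size is purely an assumption for illustration, since real adapter sizes depend on the rank and on which layers are adapted.

```python
# Sizes in megabytes so everything stays in integers
base_model_mb = 500_000  # the 500 GB model from the example
num_variants = 1_000_000  # a million fine-tuned varieties
adapter_mb = 50           # hypothetical adapter size, not from the stream

# Option 1: store a full copy of the model per variant
full_copies_mb = base_model_mb * num_variants

# Option 2: store the base model once, plus one small adapter per variant
adapters_mb = base_model_mb + adapter_mb * num_variants

print(f"full copies: {full_copies_mb / 1e9:.1f} PB")
print(f"base + adapters: {adapters_mb / 1e6:.1f} TB")
print(f"ratio: {full_copies_mb / adapters_mb:.0f}x")
```

Under these assumed numbers the adapter approach is several thousand times smaller, and the gap grows with the number of variants.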
So Stephen, you're thinking of inference as being the sum of two layers, but they optimize it down to just one linear layer. Oh, sure. I mean, I assume that's fair. If you're thinking of it like a newer variant where some of it's quantized, then yes, sometimes you add latency, but you can also requantize it and that also works. Requantize.
Wait, I've heard of quantization. What is requantization?
Quantization versus requantization.
Oh, it does. That's not a thing. Did I type that right?
Is that a thing?
Did I type it right? Re-quantization.
Is it worth going back, uh, for the MSc in AI specialization, or just learn it off the internet? Uh, hey hexgrad, uh, hexgrade.
I recommend learning everything on the internet. I just recommend it. I recommend learning as much as you can on the internet. And if that doesn't work, then go to plan B. All right? Cuz plan A is free. learn it all all on the internet. Start there first.
All right, Nea. But thank you for the hearts, you guys. Appreciate it. All right. Like, quantize the existing weight plus LoRA. I don't know if they have a proper name for it. Okay. Learn it off YouTube. There you go. Hey, how's it going there? Roll Rolex. Hey, Rolex.
Good to have you here. Happy Friday.
Welcome on in. We're doing some fine-tuning with LoRA, low-rank adaptation. And I'm trying to do it the way I think it needs to be done here.
So we've got a rank we've got a number of words as input and the number of words as output. Right? So our input here number of words number of words out.
And then we use this.
Let's see here. Thanks Mr. Blum. Hey.
All right. Good to hear it. Good to hear it.
On arXiv. Yes, arXiv. Exactly. It's called QLoRA, which is the more advanced one. That's more advanced. This is just regular LoRA. Uh, we've got OG LoRA.
Stephen is low ra.
Oh, Lorra. Yeah. Oh, so right. Low ra.
Lorrah. low rank adaptation, right? Low ra low raw. Anyway, Stephen, uh, optimization isn't that big of a deal for algor. It's just a thing you can do in the end. All right. It's just important as performance of the thing, which is why we use low raw.
All right. Good to hear it.
See, it's like Lorra, the name like Lorra from Tomb Raider. Oh, Laura, right?
That's what I thought it was. That's the way I was saying Laura right from Tomb Raider. Yes, Tomb Raider.
I like I like Tomb Raider. I like Tomb Raider. That was I like the OG. I like the OG. the OG you guys. Who likes Tomb Raider? Do you like Tomb Raider? Did you play the OG? All right, start poll.
You play OG?
Have you played the original?
I'm going to type up the code in Discord so it's more obvious why it's free.
Thank you. That sounds great, Nea. Thank you. Thank you. I really want to see. I want to see it. Oh, your speech text is showing it weird. Laura. Yeah, there's it's not except Yes, Laura. Like Lorra, I see I see what you mean. I see what you mean. Hey, Bro, good to see you.
Welcome on in. Happy Friday. We're doing some LoRA fine-tuning today. Good to have you guys here. All right, so this is the way I understand it. So, uh, rank is going to define the middle dimension of our matrices here. So, we've got matrix one, matrix two.
We'll get A and B. And we're going to do some interesting stuff here. I call it LoRA in, LoRA out. Although, I feel like it's supposed to be A and B.
You're supposed to say A and B, I suppose, based on the paper. Little bit of A and B in the paper here. Let's go up to the output. All right. So, yeah.
And then what do we save?
What do we save? Let's see here. Let's keep going.
What's happening today? We're doing good, bro. Thank you for asking. We're doing some uh low rank fine-tuning.
And let's see. We're just kind of reading through the paper so we understand it cuz it's it is really simple. From what I'm seeing, I think it's really simple.
What is the base open weight model you're using? Oh, the base model I'm using is just uh another model that we've been playing around with. Another model that I've been playing around with is our embedding embedding model that we created from scratch.
Thank you for the hundreds, you guys.
Where did you find those LoRA papers?
This is on arXiv. This is arXiv here. Uh, I'll paste it over to our Discord, on our, let's see, our link share channel. There we go.
It is right here. And this is just the original LoRA. Just the original LoRA.
There are modern versions. I believe a new Nvidia one called DoRA.
There's a DoRA and there's a QLoRA.
Are you not using something from Hugging Face? We could do that. Do you work in AI/ML fields, Stephen? Not as much as I want to. Let's put it that way. I have another business. I have a business here called PubNub where we connect users through live interactive connectivity. So things like multiplayer experiences, on-demand delivery, sports and media live updates, and gaming, right? So many different things. Also healthcare and telemedicine. So I've got a communication network, and we don't do that much AI though. We are getting into it. We're getting into it.
All right. If it's a smaller model, the difference you see after performing LoRA will be quite noticeable, especially if you have good data. Oh, really? Really? So, it works well with large models, but you're saying it works exceptionally well with smaller models.
If money were were not on the table, right? Oh, you're not using something from Hugging Face. I suppose I suppose we could I mean we could we could do something pretty interesting. Something interesting.
All right. I was just going to do this with my toy model here, my toy embedding model that we've been making, just to make sure that it still works. I should be able to pass this through and it should be zeros all the way through. So I want to be able to activate the LoRA part of it. Uh, so Stephen, I have it in Discord to show the optimization. Okay, you probably put it in general. Whoops. One second. General chat. Here we go. All right. Here, I'm going to copy and paste this into my, uh, editor so I can read it a little bit better. Let's copy and paste it here. All right. So, here we go. pbpaste. We say neurora.py.
Okay. Uh, well, it's not really a Python file. It's kind of a Python file. Okay.
Someone's been programming in a different language.
Okay.
Laura layers. A fite. There we go. Model up. Wait. Oh, wait.
All right. So, you've created a B for in linear for All right. So, model.parameters.
So, wait, there's How is this where you're not freezing?
H how are you freezing the original model, though?
Let me read. Let's read through this.
I'm going to read through this. Okay.
LoRA needs a properly large model to be worth it. Otherwise, not worth it. Okay.
Okay. I think the embedding model isn't the correct one because we don't have any linear layers. I I do though. I do.
I do. I have the linear layers only for the training purposes. Obviously, for non-training, for inference, I remove the linear layers cuz we only need the embedding. So, you're absolutely right.
You're absolutely right.
This isn't JavaScript. I know, right?
Lon. Hey, Lun. Good to see you. Happy Friday. We made it to the weekend. It's not real code. Yeah. Under got that.
See, this is the part that I don't understand. This is actually the very first line here. You're referring A and B to the original model parameters.
You're supposed to create new parameters, right?
Uh, LoRA weights contains... for p in model's parameters, if LoRA weights does not contain p, continue; otherwise existing weight plus new weight, and you have p. Okay.
So, this is much simpler. Where's the matmul there? The matmul isn't there? There's something missing here. I feel like there's something missing.
Uh, it works well when the total number of parameters is lower, because when you do LoRA you don't tune all of them, and you're tuning a larger percentage of the total weights when the model is small. So you are mutating the original model. Is that actually happening? It's pseudo. Yep. Got ourselves some pseudocode here, Michael. We absolutely do. My agent on the way. Hey Lun, great to hear it.
Looking forward to it. How do you make subtitle real time live? Oh, uh, it's right here. We've got live subtitles right for you here. Live subtitles.
join on our Discord. I will paste a link here again on our Discord link share.
Just click this link and you can add them to your own OBS. So you could put them on Google Meet or Zoom or Microsoft Teams. You could have live subtitles even if you stream on YouTube or any other platform.
At inference there is no new matmul.
Okay, I get what you're saying. I don't see how though. "He may be using a small or tiny Whisper model for transcription."
Yes, very small. You have 0% left on CEX. Oh, you ran out. You only had 4%.
You used it all. My agent making agents almost finished. Oh, 5.5 vibe coded.
Nice. That's great. For training, we have A and B. And then we train them alongside the original weights. Sure. We don't do that at inference time at all.
Okay.
Let me let me write the proper full thing. Okay.
Okay. So, we're looking through this.
So, this must be the inference here.
Let's read what you said here. Uh, go back to general.
Somewhat pseudo cody, but close enough.
This new model has the exact same shape but different weights. So it is the same cost.
Okay. Gotcha. Okay. H well a comma b. This is this is interesting.
Okay. Uh got move something coming back.
All right. Sounds good. Sounds good. Lun see you in a bit.
Okay.
Uh, I thought it was easier than I thought it was going to be. Okay. It seems really simple.
Okay.
So, let's find a model. Here's what we'll do. We'll find a model on Hugging Face and then we will fine-tune it. Let's do that. Hugging Face. Let's find a small model. Something easy.
Something very easy.
Let's do a very small something. Maybe text generation, like an LLM. Let's do a really small LLM, like not DeepSeek, for example. We just got a trillion parameters.
By the way, you made a list of all the quirks in blocks AI skills. Oh, right.
Thank you very much. I I saw I saw.
Appreciate it. We will address that.
Loons, thank you at loons. Thank you.
"In my personal experience with VLMs where I have done fine-tuning, they did not improve a lot. I may have been inaccurate in the process. That's part of the observation. Data is the most important part." Okay. Yes, you're right. In AI training, data is critical.
Data is absolutely critical.
Let me see.
Is there is there like a good Let's see. This is like a 1 billion. What is this?
A 1 billion parameter model checkpoint built on the hierarchical reasoning model architecture, trained by, uh, Sapient Intelligence from scratch.
It's a dual time scale recurrent architecture. Two transformers over the same input of embeddings for so many cycles with additive state injection. This gives effectively unbounded compute depth. Okay, that's nice.
"Would a Gemma model be a good one?" So the Gemma model doesn't run on my architecture yet. I've got a Mac. I tried it. We tried to make it work. I need Nvidia for it to work well.
Smaller, cheap model. Make it more accurate use cases. Uh, did you went live stream every day? Yep. We're doing live stream every day now. You've got it at Muhamad.
Yes, every day. Every day, you guys.
Amzad. Good to see you. Welcome on in.
Happy Friday. We made it to the weekend, you guys. Also, I don't think you can.
It's not open weight in my understanding. Okay, got it. Yeah. Hey, bogateers. Good to see you. Welcome on, Bogoteers.
Here's Stephen. I made a less pseudo code version that helps.
All right, here we go. Let's copy and paste this. Thank you. All right, so this is what we want. Let's do this. Uh, set paste. Okay, here we go. All right, let's walk through this: LoRA layers.
So, we're going to get our original model. We get our original model here.
Model parameters, no grad, which freezes the original model. All right. And then train the LoRA weights: model, LoRA layers.
This will do an optimization. So, do we pass through both?
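The two steps read out here, freezing the original parameters and creating separate trainable LoRA weights, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code pasted on stream: the tiny `nn.Sequential` model, the `lora_layers` dictionary name, and the rank of 4 are all placeholders.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the pretrained model (not the model used on stream).
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))

# Freeze every original parameter so backpropagation never updates it.
for p in model.parameters():
    p.requires_grad_(False)

rank = 4
lora_layers = {}  # layer name -> (A, B); brand-new tensors, not views of the model
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        A = nn.Parameter(torch.randn(rank, module.in_features) * 0.01)
        B = nn.Parameter(torch.zeros(module.out_features, rank))  # zero start
        lora_layers[name] = (A, B)

frozen = sum(p.numel() for p in model.parameters())
trainable = sum(a.numel() + b.numel() for a, b in lora_layers.values())
print(f"frozen={frozen} trainable={trainable}")
```

Because A and B are new `nn.Parameter` objects, an optimizer built from them alone never touches the frozen weights.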
All right. So, what is this train here?
It uh we need that. We need we need help.
Uh what are we doing today? Boogateers.
We are doing low rank adaptation fine-tuning. We are learning it from scratch.
Uh, "did you review the repo?" Oh, I mean, we don't really have one yet. I mean, we kind of do. It's just not ready for anything yet. So, where are we at here? Yeah. So, this is where I was, and here's Nea's code. So, we've got Nea's code.
We're currently walking through it right now. So, apparently we get we copy the original model's weights.
We just copy them. But where's the rank?
Where's the low rank come in at? I think we're missing r. All right. So, questions: where is our rank? And then, how does training look? I think I understand the training part. And then where do we store the new weights? Let's see. "Did you review the repo?" "Hey Stephen, your live streams are an inspiration." Thank you. Hey Mike, that's very nice to say. I appreciate it. That really motivated you. Good to hear it. Software engineering is a lot of fun, right? It's really great. "What's the parameter count limit that you can fine-tune on your hardware?" Um, I could probably be fine with something like 20 billion. Look, I've got plenty of memory. Well, sort of. I mean, I only have so much memory. Maybe around 20 billion is what we're looking at there. However, I don't want to do that. I think we need to do something that's smaller, like maybe 1 billion.
Train is just fine-tuned using the two layers.
The original thought about or you originally thought about okay, the new A and new B are depending on R.
Yeah. Where is R?
Uh, rank R.
Yeah. Okay. See, I don't see where our rank is, because A and B now need to be reshaped, right? "Oh, what about Qwen 3.5 0.8 billion?" Oh, we could do that. Bonzupi, that actually works.
Uh oh. "If you can train 20 billion, then GPT-OSS." Yeah, we could do that as well.
Though I don't want to, because we only have like another hour here. So, I want to see some sort of results. And I feel like if we do a smaller model, it'll give us more compute for our time.
So yeah, I think the Qwen 3.5 0.8 would work. "You omitted r because the point is to show the optimization."
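The rank question can be made concrete with shapes. A minimal sketch, assuming the `nn.Linear` weight layout of (outputs, inputs); the sizes and the names `d_out`, `d_in`, and `r` are illustrative only.

```python
import torch

torch.manual_seed(0)
d_out, d_in, r = 6, 10, 2      # illustrative sizes only

W = torch.randn(d_out, d_in)   # frozen weight, same layout as nn.Linear.weight
A = torch.randn(r, d_in)       # projects the input down to r dimensions
B = torch.zeros(d_out, r)      # projects back up; zeros so the update starts at 0

delta = B @ A                  # same shape as W, but rank at most r
print(W.shape, A.shape, B.shape, delta.shape)

# With a non-zero B the product still cannot exceed rank r.
rank_of_update = torch.linalg.matrix_rank(torch.randn(d_out, r) @ A).item()
print(rank_of_update)
```

So r is the inner dimension shared by A and B: it never appears in the shape of the merged weight, which is why it can be left out when only the merge step is being shown.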
Got it.
Okay.
LoRA layers, items. Uh-huh.
Existing weights.
Where's P?
What's P? Where's P at? Where is P?
So, we've got some some weird things here. Uh, new weight X equals old weights.
Oh, all right. It's starting to click a little bit. So, where's new weights at?
So, this this is what we save is what you're saying.
P is the key for the dictionary.
Yeah, but you don't have it in this scope.
Where did this come from? It's out of scope.
Oh, see there's no P here. Where' P where P at? Where P at? Where P at?
Yes. Okay, you got it. Okay. So, this is the new the new weights that we saved. We saved these guys here.
Old weight. Old weight is going to be the original model.
Okay.
And what is what is P again?
Existing weight. Are we literally replacing the last layer in the original model? Is that what this is? Okay. So, are we replacing the last layer in the original model?
Uh, yes, old weight is the original. All right. So, we've got our original here. And how does that compare to existing weight? So, we got our existing weight. Existing weight, A plus B. Oh, I see. I see. Now, this is pseudocode here. And then this is our old weight. So, we've got old, and then we've got our new, and we're replacing.
"Yes, old weight is the original. Yes, we are literally mutating the weights. Not the last but all the LoRA layers."
What? Wait, wait. No, no, wait.
I know. I understand.
So, we when we do back propagation, when we do the training, A and B are definitely trained. Uh, and then so we train here. Obviously, we train here.
So, A and B are being mutated here, right? We're mutating A and B here during the training, right? Optimizing.
And then we do a replacement.
We get our LoRA layers, A plus B. And then we just add those modifications to the very last layer. Hey Auto, how's it going there? Automation effect. Good to see you. Welcome on in. Happy Friday.
Sent star for two jewels. Hey, thank you very much. Scribbloop.
Thank you, Scribblenook.
Scribblenook, thank you for the for the star. I appreciate it. Happy Friday. My gratitude to you, Scribblnook.
All right, so I think we I think we understand the final parts of this now.
I think we got it. All right, Neva.
Okay, let's walk through it, everybody.
Let's lock in. Let's lock in. So, the point is we train A and B and then we mutate the final layer for each LoRA layer, of which there's only two, right?
Only two. As far as I know, there's only two. Do the mutation, which is here.
Here's the mutation right there. Original model and we are updating something here.
Something here. And when we mutate the existing weight, "there are many LoRA layers." How many?
Oh, so for every layer there's an A and B. You can have LoRA layers at the base. You can have more than one. Okay. All right. Okay.
I think it's starting to click now.
Let's draw it together.
Let's draw it together, you guys. Okay.
Two. So, we got original original model.
We're going to do original model here.
So, this is the original model. See, original model. And then, let's keep it simple. Then we've got our LoRA here. LoRA A, LoRA B. And we could do another one at this layer here. So, this is our low rank A, and then we could do low rank B. So, low rank B.
This is like the original weights of the final layer. Last layer.
Just to keep things simple. L E Y E R.
Okay. Uh you can have as many as you want. Have as many as we want. Nice.
So then we go through this training process where we do feed forward.
So then we're going to have original vector X. So this is X here.
And then we're going to run feed forward on that as well. Number five here. So we've got an X going down there. This is going to be, uh, LoRA B A x.
This is going to give us the output. And really it's more like A B, cuz B is the output.
It's going to match the dimensions of our final layer weight.
And that's going to give us some new output here. Where are we? Number five.
And then this is going to create something some new final output here.
This is during the training process, right? So this will calculate. This is during training. This is our training process here.
Uh, Sergio, I know, right? This is getting crazy. Um, so this will be essentially our Y, right? We've got two Y's, y1 and y2. Like y1 plus y2 equals y, the new y. "Look at general, please." All right, let's take a look. "Yes, this is correct." All right, so far so good. So, this is the training process.
X and X: throwing it through, you get your output, and then you do your loss, and then you only update the weights here in your LoRA layers.
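The training step described here (y = y1 + y2, compute the loss, update only the LoRA weights) might look like the sketch below. It assumes a single frozen `nn.Linear` layer, made-up random targets, a mean-squared-error loss, and plain SGD; none of those choices come from the stream.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(16, 4)                  # stands in for one frozen layer
for p in layer.parameters():
    p.requires_grad_(False)

r = 2
A = nn.Parameter(torch.randn(r, 16) * 0.1)
B = nn.Parameter(torch.zeros(4, r))
opt = torch.optim.SGD([A, B], lr=0.1)     # the optimizer only ever sees A and B

x = torch.randn(32, 16)
target = torch.randn(32, 4)               # made-up training targets
w_before = layer.weight.clone()

losses = []
for _ in range(50):
    y1 = layer(x)                         # original path: x W^T + bias
    y2 = x @ A.T @ B.T                    # low-rank path: B A x
    loss = nn.functional.mse_loss(y1 + y2, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The original weight never receives a gradient, so the extra training cost is the second path plus the optimizer state for A and B.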
And this process here is kind of more like this.
Like that down there. And then finally, once we do this training, we do back propagation only on this right side of the screen, and you can have multiple LoRA layers if you want. So you can have another set of LoRA layers for each of these other layers that we have in the model if you want. In this case we're just doing the last layer just for simplicity.
And then when we're ready, we're going to save the final layer here to mutate the final layer and only the final layer. We're only going to do one final layer, for example. So the final output, we zoom in over here. Okay. Once we're done training, let's see here. So let me sort of zoom in over here. After training, we're going to go down here and we're going to create a new layer.
It's called original weights plus AB. Uh, wait. Hold on. Let me go back to this. Where is this at? Here. Oh, A * B. Okay. Yeah.
Plus AB is going to be the new output.
And then this is what we save. This is the new matrix.
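The merge step, original weights plus the product of the two low-rank matrices, can be checked numerically. A minimal sketch with random stand-ins for trained A and B; following the (outputs, inputs) layout of `nn.Linear`, the product is written `B @ A` here. The bias is left untouched.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(16, 4)
A = torch.randn(2, 16)      # pretend these came out of training
B = torch.randn(4, 2)
x = torch.randn(5, 16)

with torch.no_grad():
    two_path = layer(x) + x @ A.T @ B.T   # training-time forward, two paths
    layer.weight.add_(B @ A)              # mutate the weight in place
    merged = layer(x)                     # one matmul, same shape as before

print(torch.allclose(two_path, merged, atol=1e-4))
print(layer.weight.shape)
```

After the merge the layer has exactly the same shape as before, which is why inference costs the same as the original model.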
Uh is where's bias involved in this? So do how does bias work here?
Now it's kind of important that you're doing this not just for the last layer but for a lot of layers. Okay. So for each layer we have these low rank adaptations for each layer in the model.
Got it? "Yes, you got it." All right. Now it makes sense. Now I understand why there's zero overhead, because, yes, you are technically... well, there's one piece missing.
There's one puzzle piece missing in the story though, because the whole point of LoRA, I assume, you know. Okay, I think I get it. I think I've solved the final piece.
When you're doing low rank adaptation, one of the promises is that you don't have to duplicate a very large model multiple times to get a fine-tuned version of it. What you can do is you can just save the A and B off to the side and then when you load the model, you can take the original model and then just mutate.
So at load time you will mutate with A and B. So you just save A and B, which is very small, right? The low rank, right? It's smaller matrices, right? "There is overhead during training."
During training yes okay we figured it out you guys. We've got it.
Let's see. Uh namaste Stephen. I hope you're doing good. Hey, thank you so much, Ann Schuman. I'm doing great. Good to have you here. Happy Friday.
"What you're talking about is quantized LoRA."
So, how does that work in production?
The trick here is the original W can remain quantized.
Yeah, sounds fair.
"What does this mean? r << d."
I don't know. I don't know, Sergio.
So, how does this work in production?
So, my guess is, here's my assumption.
So, you're going to train these LoRA weights over here. You're going to save those into a separate file. And you don't want to mutate the original model and then store a whole bunch of copies of, like, a 500 GB model. So if you're going to create a million copies, you don't want to create a million times nearly 500 gigabytes worth of data. That does not make sense.
So what you would do is you would copy all the LoRA layers, save those, and then when you want to load a fine-tuned version, you load the original model, and you also load the fine-tuned LoRA layers. Then you merge and mutate the original model with your fine-tuning here and it generates sort of a new model in your memory. In memory, right?
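The save-small, merge-at-load idea can be sketched like this. Everything here is a placeholder under stated assumptions: one `nn.Linear` plays the shared base model, the tenant names are invented, the adapters are random rather than trained, and an in-memory buffer stands in for files on disk.

```python
import copy
import io

import torch
from torch import nn

torch.manual_seed(0)
base = nn.Linear(256, 256)                   # hypothetical base model, stored once

# Pretend two customers each trained their own rank-4 adapter.
adapters = {
    name: {"A": torch.randn(4, 256) * 0.01, "B": torch.randn(256, 4) * 0.01}
    for name in ("tenant_a", "tenant_b")     # made-up tenant names
}

def serialized_size(obj):
    """Bytes needed to torch.save the object (in memory, no file written)."""
    buf = io.BytesIO()
    torch.save(obj, buf)
    return buf.getbuffer().nbytes

def load_fine_tuned(base_model, adapter):
    """Return a merged copy; the shared base model is left untouched."""
    model = copy.deepcopy(base_model)
    with torch.no_grad():
        model.weight.add_(adapter["B"] @ adapter["A"])
    return model

print("base bytes:   ", serialized_size(base.state_dict()))
print("adapter bytes:", serialized_size(adapters["tenant_a"]))
model_a = load_fine_tuned(base, adapters["tenant_a"])
```

Copying the base per request is only for clarity in this sketch; a real deployment would more likely keep one resident copy and add or subtract the adapter product in place.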
So you mutate in memory.
Hey that vasper, good to see you.
Welcome on in. Happy Friday. Good to have you here. "In production when you deploy, you don't even care it was LoRA." Yeah, you wouldn't. You wouldn't. I understand that part.
There's just one of the promises that LoRA brings with it, which is that you don't need to duplicate a 500 gigabyte model for every version of the fine-tune that you have. So here I'm thinking of it like a multi-tenant infrastructure. So Nea, if you want to make a million versions of that fine-tuning, you wouldn't want to store 500 GB times a million. You would only want to store the original 500 GB model once, and then whenever customers need to make a call to your model with their own custom fine-tuning, you would add that mutation on in the model at runtime.
Then once you've loaded the mutation you can inference for free. So, that does all things that that we're describing and you get every single version. Hey, Tiny Rick, good to see you. Welcome on in. Happy Friday. That's funny because Delta Force has Tomb Raider Avatar for this season, her bow and everything except for dual pistols. Oh, wait. We need the dual pistols. You can't get rid of those. Those are OG. Yes, you can also just save A and B separately.
Absolutely. Nice. All right. Now, I've got I've got all my hopes and dreams covered. They're all fully covered.
Okay. I guess it means R is much smaller than D. Oh, Sergio. Got it. I see what you're describing now. Yes.
Low rank because it's going to be smaller than the original by a lot.
I guess I need to see what that looks like.
Yeah. So, see, we just saved the LoRA layers independently in their own file, in their own teeny tiny file right here. We got a nice little LoRA file right here. It's got all of our LoRA adaptations right here. We got a whole bunch of them. So let's do this. We got A and B.
We can do copy A and B. So we got one LoRA A and B there. We got another LoRA A and B there. And for each of the layers we've got that.
Then we pass that through feed forward, X on one side, X on the other. We get our new output. Then we do back propagation to train our LoRA.
And then after that training's complete, we just go through and save.
We just save only the LoRA model. Now that's saved to a file. We've got our original model on the left and we got our new LoRA model on the right.
Right? LoRA model. At runtime we'll load the original weights and whichever fine-tuning instance that we want. We'll mutate that model in memory with the LoRA weights and then we proceed onward with our inference at that point.
"Even r = 1." Oh wow. r equals 1. That is impressive.
I didn't know you could do that. "What are we doing today?" We're doing some, uh... "Like in the paper, they used r = 4." Ah, so you could even do r = 1. Wow, that would be amazing. That's one 256th the size. That would be really small. The vasper, so we are doing some, uh, low rank.
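How small the adapter gets for a given rank is simple arithmetic: a d_out by d_in weight has d_out * d_in entries, while A and B together have r * (d_in + d_out). A short sketch with an illustrative hidden size of 4096, which is an assumption and not the size of any model mentioned on stream; `lora_param_count` is a helper name made up for this example.

```python
def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Entries in A (r x d_in) plus entries in B (d_out x r)."""
    return r * (d_in + d_out)

d = 4096                      # illustrative hidden size, not from the stream
full = d * d                  # entries in one square weight matrix

for r in (1, 4, 8):
    lora = lora_param_count(d, d, r)
    print(f"r={r}: {lora} adapter entries, {full // lora}x smaller than the full matrix")
```

The ratio depends on the layer size as well as on r, so the exact saving differs from model to model.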
Let's see. Uh low rank adaptation fine-tuning.
So we just learned how it fully works.
Nea gave us the full walkthrough and now I fully understand it. The part that I wasn't getting was the disconnect between the original model and where the LoRA model came in at inference when you're running it in production. We connected the dots though. We got them all. So if we find a really small model here, let's see. Let's ask Gemini. I want to say, uh, good Hugging Face model for testing LoRA with. I want just a good one. It should be small.
It should be small model.
I want to do this. All right.
Yeah, a large language model. Sure.
Okay. So, we do want that's why it's standard uh one that uses PyTorch.
Uh, one I see model that uses PyTorch.
Also, Llama 3B is pretty decent. That might be nice. I would like that one.
So, we've got Qwen, the library native purely on top of PyTorch. Okay. So, I think we're going to use Qwen. There's also a Phi-4 mini, highly optimized for PyTorch and flash attention. Why it's great for LoRA: around 3.8 billion parameters.
H, you can fine-tune it.
All right.
Okay.
How's it going there, Farhan? Good to see you, Farhan. Fisel, good to see you.
Welcome on in. Happy Friday. We made it to the weekend. We're currently going to fine-tune a large language model. Are you ready? We're going to use low rank adaptation.
Vicki, hey Vicki, good to see you.
Welcome on in. Happy Friday. Good to have you here. "Qwen is pretty decent."
You like their work? Nice. Okay. So, there is a good Qwen. So, let's look for a small Qwen model, cuz it's all PyTorch, right? So, all PyTorch. Let's go Qwen.
Search for a smaller one. See all Qwens. All right. I want a very small Qwen, please.
There is a 27 billion or 35 billion.
Smaller. I want smaller, like a around 3 billion. 1 to 3 billion is what I'm looking for.
There's a 9 billion.
That's kind of a lot. I probably could fit Deep Seek for Flash.
Yay. Happy Friday, Vicki.
"Whenever you deploy, Qwen is a nice experience." Is it nice? Okay. "If you want a very small one, there is one from the Gemma family." See, the problem is though, it's not PyTorch. I want to do PyTorch. It's got to be PyTorch native cuz I'm going to be running through the layers. I think if we can do that, I'm going to try to get the model weights, because we're getting safetensors. So, we should be able to get the weights. So, it should be fine, cuz Gemma is a different architecture.
Here we go. Here we go. I found it. I found it. Here we go. This is the one right here. Here we go.
All right. So, let's see. Is there a way? So, let's see here. So, it's our message input.
All right. Pre-trained.
Okay. Automodel text. So, this is the model itself. Okay. All right. Perfect.
This is what I want. Load directly. Here we go, you guys. Here we go. We're going to do it. All right. So, let's make a new folder. Let's do mkdir, fine tuning LoRA. cd fine. Okay. pbpaste, lora.py.
Okay. cd, cd fine. All right, let's do python lora. Let's see if that works. It should work.
"It's, uh, partly why you recommend GPT-2."
Oh, GPT-2 would have been a good one as well.
Thing is maybe tricky. You need to take out the linear layers and actually modify model code.
Uh well here we won't do we'll we'll try just the last layer. What if we do just the last layer?
"There isn't much else you can do about it. You know, open up Qwen 2.5." Hey, the vasper. Yeah. So we're going to do Qwen 3.5 4 billion. I'm going to try that. I think that might... Oh, it's 10 gigabytes. Oh my gosh. It's going to take a minute here, isn't it? It's going to take a little while. "Gemini says it's very small." Oh, uh, Qwen 2.5. Okay, let's take it. So Qwen, let's do Qwen 2.5.
Oh, I got to spell it right. 2.5. Okay.
Uh oh. Oh, here we go. Here we go. It is really small.
Like really, really small. Okay. I like that. I like that. All right.
Here we go. Let's try it. Let's try it.
It is extra extra small and plus it gives us direct access to the model.
Okay.
All right. Um, so I'm going to just close that really quick. All right. We're going to we're going to replace that. We're going to rerun it now. Here we go. Okay. Now, now it should be I need to delete this model. It's pretty big.
Oh, wow. Look at that. Okay. Here we go.
Safetensors. Only 1 GB. That's a lot better.
That baser. Thank you. Thank you. That's what I'm looking for. All right.
"The last layer only is probably not a good enough move." So, we use it. Reuse it.
Reuse it. "If you're going for just the last layer, you can optimize a lot by precomputing everything, and then it's linear regression at that point. Forget about LoRA. Just do a linear regression." Uh, yeah, that makes sense.
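The chat's point about the last layer can be illustrated with a toy example: if every earlier layer stays frozen, its output can be computed once and the final linear layer fitted directly. This sketch assumes a squared-error objective and made-up data; for next-token prediction the same idea would lead to a softmax regression rather than ordinary least squares. The `body` network is a hypothetical stand-in.

```python
import torch
from torch import nn

torch.manual_seed(0)
body = nn.Sequential(nn.Linear(10, 32), nn.Tanh())   # frozen layers before the head
x = torch.randn(200, 10)
y = torch.randn(200, 3)                              # made-up targets

with torch.no_grad():
    feats = body(x)                                  # computed once, then reused
    feats1 = torch.cat([feats, torch.ones(200, 1)], dim=1)   # extra column for bias
    solution = torch.linalg.lstsq(feats1, y).solution        # shape (33, 3)
    pred = feats1 @ solution
    mse = torch.mean((pred - y) ** 2).item()

print(solution.shape, mse)
```

No backpropagation through the body is needed in this case, which is the optimization the chat message is pointing at.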
That makes sense. I I want to try at least implementing it. We'll see if we can get more than one layer. We might be able to. We might be able to.
Uh, you are a helpful. Okay. I am a system that can help you with various tasks. Available. Available. You are helpful. Available. So, this might not be the best one. However, I'm only going to track the loss. So, it's not going to be a problem. But, all right.
Here we go. So, here's So, we've got our X input. We've got our model inputs, new tokens, max. Okay, so here's our model. We're loading it in. And then we've got Oh, interesting.
To device. What is our device?
I don't know what. Hold on. Let's print the device. All right. Print model.
device.
Oh, it's going to be whatever is on the Okay, got it. So that's just that's just a copy. That's not a problem. Here, let's try it again.
You want at least two layers. So it's actually deep learning. All right, sounds good. What is the objective for fine-tuning it? Do you want it to do? We are just we are just trying fine-tuning.
We we're we don't have an objective.
We're just going to see if we can make it make an impact.
That's it. Probably just next token. No.
Exactly. Right. That's easy. Yep. There we go. That's what we're doing. Wait.
Uh, hold on one second. Close that. Did it print out CPU? It did. Okay. So, we're running CPU here. Good. Good.
Good. Good. Good. Good. Okay.
We should probably... What is PyTorch accelerator, current accelerator? All right. PyTorch, get current accelerator.
There is a Here it is. Yeah, this is what I'm looking for.
current device. Current accelerator.
Let's see what we got here. torch dot current accelerator. Okay, let's see if we can get this to run.
Although it's already fast enough, so I don't necessarily see that I need to put it on an accelerator. Pad token ID and end-of-sequence ID for open-ended generation. Okay. I don't know why it's saying that, but all right. "I like the fine-tuning stream." Yeah, I like the fine-tuning stream. Nice, Sergio.
Great to hear it. Great to hear it.
All right. So, here's our uh accelerator.
Print the accelerator.
Okay, here we go. Uh, I'm just going to see what's available. It should say MPS.
I should see MPS on the screen.
Uh oh, import torch.
There we go.
And then rerun that there. Okay.
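One way to pick a device that works across PyTorch versions is sketched below. The `torch.accelerator` namespace only exists in recent releases, so the sketch guards it and falls back to the older CUDA and MPS checks; `pick_device` is a helper name invented for this example.

```python
import torch

def pick_device() -> torch.device:
    """Return an accelerator device if one is usable, otherwise the CPU."""
    acc = getattr(torch, "accelerator", None)   # absent on older PyTorch builds
    if acc is not None and acc.is_available():
        return acc.current_accelerator()
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():       # Apple Silicon GPUs
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
print(device)
x = torch.ones(2, device=device)                # quick check that the device works
```

On an Apple Silicon Mac with a working MPS backend this would be expected to report `mps`; on a machine without an accelerator it falls back to `cpu`.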
So, we should define our LoRA model here.
So, let's do this. How do we do this?
All right. So let's do our class, low ranking adaptation, and this will just be torch dot... is it Model or Module? I always forget. See, def init, self, and let me do super init like that.
Double check. All right. So, we're going to have our LoRA, low ranking adaptation.
See if this works. Also, while I'm constructing this, I'm going to hide these things for now cuz we don't need them yet.
Actually, what I'll do is, I'll do like a for... we'll just indent this for now.
See, model looks good. Okay. Uh, def, do nothing.
Fast forward. Let's get our accelerator.
Put it up here.
All right.
Okay. Okay. Uh I'm not an expert, but I heard fine-tuning is making a model very good at one specific task and doesn't need a database for it. Yes. Yes, you got it. That's it. the vasper. Oh, wait.
"You can monkey patch over original layers to do LoRA training." That's kind of cool. Like if you override forward.
OH.
OH, YEAH. OKAY.
YEAH, I see what you're describing.
I see what you're describing.
That could be pretty neat.
We would need We'd need the original H.
I think.
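The monkey-patching suggestion from chat could look roughly like this: keep a reference to the layer's original forward, then replace it on the instance with a version that adds the low-rank path. This is a sketch of the general idea, not the code from the GitHub examples mentioned in chat; `add_lora`, `lora_A`, and `lora_B` are names invented here.

```python
import torch
from torch import nn

def add_lora(linear: nn.Linear, r: int = 4) -> nn.Linear:
    """Patch one nn.Linear in place so its forward adds a low-rank path."""
    linear.weight.requires_grad_(False)
    if linear.bias is not None:
        linear.bias.requires_grad_(False)
    linear.lora_A = nn.Parameter(torch.randn(r, linear.in_features) * 0.01)
    linear.lora_B = nn.Parameter(torch.zeros(linear.out_features, r))
    original_forward = linear.forward            # keep the original bound method

    def forward(x):
        return original_forward(x) + x @ linear.lora_A.T @ linear.lora_B.T

    linear.forward = forward                     # instance-level override
    return linear

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(3, 8)
before = model(x)
for m in list(model.modules()):
    if isinstance(m, nn.Linear):
        add_lora(m)
after = model(x)
print(torch.allclose(before, after))             # B starts at zero, so no change yet
```

The model code itself is never edited; each patched layer carries its own A and B as registered parameters.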
All right, we'll keep going. We'll keep going. It's probably going to do something like uh Wikipedia. That's a good idea. That's a good idea.
So, we'll grab a Wikipedia page.
I'm seeing examples on GitHub. Okay.
Nice. Yeah. Share them. I'd be very curious to see. Okay. Let's make sure that we're running. Torch, not Module, it's Model. Is it Model or is it nn.Module?
Double check. Okay, there we go. Good, good, good. Okay, so I need to create self dot uh h let's think this through. I'm going to do I'm going to do one layer first.
I'm going to do one layer to keep it simple and then we could try more than one layer afterward.
I think that I think I think I just want to keep it simple first.
"Isn't Wikipedia easy to scrape and fetch? We can just grab a training page from there." Yeah, I'm just going to grab one page. Hey, Stephen. Good to see you.
Hey, that's great to hear, Muhammad.
Good to have you here. Happy Friday.
We're doing some low rank adaptation finetuning today.
All right. So, let me let me check my model here. All right. So, I want to I want to I want to inspect this right here. I want to inspect this. Let's take a look at what we got going on here. All right. So, let's pull this out.
Then, we're going to say, uh, print model.parameters.
And we're going to say for layer in model.parameters... "Easier just to pick up a Hugging Face dataset and chop it up." Ooh, there you go. Yep. We can just grab a Hugging Face dataset. I'm just going to grab a simple text document, and it's going to be very short. "What's the name of the song playing in the background?" All right, Perry Math. The answer is it's called Inner Light by Kevin MacLeod.
Kevin MacLeod. Inner Light.
Inner light. You asked at the right time because it's uh just about to finish.
Okay. All right. Got a new Here we go.
Well, let's do a new poll. All right.
Have you done fine tuning before? Have has everyone done fine-tuning here? This is what we're doing right now. Have you ever fine tuned uh tuned an AI model before?
I like the yes no. I like the less. I like it when it's two answers. It's two possible answers.
Okay.
So la print layer.
See if that still works.
Okay. Uh, AutoTokenizer doesn't exist.
Oh, it's cuz I commented it out. That's why.
Bonzupi, I've trained many, many, many artificial intelligent models from scratch and only fine-tune a few at once. Oh, look at that. I like that. Oh, I'm happy now. I know what's going on. I get this. So, we got our models parameters right there. Ooh, look at that. Look at that right there. So, what do we do about bias?
Here's my question. What do we do about bias? Do we leave it as is? Do we not touch bias? Neva, do you know?
Thank you for the party, Poppers.
Appreciate it. At Nea: uh, what do we do about the bias vector?
Do we just leave it as is? I think we might just leave it.
Just leave bias as is. Okay. Okay.
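Inspecting a model's parameters is easier to read with names attached. A minimal sketch on a small stand-in model (a real checkpoint would list many more entries); it shows that each linear layer contributes a weight and a separate bias, so leaving the bias alone just means not attaching anything to those entries.

```python
import torch
from torch import nn

# Small stand-in model, not the checkpoint loaded on stream.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

names = []
for name, param in model.named_parameters():
    names.append(name)
    print(f"{name:10s} shape={tuple(param.shape)} requires_grad={param.requires_grad}")
```

`named_parameters()` yields the same tensors as `parameters()`, paired with the dotted path of each one inside the model.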
All right. So, here's a LoRA. And then we're going to do, uh, do we need self here? We do.
We need self. And I need the model. I want the original model. So, we're going to do a LoRA and we're going to get the original model. And then I'm just going to grab the last layer. I wonder if I could do that here. Let's just try it. Let's just grab the last layer.
Let's go -1 if it's iterable. Although, I might need to throw it into a list.
See that here. Oh, it's going to give me an error because I'm asking for the model here. One second. Hide that for now. I'm not using it, so that's fine.
Okay. Is making a simple terminal chat not considered as making an AI agent?
Oh, well, it is. You're right. You It is. That is the interface. So, a terminal chat is the interface, right?
So, that's the what do you call it? The scaffolding, the harness.
It's not necessarily the agent itself.
When you're building a harness, you will call out and do specific tasks and do certain things with the AI. That is the agent itself. So you need to have it headless. This is a headless agent. It's called headless agent. No UI.
It's just input output.
"The funniest thing you could make is a dumb AI: no large language models, no deep learning, just if statements." That's original AI right there.
You got original AI right there.
Okay. Um, so you use Google API. I like the Google API. Nice. Very nice. Uh, did that work? No. So, we're going to have to list this up. We have to list it.
Got to list it up.
I think that should do the trick right there. Let's find out. List, negative one. It's going to give me the last layer. And oh, it's pulling it in. Pulling it in. Oh, well, that did something I didn't want it to do. Whoops.
Okay, that is very odd. It's giving me every single par. It's giving me every parameter independently. Why? That's a little bit too much. I want it to not do that. Hey J bro, how's it going there?
How do you test run code while building?
Oh, you mean like this in the screen?
Like on our screen right here? So on our screen, you see that we've got Vim up here in the top side of the screen. In the bottom side of the screen is a tmux split that is running another TTY, just another input here. So our terminal emulator is Ghostty. We've got tmux and Vim. So tmux is splitting the panes so we can run the code and read it at the same time.
Uh, you mean uh negative one? Wait. Oh.
Oh. Colon one.
That's probably it. That might be it right there. Maybe. We'll see. We'll see. We'll see. Let's see if that Let's see if that does it. Thank you for asking or thank you for mentioning. I'm not sure based on the output that I saw.
I don't know if that Oh, here we go.
Uh, but it's giving me I just want the final layer, just the final one. And it's giving me all of them.
So, what if we did just H how do we do that? You know what? I suppose we could do do can we negative 1 one? Oh, let's try that. Let's try that1 comma 1. Bon Michael, thank you. 1 comma 1.
See if that does the trick. Oh, wait.
I'm an idiot. Oh, no. It You're You're good. You're good. I mean, I didn't I I didn't know either. Got to try it out.
Uh list indices. Oh, let's do not a list here. Let's try here. Let's try that cuz it's a it's a tensor. It's a tensor.
Although that is a an iterator. So, if it's an iterator, that means Yeah. So, it's a right. It's a generator, right?
Generators are not subscriptable.
So, what if we enumerate it? No, cuz that's still a generator.
It's a list. Uh, just a list.
You want negative one? I do. Exactly.
That's what I want. Put a colon after.
Okay.
Uh, list.
All right. Try that.
Okay. Uh negative 1 colon. Not that way.
Oh, a different way. Oh, actually that's exactly what I wanted. That's it right there. You did it, Michael. That's it.
Use yield. Yay. We could use yield. I know, right? Length minus one. All right. No, you got it. Yes, we got it.
We got it. All right. Hey. All right.
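For reference, here is a minimal sketch of the indexing problem worked through above. The tiny `nn.Sequential` model is a hypothetical stand-in, not the model used on stream. It shows that `parameters()` returns a generator (so it cannot be subscripted), and that `[-1]` returns one item while `[-1:]` returns a one-element list.

```python
import torch.nn as nn

# Hypothetical stand-in for the real model: two small linear layers.
model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 4))

params = model.parameters()           # a generator, not a list
try:
    params[-1]
    subscript_error = None
except TypeError as err:
    subscript_error = str(err)        # generators are not subscriptable

param_list = list(model.parameters())
last_item = param_list[-1]            # a single Parameter
last_slice = param_list[-1:]          # a list holding that one Parameter

print(subscript_error)
print(type(last_item).__name__, tuple(last_item.shape))
print(type(last_slice).__name__, len(last_slice))
```

Note that for this stand-in the last parameter is a one-dimensional bias, not a weight matrix.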
Good to see it. Good to see it. So this is the last layer, right? So literally this is just the last layer. So we could we could add this logic here into our model. So we get the last layer. All right. So see uh get the last uh last W.
Call it last. I'm just going to say layer layer on the original on the original model.
Thanks, Nea. Yeah, thank you guys. This is really useful. This is really helpful. Okay, so now we've got the last layer. Now we need to LoRA ourselves around it. We've got to LoRA around it.
All right, so I'm going to say self.last_layer. We'll call this LoRA A, or LoRA in.
I like calling it LoRA in. It's going to be a parameter, torch.nn.Parameter. We're going to say torch.rand.
We need to understand what the input for the model is going to be though. So that's part of the challenge. It's going to be something.
So we've got our tokenizer here and I need to see what this is going to look like.
max_new_tokens, 40, generate.
Okay. So maximum number of output tokens that's just going to run it through. So that's not going to be a forward that's a generate.
Okay.
Stephen, just pick up the shape from the parameter of the layer. Oh, thank you.
Thank you. Hey, thank you. All right.
self.last_layer.shape.
There you go, right there. We've got it. Okay, so let's do shape equals shape. Okay, torch.rand, our shape zero, comma, and then we need a rank. Rank equals 4. Okay, we've got a parameter. Then we'll do a LoRA out, which will be zeros.
This will be rank, shape one, right? That's what we want. Shape one here.
There we go.
One.
Perfect. Okay.
Okay. So we've got LoRA in and LoRA out there.
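A rough sketch of the two LoRA matrices described here, under some assumptions: `layer` is a hypothetical stand-in for the last layer, and the names `lora_in` and `lora_out` follow the stream. Because `nn.Linear` stores its weight as (out_features, in_features), the sketch unpacks the shape in that order so that `x @ lora_in @ lora_out` lines up.

```python
import torch
import torch.nn as nn

layer = nn.Linear(16, 4)                      # hypothetical stand-in layer
out_features, in_features = layer.weight.shape  # nn.Linear stores (out, in)
rank = 4

# Random "in" matrix, zero "out" matrix, so the update starts at exactly zero.
lora_in = nn.Parameter(torch.rand(in_features, rank))
lora_out = nn.Parameter(torch.zeros(rank, out_features))

x = torch.randn(2, in_features)
delta = x @ lora_in @ lora_out                # shape: (batch, out_features)
print(tuple(delta.shape))
```

Starting one matrix at zero means the wrapped layer behaves exactly like the original before any training happens.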
Weights. Oh, last layer. Okay. Thank you. W-E-I-G-H-T. Well, are you sure? Because this is a tensor, so it should have the shape here, right? I believe. Do we need the weight?
Also want to assert the parameter is 2D.
Oh, well it is 1D.
No, it's two. Is it 2D? Yeah. No, this looks like it's 1D.
This is one dimension.
This must be the bias.
Oh, no. You're right. Oh, it's the layer. It's the parameter. Yeah, he's good. Hey, Lions for Lands. Good to see you. Welcome on in. Happy Friday.
All right. So, I'm seeing one dimension here, which makes me think this is the bias. So what if we did negative two, which would maybe be the actual weights? Because you want to monkey patch? No.
Well, uh I guess if you want to mutate the model directly. So what I want to do is I don't want to directly mutate the model.
I only want to mutate it at runtime.
So, part of the idea of the fine-tuning, the objective, is that I want to have a million variants of the fine-tuning, and I only want to save just the little smidgen, the little smidgens, the teeny tiny LoRAs.
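To put a number on how small those "smidgens" are, here is a back-of-the-envelope sketch. The sizes are assumptions for illustration only: `out_features` is in the neighborhood of the roughly 151,000 outputs seen on stream, and `in_features` is a hypothetical hidden size.

```python
# Assumed sizes for illustration only, not read from the stream's model.
out_features = 151_936   # assumed vocabulary-sized output
in_features = 896        # hypothetical hidden size
rank = 4                 # the rank chosen on stream

full = out_features * in_features             # one full weight matrix
lora = rank * (in_features + out_features)    # the two low-rank matrices

print(f"full layer: {full:,} values")
print(f"LoRA pair:  {lora:,} values")
print(f"ratio:      {lora / full:.4%}")
```

Under these assumptions, each saved variant is well under one percent of the size of the layer it adapts.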
Mostly avoid the activation. Oh, right, right, right. Yeah, yeah, we're not doing any activation here.
Uh, I definitely I would I'm curious about the monkey patching.
Would be interested to do an architecture day. I would love to present on your channel. Hey, MTW, that sounds pretty interesting. That would be a lot of fun. Send uh send your details on Discord. Send details on Discord.
I would iterate over layers. Yes. I mean, yeah, I Yeah.
Oh, instead of parameters.
Oh, okay.
All right. One second. One second. One second. So, not parameters, layers. is what you're saying.
L-A-Y-E-R-S.
Okay, let's try this.
See if that works. Is that a thing or modules?
All right, so we don't have layers. You say modules, like that.
I know parameters from the autograd tutorial that we did many many weeks ago. Oh, hey, here we go. Hey. All right. All right.
So, this is the output layer. Features in, features out. Whoa. Wait, wait, wait. That seems like a little bit much there. Isn't that 151,000 output features?
Seriously.
It is modules. Yeah, a bit gray. Wait, what is a big gray?
It's big because your token size is big.
Cuz there's so many cuz there's 151,000 tokens.
That's what it comes down to. All right.
It's a big one. That is a big one.
0.5 50%. What are we halfing up there, Michael? Oh, it's a lot. It's a lot of features.
All right. So, that makes sense. Here's what I want to do is also I'm going to go back to parameters really quick and I'm going to say -2. Just curious about that.
I thought it was 0.5 billion. Oh, yeah. Yeah. In terms of the total number of parameters, right, right. So, that is also a vector. Let's see here. Interesting.
Let's go three. See what's at number three. Because hidden dimensions are larger. Yeah, hidden dimensions is where all the memory is stored. All right, let's go back to modules. Modu modules.
All right, let's try that.
So, that gets us the layer.
See, this is uh Wait, did I type that wrong? Um, what are you what are your inputs? Uh, the inputs are going to be here, right?
right there on the screen. Uh, modulless.
Oh, I just spelled it wrong.
Module. There we go. Okay.
Inputs are going to be just a bunch of tokens here. And then we're going to train the model on the output to fine-tune it here. All right. Let's see here.
All right. So you have four inputs and you got that many outputs.
Yeah, kind of. Let me think. So it's essentially the number of outputs are going to be here. This is why. So 151,000 outputs, right? That's the number of features out. Wait, features out. And technically that's the labels in features. Okay, that's just the output. All right, so the point then lines or lamps, this is going to be fine tuning. So we'll be able to train the model to learn new data that it's never seen before and it'll be better at replicating whatever we're trying to train it. And we're just going to train it like on a Wikipedia article.
Your way with layers was cleaner. Like, if they have layers, use that. It's so nice. Some people feel like they are too cool for school, and you have to force PyTorch to let you get the thing. All right, it's trying it. I want to know how many letters there are in one token.
Oh, you know, I don't know.
I don't know. I I don't know what their tokenizer is here.
So, our tokenizer is whatever their tokenizer is. I don't know specifically. I don't know how it splits it. I don't know how it normalizes it. I don't know if it's byte pair encoding. I don't know the specifics. The only thing that I care about is I'm going to wrap the layer and I'm going to do matrix multiplications and then I'm going to add the output and then get a loss.
So I'm going to assume the model here.
So you said so Nebby you're thinking that uh was layers or parameters better?
I think it's byte pair encoding. I don't think I've seen anything but BPE. Okay, so it's byte pair encoding, which means it's a bunch of word segments. For example, the word "parameter" could be like "para" and "meter", or "para", "me", and "ter". So it could be split into three tokens.
You're with layers. Okay, layers. Okay, got it. Layers.
Uh well, you mean parameters?
The only thing is um I guess it's just the bias on the output. So I'm not getting a multi-dimensional matrix is the thing. I'm only getting a 1D matrix when I do that, which is not what I was expecting.
Uh syllable splitting. Yeah, exactly, Michael. Mhm. You got it. That's exactly what it is. If they were too cool for school, we would be four out without the modules. All right, we could without the It would be great if we could. Patience is godly. Let's see.
Yep. Parameters for sure is wrong. Okay.
So, modules then.
So, do modules and I I get the same output. I get like the same thing.
No, no, I don't. I get something completely different. No bias. Okay.
This is it right here. Perfect. Okay.
All right. So, print layer.shape. All right. Let's try that.
Okay. All right. All right. I think I think this is our answer here. All right. So, we are going to just grab the first one. There's no shape. What is there a size?
Linear object has no attribute shape.
Why not? Modules gives you layers while parameters gives you the weights.
Layers is a lot better. It's a lot better. All right. I like that better.
Nice. Okay. So, can we We can also say size, right?
Size. This should be a function. Oh, wait. Got it. Okay. W-E-I-G-H-T, weight.
Okay. And let me do uh shape.
All right. Let's try that. Let's try that. Okay.
That has the shape. It's got the shape.
Thank you, Neva. Hey, there it is. Look at that right there. All right, we got it. We got our answer. All right, so it's going to be out and in, I believe.
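The difference between modules and parameters found above can be sketched like this, using a hypothetical three-layer stand-in model. `modules()` yields layer objects, which have no `.shape` of their own, while `parameters()` yields the tensors, and a Linear layer's `weight` is stored as (out_features, in_features).

```python
import torch.nn as nn

# Hypothetical stand-in model, not the one used on stream.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

last_module = list(model.modules())[-1]      # the final nn.Linear object
last_param = list(model.parameters())[-1]    # the final tensor (the bias)

has_shape = hasattr(last_module, "shape")    # Linear objects have no .shape
weight_shape = tuple(last_module.weight.shape)
bias_shape = tuple(last_param.shape)

print(type(last_module).__name__, has_shape)
print("weight (out, in):", weight_shape)
print("last parameter:", bias_shape)
```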
Right. So our output needs to be the out here. So let's see. Um okay.
So the shape needs to be inverted. So this needs to be no no. Uh this needs to be one. This needs to be zero.
This will be modules.
Yep. Negative one. Do we list it? Yeah, we list it up. Okay. Get our modules.
Now my question is can we also you can uh also monkey patch over the forward method if you want which is fun.
So monkey patching is that I mean there's there's how do how do how do we go about that?
So I am going to have a def forward here, right? Self, comma, x. We are going to have a forward and then... okay, so I do want every single layer. I see where our problem is now. For every layer, I do want that.
Hey, will you post this? Absolutely.
Michael, like the video? Yes, absolutely.
Basically, do layer.forward equals new fancy closure.
Layer.forward.
So, the forward of the layer and then we reassign it. So, we're overwriting.
Is that what monkey patching is? We're just overwriting the method.
Thanks for doing this. Yeah, you're absolutely welcome. Yep. Here, we'll do this real quick. So, let's see here. git status, git add.
git init. There we go. git add, git status.
All right. So, we've got a new LoRA.
And then we're going to say git commit.
Started low-rank adaptation fine-tuning.
Tuning. There we go. And then we're going to create a new repo.
Monkey patching is overriding stuff. You kind of shouldn't this way. Monkey patching is overriding stuff. And you kind of shouldn't do it.
I got it. I got it. I got it. That makes sense. Um, so I'm probably I'm going to see if I can do it without monkey patching. Do I need to though?
Because I kind of do. I need to get the forward from each of the layers.
Right. And then we get the final output.
The only thing is, do we still need to softmax it for the training part? Because we need to softmax over all the tokens.
LoRA is so elegant. Is it? It's pretty neat. It's pretty neat. I like it. I really like it. I think for LoRA, monkey patching is the best solution. Okay. So you're saying that we should override all the layer forwards. Okay.
So we really should be iterating through every single layer. The only thing is, when we're capturing the input, we need to capture the input from the previous layer, which makes it a little tricky here. Right? So we've got some trickiness up ahead if we're trying to do just the last layer. You can save the old forward somewhere if you want the original back later. Oh, right. Yeah, that makes sense. Yeah. So, old_forward equals, let's see, you called it self.last_layer, right? Dot forward. We've got the old forward and we've got the new forward.
New forward equals something. I don't know. We could def new_forward, right? And then we do something, and then we call old forward, the original forward, right?
Like that. And we just override it with the new forward. Just like that.
Uh you just overrode forward and it calls your new method with the correct inputs. That's why I like monkey patching. It reduces the chance for bugs.
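As a minimal sketch of the save-and-override pattern discussed here (the `layer` and the `calls` list are hypothetical, added only to make the effect visible): the old forward is kept in a variable, the new function does something extra and then calls it, and the original can be restored afterwards.

```python
import torch
import torch.nn as nn

layer = nn.Linear(16, 4)             # hypothetical stand-in for the last layer
old_forward = layer.forward          # bound method, keeps the original behavior

calls = []

def new_forward(x):
    calls.append(tuple(x.shape))     # do something extra with the input
    return old_forward(x)            # then defer to the original forward

layer.forward = new_forward          # instance attribute shadows the class method

x = torch.randn(2, 16)
patched_out = layer(x)               # calling the layer now runs new_forward

layer.forward = old_forward          # put the original back
restored_out = layer(x)
print(calls, torch.equal(patched_out, restored_out))
```

Because the new function receives exactly the input the layer would have received, there is no need to capture the previous layer's output by hand.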
All right. I see what you're doing there.
One second, you guys.
All right. Where is this? This This music is is nice. However, where is it? It's called Chance. Chance.
Here we go.
Okay.
Getting up to speed on some. Sorry, I just changed up the music there for a second. That's why monkey patching. Hey, code length and cinderby. Good to see you. Welcome on in. Happy Friday. We're doing some low-rank adaptation fine-tuning. How's it going there? What happened to your hoodie strings? Oh, yeah. I got rid of them. Hey, you noticed. You noticed.
They always got in the way, so I just pulled them out. I didn't think for a year I had my hoodie. I never used the strings. And I'm like, the things get in the way. I just got rid of them. I just got rid of them. You noticed. That's awesome.
Okay.
Scandalous. I know, right? All right. So, this is the basic the basic part of it. All right. So, we got our basic part here. Looks better. Yeah, it looks better. It's easier to deal with and I don't have to like organize them.
I don't have to organize them anymore.
All right. For those of you who are here, we're doing fine-tuning with low-rank adaptation.
We're using a Qwen 3 or Qwen 2.5 model because it's tiny, which works out really nicely for this situation where we're just trying to build up our own low-rank adaptation algorithm, based on help from Nea in chat and the arXiv paper that describes the process of doing this for each layer. Currently we're doing the last layer. I'm going to try it with the last layer. I'm going to see if I can get it to work with the last layer. And then, let's see, we need the original weights.
So we got our last layer here. Get we can do a forward pass on that.
So I need the inputs from the pre when we do the forward here. Perfect. Okay.
So this is our new forward.
Can you explain the monkey patching?
Well, apparently it's just overriding functions. I've heard about monkey patching very often in Python because you can monkey patch a lot of things. You can import a library that overwrites some of the capabilities of another library. With monkey patching, what we're doing is we are just overwriting the layer's forward pass, which is just another method built into the class. Right? So, this is monkey patching right here. You're looking at it right there. Essentially, we're just overwriting the method with our new method here. That's it. It's really fun.
in Python. Yeah, it sure is.
Polymorphism. Hey, Maren, thank you for for your chat. A little bit of polymorphism. In this case, it is uh is it polymorphism? Is it is this what polymorphism is?
Overriding methods from a class.
You want to use really old ThinkPad as a nostalgic Linux tank. Hey, Peter Parkour. It is. That's what that is. No, it's not polymorphism per se. See, that's what I thought cuz I remember learning about polymorphism like 25 years ago. The new method is just assigning the last layer. Yes, that's what we're doing. That's what we're doing here. Yeah. So, we got the last layers forward method right here. The last layers forward method, which makes it really easy because now we can get the input. We get the inputs here. Although I'm wondering, does it need to be self? I think it needs to be self, right? So self, comma, it's a new self. It's a new self. It's called model self, right? So we have to do something like that.
Uh what would it be if we didn't do that?
You would need to build the entire model, I think. So I the the way we're currently doing it is odd and it's not fully complete. I'm just trying to make it work. Hey Lun, you're back. Welcome on back. Good to have back. If we didn't do it this way, what would would we do it differently?
There is a different way to do it. Yes, we would define a forward pass here, right? Which we need to do.
Let me think about how to do that. Oh, we don't need to do it. We don't need to do that.
So, we're injecting the model in, and then we do need to do a forward.
So, we're going to have a layer forward.
If we have all the existing layers, I'm just thinking it through. I'm thinking through.
You're biting your lip, are you? Wait, feel free. Feel free to say anything, but following the path if you want to take a look. Oh, Michael, what' you say?
I'm putting something in general.
So, for the stream. Hey, LoRA PyTorch.
LoRA PyTorch.
Yeah, we're doing our own.
Oh, that's exactly what we're doing right now. Hey, from module. We're going to do from module, the model, print. We get the whole picture: arange, LoRA model of x, new model dot merge.
So, this will merge it and give you the full the full model. Okay.
Nice. So then you can do your back propagation and training as usual here.
Hey, we figured it out. I understand it.
I understand it. Now we just got to build it. We just got to build it now.
Oh, that's from your message above. Oh, sorry about that. Okay. Let's see here. Expand. Okay. LoRA linear. Oh, hey. Okay. Super init. In features, out features.
That makes sense. Okay. So, that defines that there. Then you've got an alpha, which is a scale. You take the rank and you take the alpha, and it's going to scale the update up, because when you're doing back propagation the update is very tiny, right? It's very tiny because it's zero to start. So you need it to learn something and you need to scale it up.
You just took like 6 months and two support ticket to finally get your stickers. Oh, wait. What stickers did did you see the donated to Mozilla? You did. Wait, did you see you you donated to Mozilla and it took 6 months to get your stickers?
That's very awesome. Share your stickers on Discord. Are we happy where we were prior to the monkey patching in terms of efficiency? Oh, efficiency is going to be the same either way. I don't think it's going to matter.
You can monkey patch or not. And monkey patching is actually going to be really nice here in this case because if we're going in and do the last layer, right?
Weight parameter.
Yep. Empty.
Uh, and then you've got zeros. Yep.
Exactly. Wait, there's a bias here.
Interesting.
I'm not doing bias in this case.
I suppose you could AB self a right empty zeros.
Wait, where's the bias being used? Self bias parameter. Y is that being used somewhere here?
Frozen base. So you've got your base parameters. Okay. So this only works for one layer.
This works for one layer, which is what we're doing right now.
Merge weights. The original weights plus A times B times scaling. See, I told you there was scaling. I told you it would have to have scaling.
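The single-layer wrapper being read from chat can be sketched roughly as follows. This is not the code that was pasted, just a minimal version under assumptions: the class name `LoRALinear`, the `alpha / rank` scaling, and the `merged_weight` helper are illustrative choices.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical single-layer LoRA wrapper around a frozen nn.Linear."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)              # frozen base weights
        self.scaling = alpha / rank              # assumed scaling convention
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scaling

    def merged_weight(self):
        # W is (out, in) and A @ B is (in, out), hence the transpose.
        return self.base.weight + (self.A @ self.B).T * self.scaling

torch.manual_seed(0)
lora = LoRALinear(nn.Linear(16, 4))
x = torch.randn(2, 16)
same_at_init = torch.allclose(lora(x), lora.base(x))   # B is zero at the start

with torch.no_grad():
    lora.B.normal_()                 # pretend training moved B away from zero
merged = nn.functional.linear(x, lora.merged_weight(), lora.base.bias)
merge_matches = torch.allclose(lora(x), merged, atol=1e-5)
print(same_at_init, merge_matches)
```

Merging folds the low-rank update into one ordinary weight matrix, so the merged layer gives the same output as the wrapped one without the extra matrix multiplications.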
Uh, you put a photo in of the donator.
Oh, and already on general. Oh, I got to check it out. Okay, I have to check it out. Forget the code. Uh, how's it going there, Eminem? Good to see you. Welcome on in. Happy Friday. We made it to the weekend. Uh, forget the code. Show us your skincare routine. Okay.
Yeah. Uh, you just wash your face.
That's all you got to do. Wash your face. Wash it every single day. You look even tighter, bro. All right, Eminem.
Thank you. Appreciate it.
If anything, monkey patching would be slower. Like, you're using Python. You don't care about CPU. Yeah, exactly.
Yeah, exactly.
Uh, do you overlap point? Okay.
Put a photo. All right, let's check out real quick. You said you put the photo in here. Where's it at? Where's it at?
Oh, here. Very nice.
Mozilla Foundation donor 2025.
Nice. Very nice.
Prrenice, welcome on in. Good to have you here. Happy Friday. Okay, my camera kind of bad. Sorry. Hey, no worries. We could kind of see it. We could kind of see it.
Okay, let's keep on going with our LoRA. And you know what? We'll probably want to continue this tomorrow.
We're not done yet. We're going to spend a little bit more time on our low-rank adaptation fine-tuning.
And the way I'm kind of going through this right now, I'm kind of thinking get our module here. All right. So, I think this should be fine.
Do we want to I think it's it's important that we that we do do this. The only thing is the problem here is where's the activation going to occur? What's going to happen with the activation? I suppose it does it matter? Will the activation still occur if we if we do monkey patching like this?
And if we don't otherwise, how are we going to get every single layer to work properly if we scale it up to all the layers? You put a sticker.
Okay, let's see.
Hey, make good tech again. Very nice, Mozilla. You got the sticker.
You may want to treat the original linear layer as a buffer and not as a parameter, so it won't be considered for gradients.
Oh, right, right, right, right, right.
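One way to get the effect chat is describing is sketched below. Note that this is a swapped-in alternative: rather than re-registering the weights as buffers, it simply turns off `requires_grad` on the original layer, which also keeps them out of gradient computation. The layer and the LoRA tensors are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

layer = nn.Linear(16, 4)                 # hypothetical original layer
layer.weight.requires_grad_(False)       # freeze instead of registering a buffer
layer.bias.requires_grad_(False)

lora_in = nn.Parameter(torch.rand(16, 4))
lora_out = nn.Parameter(torch.rand(4, 4))

x = torch.randn(2, 16)
loss = (layer(x) + x @ lora_in @ lora_out).sum()
loss.backward()

print("base grad:", layer.weight.grad)            # stays None
print("LoRA grad:", tuple(lora_in.grad.shape))    # populated
```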
I think steps and in terms of time sunk in the activation layers still do the forward.
The activation layers still do the forward if you monkey patch. Yes, you do have to activate. You do have activations for the layer itself.
So if I'm only capturing the layer, right? just the layer itself, the mod, the like the linear layer, the forward pass that's defined in the model itself will still capture, right? You'll still get a capturing of the activations.
Yeah, you're capturing the linear layer that didn't anyway do any activations.
So, the original code did. So, the original code will still activate or it won't anymore.
Also, it's composed here: activation of linear.forward of x. So, the original code did, right? So, if we're doing monkey patching on just the linear, the original code still does the activation. Nice. Okay, that's what I thought. Okay, so we can do this.
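A quick way to convince yourself of this, sketched with a hypothetical two-step model: patch only the linear layer so its output is pushed far below zero, then check that the ReLU that follows it in the model still clamps everything to zero.

```python
import torch
import torch.nn as nn

# Hypothetical model: a linear layer followed by its activation.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
linear = model[0]
old_forward = linear.forward

def new_forward(x):
    # Push every pre-activation far below zero so the ReLU's effect is obvious.
    return old_forward(x) - 100.0

linear.forward = new_forward             # patch only the linear layer
out = model(torch.randn(3, 8))           # the model's own forward still runs ReLU
print(out)
```

The patch replaces what the linear layer computes, while the surrounding model code, including the activation, is untouched.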
Uh, let's see here. We could put this in a for loop if we wanted to. If we're only doing the last layer and then we do the forward, then that means that we can capture the forward on the input and the output and then we just run forward on the model.
Will this monkey patch it?
Let's try it. Let's try it.
So, we get the model. We're going to monkey patch it. We're going to get the last layer. We don't really care about the old forward.
We're going to do uh let's see.
Last layer dot uh forward.
Hold on a second.
Hold on a second.
Hey, you're doing Hey, you're doing Stephen. How are we doing? Oh, we're doing good, Silky. Good to see you.
Happy Friday. We are currently doing low-rank adaptation fine-tuning, which is actually pretty straightforward and simple, though we're still working through implementing it right now. That's the part we're working on. That's the major point of monkey patching: it doesn't break any other parts. Nice. All right. No, but you've convinced me.
You've convinced me. This is the way that we're going to do it. This is this will be the way to forward.
Forward.
Okay. So, we got our parameters. I'm going to move this to the just a little bit lower. There we go.
Okay.
You All right. Then I want to delete that.
And we say self.rank.
We'll do a rank. And then self.alpha equals some sort of scaling.
I mean, you can either train the last layer or do a few epochs on your data. Don't know what the low-rank fine-tuning means, though. It just means low rank, right? Low rank, which just means smaller. That's all it means.
You know, if you thought all these fancy AI terms, they all mean very simple things with fancy. They got to have fancy terms.
It's not really a last layer thing.
Yeah, exactly. It's an every-layer thing. I'm just doing the last layer for simplicity right now, and then we'll enumerate all the layers, which we'll do here, right? Low rank isn't that fancy, it's first year of college. Yeah, with a lot of these things I'm very often like, well, that seems too simple for the terminology that has been granted to it. It has been designated terminology that is very fancy, and yet I feel like it didn't quite deserve it. Peter Parkour, are you learning? We are learning. We are learning together.
We're learning together. This is great.
It's like rank is linear algebra. Yeah, it's very simple. Very simple. All right. So, rank. Rank. We got our in and out. This is just the first one. This is the first layer.
Uh let's see. And then we got our shape, which is going to grab us the details there. Then I'm going to override and monkey patch the forward.
And then our forward will need to take the model weights and do a dot product. Right? So then we just do our x at, let's see, last_layer.weights. Can we do that? That's the output, and we just say return. So this is just a pass through, right? Return. I think that's it right there. Simple pass through. A bijective homomorphic map is an isomorphism by definition; we just choose the fanciest terms to scare people away from us. You've got to choose the fanciest. Wait, there are other terms, like bijective. A bijective homomorphic map is an isomorphism by definition. We just choose the fanciest terms just to scare people.
That's what I thought. That's the whole point. You scare everyone away.
You make it sound prestigious.
And an echelon, unreachable, unobtainable.
That level of knowledge is so fancy.
Don't ever expect that you can make it.
Isn't that crazy? It's this AI, you guys. All those fancy math terminologies mean simple things.
Sounds fancy. Really isn't that fancy, right? I mean, I know what matrix rank is, haven't done whatever Steven is doing here. Ah, so yes, it seems fancy. It's really not. The rank is just going to be used for the low-rank, the smaller matrix, right? Small, teeny tiny.
And then the alpha is going to be used to scale it back up on the other side when we do a forward pass.
That's it. So this is just a pass through.
Simple pass through here.
Nea is the is that returning correct? Uh I think this is just a pass through.
Right. This is not yet. We're we're keeping it simple. We're keeping it simple right now.
Uh I don't know if we need this anymore.
So, I'm just going to get rid of that.
And this isn't a module.
I think it'd be fine if it's a module.
Do we need it to be?
I don't think so. In fact, I don't even need to be a class.
I don't even think this needs to be a class. It could just be a lora.
L-O-R-A.
And then we just monkey patch.
There we go. So now we can just monkey patch it.
Hey MD, good to see you. Welcome on in.
Happy Friday. You saw Greg Kroah-Hartman talking about Rust and the Linux kernel. Oh, hey, there we go. I think you can just search it on the internet right now if you really want to know. I'm kind of curious about it. I'll have to watch it for lunch.
Stack Overflow. Did have you guys been over to Stack Overflow recently? I bet you haven't. You guys want to go see what's going on over at Stack Overflow?
Stack Overflow.com.
Hey, Steven Blum. What did you learn today? I'm sorry, Stack Overflow. I'm one of the 100 people that visited you today. I feel really sad. What do you got going on over here?
What do we got going on? We're visiting Stack Overflow. Check this out, you guys.
one variable programming. Okay.
Stack overflow. Uh, you could just make it functions. Python doesn't care.
Making it functions. Can we run it?
Yeah, we can. Here we go. I'll run it. I don't know if it's going to work. See, where are we at here? Python, lora.
Here we go.
And it shouldn't crash, maybe. Or no, it's going to give us an output. There we go. Okay, cool. All right. So, let's do this. Let's do it. So we are going to run our LoRA on our model. Should it return the model? We don't need self anymore. So we've got to get rid of self and super.
Self self. We don't need self here. No more selfing. Last layer.
We go. Okay.
Uh la pass through.
Does everything else look good? Nope.
No. No. No. Nope. Got to fix that. Okay.
So, this will just modify monkey patch.
It's going to monkey patch the model.
Okay. And then we're going to run it.
We'll run it for you. All right. We'll run it. Lions for Lambs. We'll run it.
It's not going to do We We haven't done the finetuning part of it yet. This is just setting up for it. We might do that tomorrow. We might do that tomorrow.
Let's just see if we can get through part of this. Last layer, shape, rank, alpha, LoRA in.
Looks good. Def forward. All right.
So, we got our forward pass here and we're going to overwrite it. Looks good.
Accelerator tokenizer model. Then we're going to return the model.
Return model.
Right. Model. Modules layer. Okay.
I think we need to do that last layer of the model.
Okay, let's see if that works.
We should also print it worked to confirm to confirm it, right? We got to confirm that it worked.
All right, new and bad at GitHub. Oh, hey, that is why empty get rebuilt. All right, sounds good. Have that tutorial on how to add Git to my projects. Really easy. It's really easy.
You can make code more readable by leaving spaces between functions.
Yes, that's what I do. I kind of do that mostly.
Let's see here. We don't need parameters print anymore cuz we already got what we wanted.
Okay, let's double check that. Okay. Make sure that we're still working here.
And then we're going to say our model is going to be monkey patched with model.
Pretty sure that should do it.
And then we should see... we didn't do any forward pass yet, but let's at least try to run it first real quick.
AI discovering kernel bugs.
very frequently right now. Oh, really?
Really? List object has no attribute.
Wait, what?
Oh, okay. One second. Where are we at here? We need this.
Here we go. That's right here. We got it right there. It's the same thing. It's the same thing.
Uh oh. H.
Maybe do zero.
I'm not sure. Hold on. Wait, wait, wait, wait.
I I I've got an idea. I've got an idea.
Here we go. Yeah. So, we'll just we'll paste this in. And then we're going to set the last layer to the layer like that. And then we are going to say uh break get the get the last layer. All right.
This is a temporary very temporary temporary. Get last layer.
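A slightly less temporary way to find the layer, sketched here as an assumption rather than what was typed on stream: walk the modules and keep the most recent `nn.Linear`, so the final match is the last linear layer even if other modules come after it. The model is a hypothetical stand-in.

```python
import torch.nn as nn

# Hypothetical model where the last module is NOT a linear layer.
model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4), nn.Softmax(dim=-1)
)

last_layer = None
for module in model.modules():
    if isinstance(module, nn.Linear):
        last_layer = module          # keep overwriting; the final match wins

print(last_layer)
```

Unlike indexing with a slice, this gives back the module itself and not a list, so attributes such as `weight` are available directly.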
Can we print out the X times last layer weight? Yes. All right. That that that'll be as far as we go today. That That's a good idea. We'll do it. We'll do it. X at last.
Let's see. Forward uh out equals We return out.
We say print out.
There we go. Actually, let's do this so we can so we see it a little bit better.
A little bit easier to see.
Okay. All right. There we are printing it. We are printing it. Now, that's just the forward. So, let's see here if the rest. So, I'm going to I'm going to hide these for now.
No, no, no, no, no. Here we go.
Uh, do you guys at PubNub ever monkey patch or is it generally a big no no?
Uh, we don't we don't monkey patch.
Nope. In fact, when we've when we had Python, which we've replaced with Rust because we've gotten so big, our data our data files just too much. It was too much. So, we had to we had to upgrade to Rust.
However, when we had Python, we might import some we used to import with one of our code bases a module that did monkey patch some of our code. However, that wasn't our joint. We didn't do that. So, yes, we've done it before. Do we do it today? No, we don't. We don't do it today.
Okay.
Is that going to solve the issue? What you think with Rust? I think so. MD, I think Rust is going to help. I do. I think Rust will help.
Okay, let's see if we can at least get this to run.
Okay, so far so good.
Linear object has no attribute 'weights'.
Did you mean 'weight'? Yes, I mistyped it.
I mistyped it. Okay.
LA.
There we go.
Okay. Okay, we're making our way, you guys. We're monkey patching our way. "Can we print out the x @ last weight?" Yes, that's what we're doing right now.
"With monkey patching, you can just do the for loop on every linear layer and it would work the exact same way." Exactly.
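The for-loop idea from chat can be sketched roughly as follows. This is a minimal sketch, not the code from the stream: the toy `nn.Sequential` model and the helper names `make_patched_forward` and `patch_all_linear` are assumptions made for illustration. Each replacement forward is a closure over its own layer, so it takes only `x`.

```python
import torch
import torch.nn as nn


def make_patched_forward(layer: nn.Linear):
    """Build a replacement forward for one layer (hypothetical helper)."""
    def forward(x: torch.Tensor) -> torch.Tensor:
        # nn.Linear stores weight as (out_features, in_features), hence .T
        out = x @ layer.weight.T
        if layer.bias is not None:
            out = out + layer.bias
        return out
    return forward


def patch_all_linear(model: nn.Module) -> int:
    """Monkey patch every nn.Linear in the model; return how many were patched."""
    patched = 0
    for module in model.modules():
        if isinstance(module, nn.Linear):
            module.forward = make_patched_forward(module)
            patched += 1
    return patched


torch.manual_seed(0)
# Toy stand-in for a real model (assumption for illustration).
toy_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(2, 8)

expected = toy_model(x)            # output before patching
num_patched = patch_all_linear(toy_model)
actual = toy_model(x)              # output after patching: a pure pass-through
```

Using a factory function matters here: defining the closure directly inside the loop would make every patched forward refer to the last layer visited.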
Yes.
Speed in Rust is a lot better than Python.
That's why we did it. That's why we did it. You want to try Rust? But people make something called the borrow checker seem like a monster. It's actually fine. The only thing that I don't like about Rust is having to manually manipulate lifetimes.
And any of the complexities that come along with generics, I can deal with them. Do I want to? No, I don't want to. However, I can eventually get used to it. Hey, there it goes. All right, it didn't crash. Perfect. Okay, so let's do a forward pass now.
Let's see if our monkey patching took.
All right, let's try it, you guys. Let's try it. Let's see if we did it. Let's see if it works.
Thanks for the party poppers. Appreciate it. Uh-oh, we got a problem here.
forward() missing 1 required positional argument: 'x'. Missing one required positional argument.
So we need to define it.
Uh, it's right here. Maybe it's just x. Well, see, the thing is, with the forward pass in a class, it's going to require self to be defined, right?
Oh wait, hey, it worked.
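Why dropping `self` helped can be shown with a small sketch, assuming a standalone `nn.Linear` rather than the model from the stream. A function assigned to an instance attribute is not bound to that instance, so Python never passes `self`; the names `forward_with_self` and `forward_closure` are hypothetical.

```python
import types

import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)
x = torch.randn(2, 4)


def forward_with_self(self, x):
    return x @ self.weight.T + self.bias


# Plain assignment stores an unbound function on the instance, so the
# input tensor lands in `self` and `x` is reported as missing.
layer.forward = forward_with_self
try:
    layer(x)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# Option A: bind the function to the instance explicitly.
layer.forward = types.MethodType(forward_with_self, layer)
out_bound = layer(x)


# Option B: drop `self` and close over the layer instead.
def forward_closure(x):
    return x @ layer.weight.T + layer.bias


layer.forward = forward_closure
out_closure = layer(x)
```

Both options give the same output; the closure is closer to what "maybe it's just x" amounts to.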
Or did it? mat1 and mat2 shapes cannot be multiplied. Oh, good. Okay.
@ x. Let's try that.
"Have you ever tried Zig, Stephen?" Hey, Bon Zubie. I have not tried Zig. I think I kind of would like to, though.
Why is that not good? That is so close to exactly what it should be.
Let's see here, you guys. Uh, "and 10 million of those tokens were cached."
Wait, look at the previous message for context. Okay, Lun's on Codex. You raised around 11 million tokens for the free tier. It's great.
You love wasting big AI corporations money. All of them except for Anthropic.
That's a lot of tokens for free. That really is. That's a lot of free tokens.
But it's because I made everything be identity based. So everything is copy and I don't care. That's okay. Exactly.
Clone it all. Clone. Clone everywhere.
It's not memory safe, but it also doesn't have any hidden control flow or undefined behavior like C. Unsafe in Rust would though. It sure would.
Centers of coding in a nutshell. "Why is this not good? It's so close to what it should look like." I know, right?
Boogateers. Exactly.
Yeah, it's very C-like. All right. So, what I want to do before we wrap up here is get this very final bit working. We need this last part working here.
Uh, for whatever reason, that shape isn't working for me. And I don't know why; it should now. But we need some help here. I know, right, Lions for Lambs? Uh, almost. I mean, you can see right here, this is very close to what we want it to be. Do we need to transpose one of these guys?
.T. Am I doing this right? I don't... Maybe we do transpose.
I think we actually do. Okay.
Hey, look at that. It worked. We did it.
We passed it through. Okay, now we can get our LoRA training going. Look at that. We did it. Hey, it worked. All right, that's what I'm looking for. See right there? No errors. No errors right there. See, you can see it right there on screen. Woo! We did it. We did it.
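The reason the transpose was needed: `nn.Linear` stores its weight with shape `(out_features, in_features)`, so `x @ weight` only lines up after a `.T`. A minimal sketch, with layer sizes chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Linear(in_features=6, out_features=3, bias=False)
x = torch.randn(2, 6)

# The weight is stored as (out_features, in_features).
weight_shape = tuple(layer.weight.shape)

# (2, 6) @ (3, 6): inner dimensions disagree, so PyTorch raises.
try:
    x @ layer.weight
    shape_error = ""
except RuntimeError as exc:
    shape_error = str(exc)

# (2, 6) @ (6, 3) -> (2, 3): the transpose makes the shapes line up.
out = x @ layer.weight.T

# F.linear computes x @ weight.T (+ bias), so it serves as a reference.
reference = F.linear(x, layer.weight)
```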
Lions for Lambs.
Zig devs said no hidden control flow.
It's got to be all on the surface. No escape. It's got to be verbatim. "Let's train it." I know, right? Let's train it.
I really want to. I really want to. I do got to get going though, because I've got other work to do for Friday. We will be back tomorrow, though, and we'll be doing some low-rank adaptation fine-tuning on the Qwen 1.5, uh, 2.5 model. Qwen 2.5 model right here. We're going to do it tomorrow. Estelle: "How much time of learning and practice do I need to make what we're making now?"
Well, it did take me quite a while to get on board with the AI stuff here and understand the pieces. My recommendation is building your own neural net from scratch to sort of understand the initial basics. That's the good stuff. All right, guys. Good stuff. Compared to Rust?
Yeah, compared to Rust, Zig mostly just doesn't have macros or destructors.
Oh, we did it. This so makes me happy. All right, we'll be able to finish our LoRA fine-tuning. We will, the low-rank adaptation.
Just took the Zig devs' word, uh, at face value. You just took Zig's, right? I know, right? We just assume.
It's easy to assume. That's the way Zig was.
We just assume.
Thank you very much for the stream. Very informative. Yeah, Bogoteers. Thank you.
Uh, it was really great to learn this LoRA fine-tuning today, and I was surprised at how straightforward and simple it was. Now that we've got the basics and understanding, we've got a single layer, the last layer. Obviously, with LoRA fine-tuning, we need to do all the layers, and we will. We'll grab every single layer. For now, we're just doing the one. And this is just what it looks like. Essentially, we're monkey patching the original model. This is going to allow us to leverage the same inputs and outputs for this layer. Then we're going to train this, and then we're going to sneak out the output into a separate model.
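The step planned for next time, adding the low-rank update on top of the pass-through, might look roughly like the sketch below. It is written as a wrapper module rather than a monkey patch, which is a swapped-in approach and not the stream's code. The class name `LoRALinear`, the `rank` and `alpha` values, and the initialization (small random `A`, zero `B`, scale `alpha / rank`, following the usual LoRA formulation) are assumptions for illustration.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update (hypothetical sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        # Freeze the original weights; only A and B are trained.
        for p in self.base.parameters():
            p.requires_grad_(False)
        # A: (rank, in_features), small random init.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        # B: (out_features, rank), zero init so the update starts at zero.
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the scaled low-rank path B(Ax), in row-vector form.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale


torch.manual_seed(0)
base = nn.Linear(8, 4)
lora = LoRALinear(base, rank=2, alpha=4.0)
x = torch.randn(3, 8)

base_out = base(x)
lora_out = lora(x)   # identical to base_out until B is trained
trainable = sorted(name for name, p in lora.named_parameters() if p.requires_grad)
```

Because only `A` and `B` require gradients, those two small matrices are all that would need to be saved separately from the original model.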
And that's the plan. That's what we're going to do tomorrow. All right, you guys. Thank you so much for joining today. Had a lot of fun. Let me commit this code really quick. All right, so git status, git add, git commit with two -m's. Thank you for the party poppers. Uh, "we have the pass-through working." Uh, "need to get, uh, ABx next." All right, that's the plan for tomorrow.
git push origin main.
Origin main. There we go.
All right, you guys. Uh, basically what DC did to anthropic and small, right?
Yeah, they did some distillation. They did some distillation.
All right. So, I'm going to paste the repository. Where did I put it here? Oh, wait. We didn't make one. git status. We didn't make one. Hold on.
Uh let me make a git repository and then I'll paste it on our discord. So if you guys are interested in following along you can. New repository.
Here we go. All right. So this is uh low ranking fine tuning.
Here we go. All right. Uh, "using the Qwen 2.5 model to fine-tune using low-rank adaptation." Yeah, adaptation. Perfect. All right. Public. Create.
Then I'm going to copy and paste this into the terminal. And it is live. There you guys go. Here's the LoRA.
I'm going to paste it into Discord under link share and we're there. We're good to go. All right. Bye, Stephen. Hey Lun, Neva, Bonsupi, Bogoteers, that Vasper, Estelle, Estella, and Lions for Lambs and all the viewers. Thank you so much for joining.
We're headed out now, you guys. Good to have you here. Thank you. We're wrapping up. Runescape. Ooh, you want OG Runescape? Hey, bye everybody. "You should remove the blocks AI text." Oh, right, right, right. Because, uh... Yep, exactly. Thank you, everybody. Good to have you guys here. We'll be back tomorrow, about the same time, and we'll continue with our low-rank adaptation fine-tuning. Bye everybody.
Thank you for joining.