Blum’s surgical deconstruction of the Transformer architecture replaces black-box mysticism with rigorous engineering logic, making it an essential masterclass for those who value implementation over abstraction. It is a rare, no-nonsense guide that bridges the gap between high-level theory and the practical reality of machine learning.
Deep Dive
PyTorch Transformer: Part 2
Hello everyone. Welcome on in. Happy Tuesday. We're going to be building more Transformers today. What you use with ChatGPT, we're going to be building it.
That's what we're doing right now. We're going to build it.
Uh we got pretty far yesterday and we have a lot of things that we need to do in order to make it work today. We got pretty far. Let's see. Where are we right now? Let me see here. All right.
Let me get some things updated.
I want to go to general channel, paste in the link to the discord. Oh, hey.
How's it going there, Omar? Good first like. Thank you. Appreciate it. Good to have you here. Happy Tuesday. Hope you're doing good over there. I'm just going to do a quick little link copy share to our Discord real quick. Uh, just got started at everyone.
There we go. All right.
Now, close that. Hello world. How are you all? Yep. Hello world. Exactly.
All right, let me get my Windows all set up perfectly good there. Hey, Niswah, good to see you. Happy Tuesday. Welcome on in. Oh, and also Digital Inventor.
Hi, bro. Hey, how's it going? Good to have you here. All good here. Nice. Hey, Quantified Quantum's here, too. All right. Good to have you here as well.
Hey, good to have get quantified quantums here. That's so cool.
Quantified quantum, you've been a long time. Been a long time. lots of videos.
All right, so what are we gonna do today? We have our Hey, it's Torva as well. Hey, Torva, good to have you here.
Was waiting for the live. All right.
Yeah, here we are. We're ready to go.
We're ready to go. We're going to do some coding. We're going to do some coding here. We need to do a lot of to-do items.
We'll make it work. We'll make it work.
What are you making? Digital inventor.
We are building a transformer. This is what ChatGPT, Claude, and Gemini are all based on, this technology, and we are building our own from scratch. Well, not really from scratch: we're using a framework called PyTorch to start with. We are going to be using the PyTorch Transformer class, so that way it's pretty straightforward. Then, after we are successful there, we will implement this class ourselves directly. Let's begin. Bingo. Good to have you here. Welcome on in.
>> All right. So, where where do we leave off? Where do we leave off? All right.
Let me see here. So, we still have to finish our dictionary creation, which is our tokenizer, right?
It converts words into tokens, and then tokens can be converted into embeddings. From embeddings, you can feed that into the transformer model, which is going to turn those embeddings into three values, right?
Query, key, and value. We've got that text up here on the screen. So basically it keeps the data. It's basically three copies, each run through its own learned projection: query, key, and value. Once we've computed how strongly each query correlates with each key, that's the attention part, right? We do some matrix multiplication there, we get the output, and then we add the original values back in.
That's what we're going to do today.
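The query, key, and value steps described here can be sketched in a few lines of PyTorch. This is a minimal single-head illustration, not the code from the stream: the layer names `w_q`, `w_k`, `w_v` and the sizes are assumptions chosen for the example, and the built-in `nn.Transformer` does all of this internally.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Return attention output and weights for (batch, seq, dim) tensors."""
    # Similarity of every query with every key, scaled to keep softmax stable.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v, weights

torch.manual_seed(0)
d_model = 8
x = torch.randn(1, 4, d_model)  # one sequence of 4 token embeddings

# Hypothetical projection layers; a real model learns these weights.
w_q = torch.nn.Linear(d_model, d_model, bias=False)
w_k = torch.nn.Linear(d_model, d_model, bias=False)
w_v = torch.nn.Linear(d_model, d_model, bias=False)

attended, weights = scaled_dot_product_attention(w_q(x), w_k(x), w_v(x))
out = x + attended  # residual: add the original embeddings back in
```

The last line is the "add back the original values" step: the attention output is summed with the untouched input embeddings.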
That's the plan. I always wonder why installing PyTorch is bigger than a gigabyte sometimes. Well, because they have a lot of hardware support. It's a lot of hardware support. You prefer to call it from scratch if we're not following from a tutorial. Hey. All right. There you go. It counts. It counts. It's from scratch, right? You know it. It counts cuz we are doing this ourselves. Ourselves. Okay. So I need to create a dictionary here where we need to predefine our dictionary with some special tokens.
Our special tokens are going to identify the start and the end and also padding.
So self.dictionary equals I want to give it some special padding.
So we're just we're just going to give it say like pad uh you know what? How about this padding? Wait, wait, wait, wait, wait, wait. I need I need to think about this for a second. This cannot be an addressable.
It cannot be addressable.
So, zero. And then we're going to have start of sentence, right?
And then we're going to have an end of sentence, right? Basically a token that... Oh, no. All right. Stuff falling off my desk here. Something that indicates when the model should stop, basically.
All right. So, start of sentence will be one, and this will be two, the end of the sentence. There we go. So, now we've got our dictionary. Then, instead of assigning it, we're going to say update, like that. I think that should give us what we needed for our special tokens.
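A minimal sketch of that dictionary setup, assuming a whitespace word tokenizer. The token names `<pad>`, `<sos>`, and `<eos>` and the `Tokenizer` class are placeholders for illustration; the stream does not show the exact spelling used.

```python
class Tokenizer:
    """Toy word-level tokenizer with reserved special tokens."""

    def __init__(self, text):
        # Reserve the lowest ids for the special tokens.
        self.dictionary = {"<pad>": 0, "<sos>": 1, "<eos>": 2}
        words = sorted(set(text.lower().split()))
        # update() merges the words into the existing dict instead of replacing it.
        self.dictionary.update(
            {word: i for i, word in enumerate(words, start=len(self.dictionary))}
        )

    def encode(self, text):
        ids = [self.dictionary[word] for word in text.lower().split()]
        # Wrap the sequence in start and end markers.
        return [self.dictionary["<sos>"], *ids, self.dictionary["<eos>"]]

tokenizer = Tokenizer("the cat sat on the mat")
print(tokenizer.encode("the cat sat"))
```

Using `update` rather than plain assignment is what keeps ids 0 to 2 reserved: assigning a fresh dict would throw the special tokens away.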
I think we do. I think we do. Good to have you guys here. Okay, to-do complete. Let me mark this up here as complete so we know that we did this.
Let's do a green check mark.
All right, looks good. Wait, I don't I don't I only need to complete it once.
Hey, Brandon, good to see you. Welcome on in. Happy Tuesday. Hope you're doing good.
I bet you there's going to be some Wait, Benjamin Lindigrin. Are you Wait, you're Brandon, right? Just I I put I named you right, right? You're Brandon, right?
Benjamin, welcome to the stream.
Benjamin, are you not I might have confused you with someone else who has the same initials.
Just double checking. Good to have you here, though. Okay, so we need to get our data set. We built the dictionary.
Is it correct? Did we do it properly?
So, we get our starter sentence. Actually, you know, this should be sequence. Sequence is a little bit better.
Right. A little bit better. Okay. And then I've added special characters around the sides. They're nothing special. They're just there to prevent the tokens from being mistaken for other tokens. Right? That's what those are there for. All right. And then we create our dictionary right here. All right. Looking good. Okay. So, we also need to do positional encoding, though it's not mandatory.
Does it make it better? I assume it does. I haven't tried it either way.
I think it makes it better. So, did we build our dictionary? I think I think we got a good dictionary here.
So, let's move this to the bottom. We need a data set. What are other to-do items that we have on here? Oh, we got to do masking.
And then we need to be able to generate our training data.
There we go. What else? What else do we need to do?
So, we got our embedding. We got our transformer. Ah, right here. Okay. So, our forward is going to need... I need to figure out how to deal with this because there's a problem here.
There's a problem. When the transformer is looking for data on this forward, it needs to get the target output, I believe. So we have to do some work there on that side. So this is the target, right? So we've got source and target and the mask. Those are the things we're missing. What are we building today? And what language? This is Python, and we are building a transformer. When you hear about ChatGPT and those large language models, this is the sort of model they're based on. It looks like this.
We are currently we are we've done the embedding. So we did the left side of the screen yesterday. We're doing the middle middle section of the screen today. So we're doing that part right there.
That's what we're working on right now.
And I do believe if I run this it should work. Python. What's the file name?
Transformer. Okay. Yep. And there we go.
We got some output. I like that. All right. Feeling good so far. Okay. I do want to create a git repo for this really quick. So, let me see here. Is it git status? Yeah, we do have... Cool. Is it? I think it's pretty cool.
Okay. So, let me create a new git repo for this really quick. GitHub.
Create a new repository. Here we go.
Uh-huh. We're going to call this the, uh, transformer PyTorch.
Actually, yeah, how about this? PyTorch transformer.
And I will say myself.
We build a transformer model live model live on stream and we train it.
There we go. Public repository.
Then I will move this on over.
Here we go.
So let's see here. I want to How do I want to do this? So I've got my origin here.
So, let's just add this really quick.
Let's add the origin.
Okay.
Config.
All right. Looks good to me.
Let me see here.
One second.
Uh, where is that? Okay. CD transformer.
All right. MV.
Okay.
Okay, git status. git add, git status. Okay, there's our transformer file. git commit -am "initial transformer", and we push to the main branch. We should be good to go. All right. Did we do it? All right.
We already have the main branch, so we're good. git branch. Yep. So, this is a no-op. And we just git push. We're done. There we go. All right. So, now that we've got that all solidified, just in case we... whoopsie. You never know. I've done it before where I deleted a file, and if it's not in version control, it's gone forever.
All right.
So, what do we want to do from here? All right. So, let's I want to print out my dictionary really quick. So, we just added some tokens to it. So let me print dictionary. Print dictionary.
And I need to create a new dunder method. I believe it's called __repr__(self).
We just return dictionary, right? Print dictionary. Is that going to do the trick? Python.
Just comment those out for now.
Uh, let's see here. Doesn't Vim have a file buffer that can be used to retrieve it?
Well, yeah, it does by default. It's called the swap file, right? Usually the .swp file. And it gets deleted after you close Vim. So, if I :q, right, if I close, it disappears. So, that's a problem. However, if it was still open, then I could potentially recover that.
But I turned those off because I don't like those swap files. I turned them off right here. Right. vimrc, right here: noswapfile. I turned it off because they get in my way more often than I need recovery. So, what I do instead is I just git commit very frequently.
You're able to use what you learned about Transformers yesterday in a conversation today with another dev, which was cool, Kyle. That's great to hear. Very nice.
We are They're actually like as we're describing them or discussing the details. They're pretty straightforward, right? They're pretty straightforward for the most part. Pretty simple, simple, easy by transformer. Okay, so I want to print this out. Python transformer. Okay.
Oh, I need a __str__, not __repr__. All right. __repr__ needs to be __str__. Okay.
__str__. There we go. Let's try that. Okay.
Oh, come on. Oh, wait. Non-string. Oh.
Oh, it needs to be okay. Hey, Epic Blox.
I'm You're No worries. Hey, we're we're just we're we went an hour later than usual. Usually, we go live around 10:00.
I had some meetings and so that's kind of how that works. Time for you to vanish. All right, Quantify Quantum.
Good to have you stop on by.
See, hide. Hey, quick fix. How's it going there, QuickFix? Good to have you here. We're going to be continuing our transformer. So, this needs to be a string. How do we do that? I suppose we just do str().
Can we do that? Wait, wait, wait. We say __repr__, right? It's __repr__. It needs to be __str__ though, I think. Oh, no. Perfect. We did it. All right.
Perfect. That's exactly what I was looking for.
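The "non-string" error comes from a dunder method returning the dictionary itself instead of a string. A small sketch of the fix, using a stand-in `Tokenizer` class and placeholder token names rather than the actual class from the stream:

```python
class Tokenizer:
    def __init__(self):
        self.dictionary = {"<pad>": 0, "<sos>": 1, "<eos>": 2}

    def __repr__(self):
        # Must return a str; returning the dict itself raises TypeError.
        return f"Tokenizer({self.dictionary!r})"

    def __str__(self):
        # Used by print(); falls back to __repr__ when not defined.
        return str(self.dictionary)

tokenizer = Tokenizer()
print(tokenizer)        # calls __str__
print(repr(tokenizer))  # calls __repr__
```

Either method on its own is enough to make `print` work, as long as it wraps the dictionary in `str()` or an f-string.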
All right. So, we got ourselves a little bit of a dunder methods there.
Then we've got our tokenizer, our normalizer.
I think we're good on the dictionary, right? I think we're good.
Let's see here. Let me get rid of this syntax.
Okay.
How far are we in the process? Well, okay, let's look at where we're at right now. Thank you, Kyle, for sharing this diagram. We have our embedding here, which I just showed you on the screen where we've got our tokens, right? So, you can see our tokens defined there on screen. And that is the process of converting your words into numbers. It doesn't stop there, though. It takes your numbered words and converts them into embeddings, which is a multi-dimensional array, which I believe we drew out here somewhere, probably. Did we draw it anywhere? It looks like this. It's a matrix, right? And each row is associated with a word, for the most part. However, what you can do is decrease the dimensionality and assign, based on the value within each of these columns, a different dimension if you wanted to decrease. Hey, peace lord.
Good to see you. Hello. Hello. Hello.
Welcome on in. Happy Tuesday. So, out of the total process, we did the embedding. We've got that done, and we've got the transformer itself. There's a nice little bonus that comes pre-built with the PyTorch transformer: the iteration over each token is much more seamless. Happy Tuesday. How are you? We're doing good. We're doing fantastic. Thank you for asking. We got started an hour later than usual because of meetings, though we're good to go now. What are we building? I forgot. This is a transformer. So when you use ChatGPT, you've probably used that by now, right? This is the technology that drives that. So we've got ourselves a transformer. And we're currently working on the middle part right here.
Once we get that, we'll get our output and then we'll do a cross entropy uh criterion which is the loss function and we've got an optimizer.
Then we will train the model which is going to do back propagation by creating gradients.
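The loss, optimizer, and backpropagation steps can be sketched as one small training loop. The model below is a stand-in (an embedding plus a linear layer), not the transformer from the stream, and the token ids, learning rate, and step count are made up for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)
vocab_size, d_model = 10, 16

# Stand-in model: embedding plus a linear layer that scores every vocabulary entry.
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
criterion = nn.CrossEntropyLoss(ignore_index=0)  # assumes id 0 is padding
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

inputs = torch.tensor([[1, 4, 5, 6]])   # hypothetical token ids
targets = torch.tensor([[4, 5, 6, 2]])  # the same sequence shifted by one

losses = []
for _ in range(50):
    optimizer.zero_grad()
    logits = model(inputs)  # (batch, seq, vocab)
    loss = criterion(logits.view(-1, vocab_size), targets.view(-1))
    loss.backward()         # backpropagation computes the gradients
    optimizer.step()        # the optimizer applies them to the weights
    losses.append(loss.item())
```

Setting `ignore_index` to the padding id keeps padded positions from contributing to the loss, which matters once sequences of different lengths are batched together.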
Use claude. Oh, nice. All right. Well, this is this is like claude as well, right? Claude itself, the CLI or the desktop app is a harness that accesses the large language model, which is this behind the scenes. It's like the brain.
You can think of it like the brain, you know. All right. Good to hear.
Fantastic. I like Claude, too. Isn't Claude great? I use Claude. So, I've got Claude right here. Little bit of Claude.
There we go. Some Claude for you. Okay.
So, let me rerun this. Make sure it looks good. All right. Let me also try doing one of the outputs on this.
You know what an LLM is? I use it. Oh, Llama. Hey, nice. Quick fix. Good to hear it. Which is your favorite? What's your favorite open model? Actually, let's ask the chat. What are you using for your local model? What local LLM are you using?
Using... All right. Do we do Gemma? We do Llama. We've got Qwen.
Qwen.
PubNub went Gemma 4. Nice. That's the good one, right? That's the good one.
PubNub AI. It's happening. We're working on it. Niswa, there's things happening right now where we've got some native PubNub AI. It's waiting.
It's waiting to be released. We're working on it. Let's see what are... Oh, other good ones. Okay. Kimi K2 or Kimi K star, right? Are there others?
Uh slash others. All right. Let's see what you guys use. I'm kind of curious.
Thank you for the hearts, you guys. Na, that's a great question. ChatGPT Codex. Oh, right. Well, that's a good one. I use that one, too. Right. So, I've got that here.
Codex, right? Whoops. Let's see.
Codex. Got that here. Uh, skip for now.
So, we get... Woo. All right. There's lots of things.
Continue without trusting. Okay. All right. And then we also got AGY, which is the new Antigravity.
We got Antigravity there.
There you go. DeepSeek. Oh, I forgot about DeepSeek.
I guess that goes under the other. Chinese open source LLM is the winner. They are pretty good. Yeah, they copy everything from the state-of-the-art models. They distill it, right? They train it through distillation. Uh, yep.
You're leaving in 12. It's 20 and 30 bedtime. All right. Quick. Sounds good to me. Seen the Google keynote? Yes, I did. Google IO. I watched it. Very impressive. Omni. I've used Omni. I like Omni. I like Omni. Yep. Hey, Snot. Good to see you. Welcome on in. Yep. It's a little bit later than usual. We'll be going on a little bit later today cuz we got started late. So, that's how that works. We had meetings. We had a bunch of meetings. That's how it works.
Let's see. All right. I want to Where are we at? Where did we leave off at? Okay, so where are we here? Yeah, they are way ahead. They are way They're doing good. They are doing good. Peace Lord, you're right. All right.
Where where So, where's our two here?
All right. So, we need data to learn from. That's fine. Positional encoding.
Maybe we don't... This is like the stretch goal. Mask: I know we can get the mask calculated for us automatically through the transformer library, based on the dimensions that we have. So we can do that. And the mask, do we do that on the target? I think we do that on the target. We don't need to do that on the source.
I think we don't need to do it on the source.
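A sketch of what that target mask looks like, built by hand so the shape is visible. `nn.Transformer` ships a helper, `generate_square_subsequent_mask`, that produces this kind of mask; the `causal_mask` function here is a hypothetical stand-in written only for illustration.

```python
import torch

def causal_mask(size):
    """Float mask with -inf above the diagonal and 0 elsewhere."""
    return torch.triu(torch.full((size, size), float("-inf")), diagonal=1)

mask = causal_mask(4)
print(mask)
```

Row i has zeros up to column i and negative infinity after it, so after the softmax each target position can only attend to itself and earlier positions. The source sequence is fully visible, which is why it does not need this mask.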
So let's see here.
Okay, got the music going. Are you using the sinusoidal positional encoding? Hey, need to let you know about that. Yes, cosine, sine. Yep. So, we could do sine/cosine. Are there other options? I heard about some newer approaches like 2 years ago. It's been a while. Some newer approaches to sine/cosine.
Let's look those up really quick. Yeah.
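For reference, the classic sine/cosine encoding can be sketched as a lookup table that is added to the embeddings. This follows the standard formulation with base 10000; the function name and sizes are assumptions for the example, not code from the stream.

```python
import math
import torch

def sinusoidal_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) table; d_model is assumed to be even."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    # One frequency per pair of dimensions, decreasing geometrically.
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    table = torch.zeros(seq_len, d_model)
    table[:, 0::2] = torch.sin(position * div_term)  # even columns
    table[:, 1::2] = torch.cos(position * div_term)  # odd columns
    return table

pe = sinusoidal_encoding(5, 16)
embeddings = torch.randn(1, 5, 16)  # (batch, seq, d_model)
encoded = embeddings + pe           # broadcast over the batch
```

Because the table is simply added after the embedding step, this variant works with the built-in `nn.Transformer` without touching its attention internals.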
All right. So: positional encoding alternatives, modern state-of-the-art. Rotary embeddings. They're the best.
Like rotary embeddings.
All right, let's see. Selective rotary position embedding.
Rotary position embeddings. The RoPE. We got the RoPE, you guys. All right, so we got a nice little RoPE here. They dominate modern foundational models because they encode relative positions geometrically, keeping semantic embeddings pure rather than mutating them. Okay. However, there's a catch. Handling extreme context lengths and multi-dimensional data has sparked several state-of-the-art alternatives and adaptations. Oh, so there's something better than RoPE. They got something better.
Clipped RoPE. Well, that's just a modification, right? Clipped and gated encodings address the limitation of infinite context dependencies by calculating attention based on word token distances rather than the absolute positional indices.
Oo. Well, I mean, how do you how well?
That doesn't make sense to me because you need those positional references.
And if you're going to compare a word and their distance between each other, that is the positional distance. It limits positional influence dynamically, letting the model determine when to forget distant information. Hm, okay. What about selective RoPE? State-of-the-art advancement for linear transformers and
state space models. Huh. Happy Tuesday, Mark Lemon. Welcome on in. Hey, you're here on time. We got started late today because we had meetings.
Good to have you here. We're looking at positional encodings. I think the state of the art is for large-scale large language models, handling very long context. Oh, for the very big ones, right? Massive context. We're going to have tiny context because we don't have a bunch of GPUs. Well, I could rent a bunch of GPUs pretty easily. I only want to focus on my local computer right now.
Wait, good. Oh, right. This is the one.
Attention with linear biases. I wrote this one. I wrote this one myself. I did this one. Instead of multiplying or rotating, ALiBi applies a static negative penalty to attention scores based on the distance between tokens. Highly prioritized for zero-shot context lengths because it does not require fine-tuning for larger sequences.
Maybe we can do some ALiBi, right?
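The "static negative penalty based on distance" can be sketched as a bias matrix added to the attention scores before the softmax. This is a simplified illustration with a single hand-picked slope; the published method uses a different fixed slope for each attention head, and the function name here is an assumption.

```python
import torch

def alibi_bias(seq_len, slope):
    """Return a (seq_len, seq_len) penalty: -slope * distance to earlier tokens."""
    positions = torch.arange(seq_len)
    # distance[i, j] = i - j, i.e. how far key j sits behind query i.
    distance = positions.unsqueeze(1) - positions.unsqueeze(0)
    # Future positions are clamped to zero here; a causal mask would hide them.
    return -slope * distance.clamp(min=0).float()

scores = torch.zeros(4, 4)                  # stand-in attention scores
biased = scores + alibi_bias(4, slope=0.5)  # added before the softmax
```

Nothing in the bias depends on a maximum length, which is the property the description above credits for handling longer sequences without fine-tuning.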
Tiny, I'd go with RoPE, but sinusoidal also good for learning. All right, good to hear it.
Benjamin, are you building your own version of ChatGPT? Yes, exactly. Yes, that's what we're doing. We're doing it right now. It's happening at this moment, as we speak. We also have TAPE, time-aware positional encoding, which modifies attention specifically for time series.
Sequential recommendations handling timestamps or irregular temporal spacing rather than discrete indexes. Ah, so if there's gaps, you have gaps in your data. You got like one, two, and then a gap of data. And you don't just go three, four, five, right? You can't you need to represent there's a gap. So tape is really good for that scenario.
Multi-dimensional and modality specific encodings.
Video RoPE extends 1D text processing for computer vision and video understanding.
All right, there's so many options, right, you guys? We have Lie rotational position encodings. Generalizes RoPE into 2D and 3D visual classifications by leveraging the Lie group.
Separates weights and spatial width and height and temporal frame index relationships independently minimizing attention bias across multimodal sequences. Okay. And it keeps going. All right. There's a lot in here, right?
There's a lot.
Yeah, they have improvements of RoPE.
They have quite a few improvements there. Quite a few. Slash RoPE, slash ALiBi. ALiBi, right?
Where? What was it? Did I get that right? Wait. Yeah. A-L-i-B-i. ALiBi.
Okay.
So, let's look up how to do the mask really quick. And also, I need to figure out my target here.
This is exactly what I want to do. Do I need my own infinite syntax code library? Wait, in infinite what's infinite syntax?
Wait, wait, wait. What is that? I I want that. It sounds like something I want.
RoPE is surprisingly short to implement, so that's why I like it. All right.
Well, maybe we'll do RoPE. All right.
So, right now, RoPE is top of the list for our positional encoding. And we've got the spot to do it. We know where to do it. Encoding. Okay, looks good to me. We would need to add that around... Where is that?
Here in the feed forward. Here we go.
So, right here. Uh, wait. Do do we do positional encoding post embedding?
Right. So we do that right here.
I think so. Right. To-do: positional encoding here. RoPE. Yeah. Exactly.
We're doing some RoPE. Gustavo. You know it. English just as bad. Sorry. Hey Benjamin.
Oh wait. Okay. Understood. What is infinite syntax though? I'm still curious.
We're going to do R-O-P-E. I've not done RoPE. I've done ALiBi and I've done sinusoidal. So those are the ones I've done. So let's do hard mode and do RoPE.
Is this org mode? Hey, me can't spell.
How's it going there? Welcome on in.
Happy Tuesday. Org mode. This is, uh, programming is what we're doing. We're doing programming. We're making ourselves our own transformer model, which, if you're familiar with ChatGPT, that's what that is. We're building our own. Wait, did you have access to the attention? Uh, we could.
So, right now we're we don't right now we don't because we're using it here. I think we could implement our own.
There's like drop in replacements for it. I suppose we could copy and paste it here. Where is that at? Here. Where is our Here it is. Transformer. You can do your custom encoder decoder here if you wanted to.
This is what we plan to do after though.
I'm going to do after.
Okay.
Pretty sure no language has to-dos built in.
Is that Oh, hey, me can't spell. Oh, is that not the case? You're right. They don't. This is just my text editor. It's called Vim, if you're familiar. And it likes to highlight the to-dos, right?
Get those to-do highlights right here.
Pretty to-dos there.
I think you can't use RoPE if you don't have access to the attention. Or maybe you're wrong. Oh, well, I feel like you should be able to, because if you look at the diagram here, right, our embedding, we do our positional encoding before we get to attention, right?
tokenize, embedding, and then positional encoding, right?
Yeah. So, it might be able to work.
Yeah. Gustavo, thank you for clicking the high button.
Emacs. Oh, right. Emacs, right? Hey, check this out. Watch this. Type in Emacs. You go to Google. You type in Emacs.
Oh, wait. They got rid of it. Wait, they got rid of it. Wait, wait, wait. Vim.
Oh, okay. Here we go. See? Type in Emacs. It says, "Did you mean Vim?"
Hey, right. Emacs, right? I searched for Emacs and Google says, "Wait, did you mean Vim?" You meant Vim, right? You did. You meant Vim.
RoPE is computed after query and key, not before attention. Okay, got it. So, we can't do RoPE. Oh, no. Oh, no. All right. Well, let's do sinusoidal. But the same happened for Vim, bro. I did. You saw that? Yeah, I know. All right. If you click Vim. That's the trick. Yeah, that's the trick. You know, you saw it. Hey, Sergio. Good to see you. Welcome on in.
Happy Tuesday. We're continuing our transformer model today.
That's what we're doing. All right. So, I guess we can't do RoPE. Let me see.
Let me see if it's possible though. I still want to learn about it.
See, let's go to Gemini. Gemini here.
Say, RoPE function in Python. Oh, wait. Use PyTorch. Use PyTorch.
Enter.
Uh oh. Whoa. Whoa. Whoa. Whoa. This looks This looks doable. We can do this.
Yeah. Yeah. Yeah. x1, x2. Cosine, minus sine, sine, cosine. Okay. So this is a negative sine, cosine, cosine positive, and then a positive sine.
Okay. All right. Based on the x. All right. I think we could do that. How much storage does one need to run this kind of project? It's expansive. Uh, if you're going to be... Yes, you will. You need like terabytes, terabytes of data, which is more than your laptop, right? Not just one terabyte, not two, but a lot. If you were going to be building something like a full-on ChatGPT, right? You would need a mountain of data to get it to work properly. However, we're just doing a small language model, and we want to get results using our local hardware.
Next, you're building transformers from scratch. Yep, exactly. We will. We'll do that after. We will implement this layer here ourselves, which means that we could do RoPE then, right? We could do RoPE then. Yeah.
What is that? The rotary positional embeddings.
How much do you manage to program while listening to music like this? Uh, do you find it very hard? Oh, me can't spell.
Really? Yeah, I can I can code no matter what's really going on. Of course, I'm just a beginner. I have to finish a semester of university. All right, you're on the right path. Shrian, how's it going? Good to see you. Thanks for clicking the high button.
Small language model. Exactly. SLM, you got it. Today I was using C. Nice, MD.
Good to see you. To create a Windows raw bytes, you have written most of the helper function. Now you realize that you have passed every pointer by value and not by reference.
What? Wait, wait, wait, wait. Pointers.
All right. So, you've got a pointer.
All right. It that has a specific address for it and you've got a pointer that you can point to data.
So, computationally, with a pointer it doesn't matter whether you pass by value or by reference, because they're the same size, right? Typically you get a 64-bit pointer address that's going to point to data, and the address itself is 64-bit, so it's just 64 bits, right? Is that a problem? Now I'm here after committing that broken code. All right, sounds good, MD. You wrote your language model and recently implemented your own LoRA after watching your stream. Really? Oh, that's amazing. Hey, Nidzswa. That's fantastic.
That is fantastic. You going to try learning assembly language for today?
Hey, practically sleep only 4 hours. Well, Shuvon, you know what? I think you're making a good call. I think you're making it. You're you're going the right direction. Just keep on push. Who's Laura? Hi, Pro. How's it going there, pro folk? Good to see you. This is great stuff, Stephen. Thank you, Kyle.
Appreciate it. That's very nice for you to say. I like it. All right, Sergio.
Nice. Yes. All right, so let's keep going here. So, I want to It looks like I could Here we go. Let's read through this really quick so I understand it.
All right. So, we've got our own positional embedding register buffer, which I forgot specifically what that does. I think this just overrides the data.
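For reference, `register_buffer` in PyTorch stores a tensor on the module as fixed, non-trainable state: it shows up in `state_dict()` and moves with the module across devices, but the optimizer never updates it. A small sketch, with a made-up module and names chosen only for illustration:

```python
import torch
from torch import nn

class WithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(3))        # trainable
        self.register_buffer("table", torch.arange(3.0))  # fixed state

module = WithBuffer()
param_names = [name for name, _ in module.named_parameters()]
state_names = list(module.state_dict().keys())
print(param_names, state_names)
```

That makes buffers a natural home for precomputed sine and cosine tables, which are constants rather than learned weights.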
Okay. Embedding cosine. Do we have to do other stuff?
All right. So, return.
What are we turning?
Uh, tensor tensor. Okay.
Apply rope. Where do we need to do that?
Rotated. Okay, that looks like we could do... Wait, Ne, it looks like we could do this. You have a simplified RoPE code. Oh, you do? Really? Can you share on Discord? I will use your RoPE code. Because we have to run a stepper motor using Keil software.
Oh, and that requires a little bit of assembly. So, that's what's going on.
Okay. All right, dimensions, which is going to give us the size of the matrix which we're going to be using to apply to our embeddings.
All right, that makes sense.
There's a lot of register buffer reaction here. So, we create three register buffers.
Let's see where is this. Uh, interesting.
There's a lot of shaping going on here.
Electronics and computer science. Hey, very nice. ECS.
Instead of doing explicit 2D matrix multiplications for every pair of numbers, which is slow in PyTorch, we use a vectorization trick. Instead of rotating adjacent pairs, we split the vector down the middle into two halves.
The rotation is then handled instantly across the whole tensor
via element-wise multiplication and addition.
Okay.
All right.
When using this in a full transformer, you would instantiate this class once and call apply RoPE on your query and key tensors right before computing attention. You don't apply it to values.
I was wondering about that. I did this two years ago in a scenario where I was like, I'm pretty sure you only need to do it for query and key. Do you really need to do it for values? I was wondering about that.
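The rotate-half trick and the "query and key only" rule can be sketched as a small module. This is an illustrative version, not the generated code shown on stream or the one shared on Discord: the class name, the two buffer names, and the default sizes are all assumptions.

```python
import torch
from torch import nn

class RotaryEmbedding(nn.Module):
    """Rotate-half RoPE for (batch, seq, head_dim) tensors; head_dim must be even."""

    def __init__(self, head_dim, max_len=512, base=10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        angles = torch.outer(torch.arange(max_len).float(), inv_freq)
        angles = torch.cat((angles, angles), dim=-1)  # (max_len, head_dim)
        self.register_buffer("cos", angles.cos())
        self.register_buffer("sin", angles.sin())

    @staticmethod
    def rotate_half(x):
        x1, x2 = x.chunk(2, dim=-1)  # split down the middle
        return torch.cat((-x2, x1), dim=-1)

    def forward(self, x):
        seq_len = x.size(-2)
        # Element-wise form of the 2D rotation [[cos, -sin], [sin, cos]].
        return x * self.cos[:seq_len] + self.rotate_half(x) * self.sin[:seq_len]

rope = RotaryEmbedding(head_dim=8)
q, k = torch.randn(1, 4, 8), torch.randn(1, 4, 8)
q_rot, k_rot = rope(q), rope(k)  # values are left untouched
```

Since the rotation has to happen on the query and key tensors inside attention, it cannot simply be added after the embedding step, which is why the built-in transformer class gets in the way here.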
Shared on Discord. Thank you.
All right, let's take a look here under link. Share. Oh, thank you.
Click the uh thank you button there.
Okay. Tiny stories language. Tiny language models. Nice. So, RoPE. Do we have some RoPE in here? Yep. Okay.
Multi-headed attention. Self-attention with RoPE positional encodings. So you've got a RoPE file here. So all I have to do is go look at your RoPE. Let me just take a look at RoPE and see. That would be under models.
RoPE. Here we go.
Ah.
Oh wow. That's quite a bit.
So what what is your sign sign? How do you how do you generate that sequence?
Okay.
Frequency. What's frequency?
Interesting.
Sri Lanka. Nice. W rope. Yeah.
Okay.
Let's see. All right. This is great.
Nah. Thank you. Thank you very much.
All right. This is TensorFlow. Great.
Here, let me zoom in. It's a little bit easier to see. Rotate half. Apply RoPE to the tensor.
So it does it return it does. Okay.
Q and K. Hey, it even returns Q and K.
Wow. For For me though, for me, since we are using the transformer itself, we only need one.
You see? Oh, I see. I see what you're doing here. Oh, I see. All right. That's why. Ah, that's why we need access to the inside.
Okay.
A Steven also W Steven also. Hey, good to hear it. Thank you.
Thank you so much.
Okay. I want All right. This is great. This is great.
All right. Where are we now? Where are we now? All right. So, I've So, we've got the embedding. We have the transformer. I have to do a few extra things here.
We should also compute the mask, and we've got our output. We need to get our target. The only thing is, I'm not sure about the forward method here: when I'm passing data through the transformer, how do we say what our targets are going to be, if we have targets or not?
I'm just not sure about this process here. Training data. So we've got our model and our forward.
Would it expect target data? So that's what I'm not sure about.
Okay, let me take a look really quick. Okay, so I think we have to pass this in the forward as the answer.
So basically this will be like question and answer. I think that sounds like a good idea, right? You like that one? I like this one. And then we have to encode both of them. So let's do question norm this question.
So these are the actual words, right?
The actual words.
Then we've got our dictionary. It's a, we do, uh, torch.nn.Module, super().__init__().
All right.
All right.
Good to hear it. All right. So we've got a question and an answer. This is going to give us the ability to pass the target output through into the transformer itself, which is really nice that they did this for us, because it means less iteration from the user's perspective. Otherwise you'd be doing extra token looping.
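One common way to feed a target into an encoder-decoder model without looping token by token is teacher forcing: the decoder sees the answer shifted right, and is trained to predict the answer shifted left. The stream does not spell this out, so treat the following as an assumption; the `<sos>` and `<eos>` token names and the helper name are hypothetical.

```python
SOS, EOS = "<sos>", "<eos>"  # hypothetical special tokens

def make_decoder_pair(answer_tokens):
    """Return (decoder_input, labels) for one answer sequence."""
    full = [SOS] + list(answer_tokens) + [EOS]
    decoder_input = full[:-1]  # what the decoder is given
    labels = full[1:]          # what it should predict at each step
    return decoder_input, labels

decoder_input, labels = make_decoder_pair(["here", "is", "the", "rest"])
```

With this layout every position is trained in one forward pass, which is the "less iteration" benefit mentioned above.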
Do you work with Neovim? Oh, I've got Neovim here, and Vim. Uh, I don't use it though. I've got it right here. Right.
Right here. Neovim right here. A little bit of Neovim.
Uh, I just use regular Vim though.
Just ordinary standard Vim. Wayland client from scratch. Win. Oh, Wayland.
Wait, wait, wait, wait, hold on. Wayland client. What is this?
Uh, a display protocol that applications can use to talk to a display server in order to make themselves visible. Wait, display protocol. Any graphics application communicates with the display server.
Oh, called a compositor, to render itself on your screen.
Ah, that would be pretty neat. All right. Client-server model. The client renders its own content into a pixel buffer and then hands the buffer to the compositor for display. So, I guess it's kind of like OpenGL. I mean, OpenGL is here. Clients handle their own rendering using libraries like OpenGL and notify the compositor when the frame is ready.
All right.
Unlike older X11 systems, Wayland eliminates the middleman for window management, allowing clients to communicate more directly with the compositor for faster performance and better security. All right, that seems like something I would need to learn more about before we could build it. Oh, hey, Wayland client right there.
Look at that.
Oh, actually this looks pretty straightforward.
Hey MD, this looks pretty easy.
So this will connect, and then we just need to start blasting communication to the connection. Wayland display, unable to connect.
This is interesting.
All right.
Creating a display, incorporating event loop.
Uh, I mean, you are using Vim for working daily. Yes, it's my favorite text editor. It makes me happy. It does.
No libs, just plain raw bytes. Oh. Ooh.
All right. Really? Oh, that's actually doable. I understand how to do that.
That's still straightforward. Uh, I would just need to know what the bytes mean. By the way, have you watched Suits? I have. I've watched Suits. I started watching that a few years ago. I saw part of the first season. I didn't watch the rest of it because I was watching it when I was having some illness or something, and after I got over the illness I didn't have an appetite to go back, because I was having a bad time. So, it's not the show's fault. It was just what happened to me. Your Vim reminds me of Gruvbox. It is Gruvbox. It is right here.
vimrc. You're so right. Gruvbox right there. Right there.
Hey, John Madden. Good to see you. The thing I keep running into is that my Neovim install spawns a bunch of TypeScript servers for hints. Ooh, and it eats up all of your RAM. Ooh, you got to turn that off. You got to turn that off.
So, Wayland client from scratch, live on win. I know, right? MD, I see what you're saying there. We could do it. I know how to do it, and I know that we could make it successful. I know we could. The only thing is I would need a, uh, a compositor running. We'd probably also, because I'm running on Mac, right?
uname, right? So this is my OS, and I don't know if we could use Wayland on Mac. Can we do a Wayland client on Mac? Is that possible? I don't think it is.
Right. Running a Wayland client is an experimental behavior because Wayland is native to Linux. So I would need to get a Linux setup for that.
You're also using Vim with Gruvbox.
Nice.
You have the background a little transparent. Yeah, I used to do that too. I used to do that, too. And then I stopped doing it. I We could do that.
It's just that specifically like when I'm on stream, like if I had the, you know, this box showing slightly the background, then it would it'd look a little weird, right? So, for stream, it made sense to make it fully opaque.
MDO, I know. But we we just have to set up Linux is what we'd have to do. We just set up Linux, which is not a problem. All right. So, let's complete our code here. So, this is question norm answer.
Answer norm.
You know what? We should just copy and paste this here. A little bit easy. A little bit easy. There we go. And get rid of that. So, we need our question tokens.
Here we go.
This will be the question norm.
We're going to do answer tokens.
Wait. Yes. Answer tokens. Yeah. With the answer norm work. Okay.
So, we got our our Q&A here. What would be nice is if I just merge those together in a single step.
I can't work without hints though. Oh, John Madden. Really? Yeah. I like to do everything as custom, from scratch. On the upside, Neovim is amazing for ergonomics when doing agentic coding.
Ah, yes. I've seen some of that. It looks pretty impressive. It's easy to tear text out of one buffer and push it into another. Nice. Very nice. What's your favorite, PyTorch or TensorFlow? PyTorch. My favorite is PyTorch. Good question.
I like PyTorch because it gives you more control, and basically you're dealing with a pool of floating-point numbers and you can shape those numbers however you want. So, it's really fast.
So, if you need to make adjustments, you can just do a reshape or change the view or whatever. It's pretty easy.
PyTorch. I like myself some PyTorch, heart icon.
Blasting communication to the connection is Steven's favorite thing. It is. It is. I know you laugh it, but we could, Kyle. Yeah. Uh-huh. We could blast all sorts of fun instructions to the Wayland Compositor.
Have you heard of Triton? PyTorch plus Triton equals fused operations.
Ooh, so Triton sounds like it has got some mega kernels. Got some mega kernels there.
You use both Gemma and Qwen locally, but you use Qwen more. Hey, Bonzupi, good to hear it. Thank you for letting us know.
W Stephen. I think that counts as a good thing. Is it W equals win? Right. I think W equals win. I'm going to say that.
Okay. So, we need our question answer embedding. Now, let's do this really quick. Copy and paste.
Here we go. Question answer embedding.
And then we're going to get our question answer tokens here.
This will be answer tokens. There we go.
All right. Now we've got all of our correct data.
Then we've got the target. Hey, there we go. I knew we were doing something. I knew we were doing something. All right.
So, this counts here. So, this is our question embedding and we've got our answer.
There we go. Perfect.
I'm going to count that as done under the to-do completed items.
W. Yeah, got her W's. Have you seen the Renee Rae?
He does Linux kernel development work.
Oh, I might have run into him when I'm scrolling on the phone. I might have run into a few, maybe. I don't know if it's the same person, but I have seen someone doing some kernel development. He uses default Vim, too. Hey. All right, that's great to hear. W equals win, but it's used like a verb. Oh, okay, Kyle.
Thank you for clarifying. I was wondering about that. I was I was wondering about it.
He is live. Oh, really? All right. Good to hear it. That does sound like a fun thing, maybe. So, here's the deal. We have the rest of forever to do all sorts of fun things like that, right? I want to do hard mode things. I want to do Rust deep learning with the Burn library. I want to get that going. I want to learn Haskell. I think Haskell will be a lot of fun to learn. I've used a little bit of it and it just didn't click. However, I understand functional programming now, and closures, and different patterns that you can use with closures. So I think Haskell is a perfect language for that. That's kind of the point of Haskell. It's a very functional-forward language.
Then I want to do more assembly. I want to do some more assembly. I think it would be fun to do some AI and assembly, right? I think that would be pretty fun.
Then maybe some potentially some game development as well. More gamedev.
For others, you can make a dedicated video explaining machine learning from beginner like road map. That would also be really good. Yeah, I like that idea.
Hey, come back to me almighty. Good to have you here. How would I start coding from uh without being you don't need to worry about math because coding there's not really a lot of math in coding.
You're lucky. You're lucky. Can you say Triton?
There you go. There you go. Yes. I like that. I like that idea. I want to do some more beginner machine learning stuff, because I think I can describe the details. Okay. Taking in mind the current IT market. Yes, we can do it. We can do it. When you want to understand something, you try to use Rust for that. Uh-huh. You try to use Rust for that and youize. Wait, what do you mean you aize? What is that?
Hey loons, good to see you. Welcome on in. You were revising for an exam.
Hey, very nice. Always good. Always good to learn new stuff. It's getting frustrated how much abstraction there is. Yes, that's the problem with Rust.
There's too much abstraction. There really is. Have you ever used Rust yourself in the It is a problem because Rust while so powerful with concurrency safety, memory safety, you know, all the safeties built in like type safety of course, then it comes with extra syntax and I want a simple version of Rust please. How are you? I'm doing good.
Good to have you here. C is perfect for understanding what's going on. Yes. Okay. I think C is really helpful because it's a lot simpler. There's not too many abstractions. It's clean. It's easy. And it's powerful. How can I apply coding to medical science? Come back to me almighty, coding for medical science. Well, it depends, right? It depends on what you're doing. You probably want something like scikit-learn to start with, then potentially making your way to JAX or PyTorch.
Sorry. Hey, no worries. Rust. Got it.
Rust. Okay, understood.
Good, good, good. MD, thank you for the clarification. Okay, so we got our sequence. Good. We built our dictionary.
We got specialized tokens. We figured out the target. We sure did. We got our answer question and answer. Okay.
And then we've got, yep. Here we go. I think this will work. I think the code we wrote might work. Can you suggest me a good book on machine learning? Someone else might, because I did all my machine learning learning with online materials over many years. Are you going to do a LeetCode stream? Ah, we should. See, LeetCode is a lot of fun because it's a game. It's a game, right? It's a game where you have to solve the puzzle.
It's a puzzle game. I want to do that. I absolutely want to do that. I think that'll be a lot of fun.
We'll plan for that as soon as I've got all of my desire for absorbing the knowledge through practice of machine learning and PyTorch. Maybe we'll take a break from this coding and do another kind of coding with LeetCode. Let's go.
I'm in. All right. Nice. Good to hear it.
Does any mod remember muting? Oh, because I got one suspension a long time ago, but I don't remember it. Oh, loons.
I think that I somewhat remember that.
Okay. Python transformer.
All right. So, we've got a little error there. Answer. Where is he? Here.
Question. Answer. Embedding. Embedding.
Embedding. All right. Where is our problem at output equals model line 90? See here. Where is our problem at?
This should be good. Oh, wait. Yes, yes, yes. That's good. And then training data. Good, good, good. Okay. So, it's got a problem here, which means it's around here somewhere. So, let's just do return question tokens. Wait, no, no, no.
Return question norm.
Okay, so we should get the what? Wait. Uh oh. Oh, oh, right, right, right. Okay, here we go.
Okay, let's let's do this. Let's do this. Um, create an array.
And here is the rest.
Let's try that. Let's try that. Yeah.
Okay. Training data 0 and one.
Uh, zero and one. There we go.
Why is the Rust syntax a little bit weird for the range? Oh, right, right, the dot dot. Yeah, the double dot equals 100. Isn't it alien-like syntax? It kind of is, right? You don't have to use the equal sign, though. You can just do 0..100, which is 0 to 99, so 100 elements.
When you say 0..=100, the equals makes it inclusive, so that's 0 to 100, which is 101 elements. And 1..=100 is 1 to 100, which is what I meant. So I got that wrong at first. That's why they added the equals.
There we go. There we go. There's that's the correction there.
LeetCode is little coding challenges. Yes, exactly, Lun. Exactly.
Why are the Rust creators doing it? I don't know. They just chose to do it that way, right? They made it. They get to choose. They can make it simple to read and understand. They could have, right? They could have. They chose not to, which makes me sad. What I want though is a simple version of Rust. I feel like it's possible to just take all the Rust primitives, still support all the crates and the libraries that you can import, and then have your application itself be in a simple Rust. So just a pared-down version of that. I think that's possible.
Okay.
Uh this should fix that.
So, let me go back up here and hide that for a moment. Let's see if that works.
Okay.
Pad all words. Wait, I wait. Hold on.
Did I do that? Word list. Where's this code coming from? Think for the hundreds you guys. Quinnpaw looks good. But all of it skills are in Chinese. Oh, are they really? Oh, so that makes it a little challenging to decipher.
Can only concatenate str (not "list") to str. Where is this a problem at? Why is that there? That's not my code.
See, did I pad?
Oh, this is Oh, whoops. This is my code.
So, oh, I know what's going I don't. Hold on.
So when I create my dictionary, I'm not including these words that I put in. So I added new data without making it available in the dictionary.
So the agent just speaks Chinese. Ah, yeah, that makes sense. Well, it's compressed, right? Because what's great about the Chinese, like Mandarin, right?
All the symbols, it's like they're just two bytes, right? Double-byte symbols.
One, two, which means that the words are really small versus most words here, right? You've got a whole bunch of characters.
Typically though, that doesn't matter because each word is going to be a token anyway.
So, is there really better compression when it comes to Chinese or Mandarin characters? I don't think so, because it's the same either way, right?
This isn't practical. It isn't. Why not?
How do you say ni hao in Chinese, in Mandarin? Ni hao.
Uh, ni hao.
Mandarin.
Oh, all right. Ni hao. There we go.
Hello. Yes. What is the symbol?
Wait, it's two symbols. No, really.
Hello is two symbols.
Are you kidding me?
All right. I didn't realize that.
That is unnecessary. It could just be one. It's one word. It's one word.
All right. Where is my dictionary occurring at? Here we go.
All right. Dictionary training data. Ah, okay.
So let's upgrade our dictionary to support multiple.
So this is going to be uh this is going to be all of our training data.
We need to join. Suppose we just do a simple join, right? This is this is not a good idea though. This is not a good idea. Let's just do this.
All right. That's just a that's a placeholder for now. It'll come back later. We'll have to deal with it later.
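The dictionary fix described above, building the vocabulary from every question and every answer so that no training word is missing, could look roughly like this. It is a sketch under assumptions: the function names, the special-token list, and the lowercase-and-split normalizer are stand-ins, not the code written on stream.

```python
def normalize(text):
    """Lowercase and split on whitespace (stand-in for the stream's normalizer)."""
    return text.lower().split()

def build_dictionary(training_data, specials=("<pad>", "<unk>", "<sos>", "<eos>")):
    """Assign an id to every word seen in any question or answer."""
    word_to_id = {token: i for i, token in enumerate(specials)}
    for question, answer in training_data:
        for word in normalize(question) + normalize(answer):
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id)
    return word_to_id

def tokenize(text, word_to_id):
    """Map words to ids, falling back to <unk> for unseen words."""
    unk = word_to_id["<unk>"]
    return [word_to_id.get(word, unk) for word in normalize(text)]

training_data = [("Hello this is all the data", "and here is the rest")]
vocab = build_dictionary(training_data)
question_tokens = tokenize(training_data[0][0], vocab)
answer_tokens = tokenize(training_data[0][1], vocab)
```

Iterating over the pairs avoids the string-plus-list concatenation error, and the `<unk>` fallback means a word outside the training data no longer raises a lookup failure.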
How's it going there, Dr. Scruffy?
Yeah, it looks like you good. Oh, yeah.
Right. You good? Is that what that is?
Okay.
Ni and hao, ni hao. All right.
I didn't I didn't realize that before. I thought that thing's weird.
All right. So, that'll fix our little problem that we had there. Yep. There we go. We did it. All right. So now we've got our question answers being passed forward in the feed forward.
So did we get this done? We got that part done target. That's complete. Okay.
Then we need to do positional encoding.
Put that over here. I'm going to indent this because you never know.
We need to do something. We need to do something along those lines. Okay. So left is the mask positional encoding and then a lot more data. However, we also need to upgrade our dictionary.
Upgrade dictionary upgrade dictionary support uh better word uh memory management.
I don't know if we really have to do this.
Most of the Asian language are either very specific, one word for one thing or phonetic. Oh, wait. There's a phonetic part to it. If I'm wrong, correct me.
Hey, loons. I didn't realize. I thought they were all symbols. I thought everyone every single one was a specific symbol.
Hey, silly techie. Thank you for subscribing. Good to have you here. You joined the right channel for software engineering. We are currently building ourselves a transformer model that's going to be like chat GPT though we are going to it's going to be a small language model. It's not going to be a large one. Give me a teeny tiny little bit teeny tiny one.
You think the Rust syntax comes from writing an array, which basically is like the, oh yes, the slices, and a mix of math where they leave out the numbers with the dot dot. Yes. Range operator. I like those range operators.
Range operators. Very nice. I like those range operators. They're kind of nice.
It's nice. I mean, like other languages like Python, you've got your range operator here, right? So you say list range and you say 1 to 100, right? And it gives you the list of 100. One to 100, which is human readable. It's perfectly readable. Whereas in Rust, you have something that looks like this, right? Dot dot actually. Okay. Yeah, I was wondering if Py I was wondering if Python though was going to support that. It doesn't though. Like would this work in Python?
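For comparison, here is how the Rust ranges discussed above map onto Python's `range`, whose end is always exclusive. Python has no `..` range literal, so `1..100` is not valid Python syntax.

```python
half_open = list(range(0, 100))        # like Rust's 0..100   -> 0 through 99
inclusive = list(range(0, 100 + 1))    # like Rust's 0..=100  -> 0 through 100
one_to_hundred = list(range(1, 101))   # like Rust's 1..=100  -> 1 through 100

print(len(half_open), len(inclusive), len(one_to_hundred))
```

Getting the list of 1 through 100 in Python therefore needs `range(1, 101)`, since `range(1, 100)` stops at 99.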
Traditional Chinese, like Mandarin, very specific one. Simplified Chinese, the phonetic one. Oh, hey, we're learning new things today, aren't we, Lun? That's great.
Okay. All right. Our transformer looking good there. All right. So, we did one sample.
Hello. This is all the data and here is the rest. So this is the you know question and answer. Yeah. All right.
So if these are question answer pairs, what we could do is if we're going to be training the model on question answer, we need to get some data so that way we can see if the model is succeeding. Though I think this is fine for now.
Let's just see what it comes up with.
Let's find it. We'll try it out. We'll try it out.
Let's see. Let me make sure I've got everything I need here. I don't want to forget anything. So, I'm going to copy and paste these things for now. Whoa.
Whoa.
All right. Copy. Paste. Paste. Paste.
All right. So, we got our dictionary tokenizer. We did do that.
All right.
transformer self attention and multi heads. We basically have that good to go because it's working right now. You see it right there.
We need to do the training model.
Okay.
You can use MLflow so you can compare all iterations. Oh, Peace Lord. Oh, yeah. I've never I haven't done that. That sounds like that would be a really nice thing to use, doesn't it? I would like convenience. Do you like convenience? I like convenience.
The the phonetic one is context based.
Mostly the same with the different in intonations having accents instead of being separate signs.
Got it. Okay, good to know. Keeping things simple, you guys. Google Translate there are both traditional and simplified.
Oh, okay.
So, I see simplified has the same starting symbol, right? It's got that same like tree looking thing there. So, it looks like both and simplified same shared symbol.
Okay. What's this? Did we do that? We did not do that part yet. We got to take care of that.
What else do we need to do here?
All right, so that happens there.
And we don't need this.
We should We should have a a dedicated training data generator of some sort. We need to do that. Let's add that in.
Okay.
A training data generator based on our input data file. So we need an input data file.
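A training data generator along these lines might be sketched as below. The file format is not decided on stream, so the one-pair-per-line, tab-separated layout and the function name are purely assumptions; `io.StringIO` stands in for an opened file.

```python
import io

def training_pairs(lines):
    """Yield (question, answer) pairs from lines shaped like 'question<TAB>answer'."""
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank or malformed rows
        question, answer = line.split("\t", 1)
        yield question.strip(), answer.strip()

# Stand-in for open("data.tsv"), using the sample pair from the stream plus one made-up row.
sample_file = io.StringIO(
    "hello this is all the data\tand here is the rest\n"
    "\n"
    "how are you\ti am good\n"
)
pairs = list(training_pairs(sample_file))
```

Because it is a generator, the same function works for a small in-memory sample and for a larger file read line by line.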
It used it. You did. It's quite helpful.
Nice. Good to hear. It can represent it in a graph so you can visually compare at a glance. Use MLflow. Also learned it. Nice.
I think it's the ni of the ni hao.
That makes sense. Yeah, I think so too.
It's, uh, it's the ni of the ni hao.
I kind of wanted it like if it's going to be, you know, traditional, I feel like a hello or a greeting, it deserves its own cemented placement in the language by being just one character.
However, for whatever reason, they're like, "Nope, we're going to make it two characters."
I feel like that. And then, oh, wait.
Hello to everyone. are three characters, which is Yes, they're this. No, they're different. This is weird.
This This final symbol though is the same. Interesting. Hello versus hello everyone.
Niha. Oh, nice. All right. Niha.
All right. Thank you for clicking that hello button again, Lun. All right. I want to continue onward here.
See? Googled it. Apparently, simplified Chinese just uses fewer strokes. Oh, really? Okay. Got it. Well, understood. Are we implementing our own tokenizer now? Yes. Forcing hello. Yeah.
Into a single token. We sure are. We did it. Yes. We got our own tokenizer here.
This is basically our dictionary. I called it a dictionary, though. We could have called it like tokenizer or something like that. And I wrote, we just wrote this. We tokenize right there.
That's it. You see it? We built this live on stream. I like building my own so I know what's going on. I I like it.
I I I'm fine with it. I wouldn't mind using someone else's. Though, in this case, I feel like it's important that I capture the specific meanings that I'm looking for.
Our own token. John Madden. You got it.
Don't put the how in the beginning. It immediately changes the meaning to a slur. Oh, wait. Does it? Oh, cloud AI for free. Wait, where's that?
Where do you get some free cloud AI? I like some free cloud AI. All right, so we got our transformer.
What do we need here? Okay, so let's grab this. So, we're normalizing, we're tokenizing, we're embedding, and then we need to do a mask and positional encoding. All right. So, I know there's a way to get a mask here. I just need, thank you for the hearts, you guys. Appreciate it. A mask on this. Let's go to Gemini here and say I want, oh, wait, wait. No, let's start a new one. Okay. I want the, uh, torch transformer.
Hey, Mr. Bungie Pickle, welcome on in.
Happy Tuesday. Good to have you here.
Need your help. Why do I see uh some sort of monitor script error on your router log? Should I be concerned?
I don't think so. I don't think so. I think you're fine. I think you're fine, Mr. Mumpy Pickle. That sounds scary cuz they're like, "What is that? What is this?" Uh, power over Ethernet monitor script, right? That's what that is.
Power over Ethernet, right?
PoE equals Power over Ethernet.
So, I think that should be fine.
I'm pretty sure.
Had to reload my chat window there for a second. Okay. Um, in Japanese there are three writing systems: katakana, hiragana, and kanji, the more complicated one. All right. I forgot about that. I did hear about that many years ago, like 30 years ago now. I heard about that, and my favorite one is the easy-to-read one because it's in English.
Okay, thank you. Been stressing for a while. Yeah, I think you're good. So, I think you might have a PoE port and there's just some error that's triggering it, like saying, "Hey, it's trying to transmit power," but it's not, or I don't know, something's going on there. I don't think it's a problem. I think you're fine. I was pretty sure. Torch transformer mask, make from tensor.
There's an easy function for it: generate square subsequent mask.
There we go. That's what I'm looking for. All right. So, let's go into this in PyTorch and read this here. Here we go.
There we go. This is what I'm looking for.
generate_square_subsequent_mask. So, sz, that's it. We only have to pass one value in. Only one value.
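To make the single `sz` argument concrete, here is a plain-Python sketch of the pattern that PyTorch's `nn.Transformer.generate_square_subsequent_mask(sz)` produces: an `sz` by `sz` grid with 0.0 on and below the diagonal and negative infinity above it. The helper name below is hypothetical, and it returns nested lists rather than a tensor.

```python
def square_subsequent_mask(sz):
    """Row i may attend to columns 0..i; later columns get -inf."""
    neg_inf = float("-inf")
    return [[0.0 if col <= row else neg_inf for col in range(sz)]
            for row in range(sz)]

mask = square_subsequent_mask(4)
for row in mask:
    print(row)
```

The size is the sequence length of whatever is being masked, which is why one value is all the function needs.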
You're so confused by how the major models tokenize. You imagine there must be a lot of thought or compute behind why they split certain sections of the text into letters and syllables. Yes, there was. So it's called byte pair encoding. BPE, byte pair encoding.
They do this because there are a lot of words.
There are a lot of words, and to get the token count reduced, instead of having like 100,000 tokens, they could use BPE, which I believe is going to give you, I don't know, in the range of 5,000 to 10,000 tokens. I think the actual answer is like 30,000 in ChatGPT. It's like 30,000 tokens, which is much better than like 100,000 tokens. That's why they chose byte pair encoding, because it's going to split your words up into reusable components. So that's why byte pair encoding is great. So the word empowerment gets split into em, and then power, and then ment, right? So you get three pieces there. And in fact, they probably byte pair encode it like that. So you get four tokens. The problem is more tokens.
On the context, right? The context length has to be bigger because this is four tokens versus one token.
Though the total dictionary, all total word counts, your vocabulary, reduces from something like 100K plus down to like 30K.
So, it's like it's like a balance, right? Hey, Scarlet Fire looms. Yes, you got it. Scarlet Fire is happening right now.
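The trade-off described above, fewer vocabulary entries in exchange for more tokens per word, comes from how byte pair encoding learns its pieces: it repeatedly merges the most frequent adjacent pair of symbols. The toy trainer below works on characters with a made-up word-count corpus; it is a sketch of the general technique, not any production tokenizer, and all names and counts are assumptions.

```python
from collections import Counter

def count_pairs(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_counts, num_merges):
    """Return the learned merges and the final segmentation of each word."""
    words = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

corpus = {"power": 5, "empower": 3, "empowerment": 2, "payment": 4}  # made-up counts
merges, segmented = learn_bpe(corpus, num_merges=8)
for symbols in segmented:
    print(" ".join(symbols))
```

More merges give longer pieces and fewer tokens per word but a larger vocabulary; fewer merges do the opposite, which is the balance being described.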
121 major languages and 1,300 rationalized mother tongues. Oh my goodness. Peace, Lord.
Sergio Steven Blum's the coolest coder.
Thank you. Appreciate Sergio. That's really nice for you to say. So, you get a smaller language. Yep. A smaller large language model. Exactly. John Madden.
Exactly.
Kanji are the easier than you think.
Even though there are over 50K, you basically just need to study 270 radicals that carry meaning and are just glued together to make them.
Okay. Well, 270 seems like you can memorize that. That seems memorizable.
Memorable. Hey, Robbie. Good to see you.
Welcome on in. Happy Tuesday. We're currently working on our transformer model. We are getting the mask going next, and then we can do some positional encodings. I think we can be fine without having positional encodings initially, and then I want to add them later. So let's do the mask first. So we need to mask, I believe, just the question embedding. So, uh, question embedding mask equals torch dot, where is that? Where is this at? I think it's nn, maybe. Yes. torch.nn.Transformer. Where was that at here?
It's around here somewhere. Where did we put it? Okay. Yes. nn.Transformer.generate_square_subsequent_mask, which we opened up here. So this is our function.
Where we at here? Here we go. Excellent.
Excellent. Just came out of an internship interview. Hey, very nice. I hope it went well. Did it seem Do you feel good about it? Did it Did it make your day or are you just sort of not sure? Are you not sure?
Transformer dot generate.
Then this will be the question embedding, I believe, right? Yeah. And then we'll put the mask in. Here we go. Wait. No, no, no, no. I think we have to do this differently. I think we have to say question mask.
Yes, great people. Yes, it's good to hear. Very nice. You speak Bengali, Hindi and English and learning German too. Going to get major languages. Nice.
You feel positive about it. That's such great to hear. That's so great to hear.
Isn't that great? Isn't it wonderful feeling? Like you feel like you've accomplished something in the day and so it's just like ah things are going well.
That's great to hear.
Okay, so for those of you who are here, we're going to jump into Discord real quick.
We do have a hackathon going on right now in our in our Discord. It's the Bloxsai AAI hackathon. All you do is go to blocks.ai and follow the onscreen instructions.
It's a website, right? I'll click it right now.
Hey, Snot. Good to good.
You've been I saw that. I saw that. You did that twice. You did that twice. I saw you put the uh the the like hand over mouth.
Just follow the instructions and you're good to go.
Really interesting things. three writing systems during the Nara period where Japanese allowed the Japanese writing styles as well as the phonetic writing styles with words.
Really? So that how long ago was that in Japanese and Chinese? A mixing of both. So you would you could write the words in Chinese. Oh, what words sound the words? They mixed both Japanese and Chinese also both phonetic and actual words. They tried to merge the languages. That would be neat to see.
You have a question about PubNub. All right. Lons. Yes. Does PubNub have an AI company using the product for like their chat app? There are a few. There are a few.
Hey, is it is it late over? Hey, Snot 123. I bet you it's late over there, right? I bet you it's late. Y'all better like and subscribe. Mr. Bumpy Pickle, thank you. Appreciate it. What are we doing today? Steven Robbie, you're looking bons. We are building a transformer.
Which is basically, we're building ourselves a language model, right? You've heard about ChatGPT, Claude with Opus, Gemini, right? With Google.
We are building one. We're building one.
We're it it'll be smaller.
Doesn't follow instructions. as well.
Hey, good to go. So, yeah, we're currently working on this right now. This is what we're doing.
C++ or Haskell, never shipping class.
Yeah, I think C++ is going to be most shipped language maybe ever, right?
Unless you compare it to C. Maybe C is in Linux. A lot of the internet runs on C.
Chat GPT. Oh, you don't like the chat GPT? I think it's fine. It's fine. Wait, do you have opinions? Tell me about it.
Uh, let's see.
Tell us more. It's one of the most robust languages. It is. It's kind of getting sidelined now that there are so many other languages that run so well.
Stephen, send help.
Kernel crypto is hard. Oh yeah, it is.
Well, I can tell you. Keep on going.
You'll make it. You'll make it. Bonzupi JavaScript is close to CLS. Wait, what do you mean? Syntactically, it's kind of similar. A little bit. It's not completely the same. It's got curly braces, right? It's got the curly braces.
Okay. So, I think this is our mask. And I think our transformer needs a mask input here, right? Mask. How do we do this? Oh, the mask goes in on the forward function. Okay.
The transformer also takes a target mask and a source mask. Okay.
So, we have to do this. All right.
But we want it on the target mask, right? We want the target mask, not the source mask.
So, I have to do that here. So, this needs to be different. This has to be the answer mask. Answer. Okay, here we go. Answer mask. So, we go tgt_mask equals answer_mask.
I think that'll do the trick right there. You wanted Quinn to do the one thing. Use the included skill to make a website and he just didn't use it. No, really? Oh, that's terrible. It's built.
What do you mean it doesn't work?
Yeah, sometimes those, uh, coding harnesses, they just don't pull in the context. Oh, thank you so much for the byte pair encoding explanation. John Madden, absolutely. Yeah, isn't it? It's really straightforward. Basically, it's just splitting the words up so that you have a smaller vocabulary.
It's a trade-off: do you want more tokens, or do you want a bigger vocabulary? For what I'm doing, I don't need to do byte pair encoding, because I can keep the whole word as an entire token all on its own.
How many machines it use or how much is it shipped? How many machines it used or how much is it shipped? Uh class what?
Oh, with JavaScript and C. Well, okay.
Yes, you're right. Comparatively speaking, if we are going to look at the most widely run programming languages, JavaScript and C are the top because every computer, every phone is running C, because it's typically running Linux or, you know, it's running macOS or iOS on an iPhone, and those are C-based systems, right?
For the most part, it's C. And then JavaScript because it's web JavaScript on the web. So all browsers have it.
Coloss, you know it. Machine learning and assembly. Peace, Lord. I think we're going to do it. We're going to do it.
We'll make the plan. We'll make the plan to do it now. I made him dumber with faster mode and not thinking mode. He wants to use the skill all of a sudden. What? Android is JVM based, I believe. Yes. So the operating system, because I've got an Android, is Linux. Android is Linux. The JVM is installed because typically your apps are written for the JVM and compiled down to Java bytecode, right? So typically you're writing code in Kotlin, maybe Java, and it compiles to the JVM and that runs on your phone. You're right, you're right, there is some Java. And I don't like it. I don't like it.
All right. One second here. I'm just reading some messages. All right. C, it's just an assembly wrapper pretty much, right? It is. The nice part about C is it will target different architectures. So you can do ARM64, ARC, RISC, you've got, um, right, you've got different microprocessors, right? PIC32, ESP, and then you've got, let's say, x86, right? The one that most people use. Eagerly awaiting, Stephen. Are you?
Sounds great. Looking forward to it.
Yeah, we'll do this someday. We will.
Okay, where are we? Did I get the mask?
Did I get the target mask? The answer mask. Did I do that right? I want the answer mask, right? Uh or do we need both?
Let's do both. Let's do both. Let's do it. Question mask. I'm actually not familiar with this part. This is part I didn't always do. Question embedding.
There we go. Perfect.
Good idea for a poll. Do you ever Oh, here we go. Let's do it. Let's do it.
Okay.
Do you refer to AI as he, she, or it?
That's a good poll. That's a good poll question. Here we go, you guys. All right, new poll. All right. Do you refer do do you uh what do you call that?
Genderize gender AI as he she or non gender gender. it.
He she tab and or it slash thing or them. I I should I I don't know why I'm only doing one uh gender. There's multiple it thing. Yes, that's the correct answer. Yes, 100%.
That's the correct answer. Bonufi. Yes, exactly. Oh, no. I know. Here's here's the deal. So, all of our European engineers and European non-American engineers, most of them say he.
Most of them say he. American engineers say it or the robot or the machine or clanker. That's what they do.
All right. src_mask. src_mask equals question mask.
Okay, I think that'll do it. Let's see if that runs. All right, so we have a problem here in just one. How about we change this really quick?
Okay.
Uh, one second.
Oh, yes. Clauss, I see what you said there. I see what you said. Yes, it is.
Yeah, because often there's a lot of yelling at the AI model. Like, you're not doing what I told you to do. So, you yell at it and you give it a slur. I use he sometimes, especially when referring to Claude. But I enjoy calling it slopbot 2 based on your knowledge. Yeah. Uh yeah, based on real world experiences. I know what you say. I know what you say.
All right, let's double check that this is still working correctly.
Okay. Yep. Okay. So, let us grab just the source mask.
Actually, wait, before I do that, see if that's the line of code that was failing. Okay, so that is this is the line of code that's failing right here.
So, I have to figure out why that's failing because I want to generate a a mask.
Do you have dotfiles for your color scheme? Yes, I do. Here we go. Chill.
Vim src.
src. Wait. .vimrc. Here we go. Those are the color settings right there.
Your Arctics box. Ooh, you're working a brown tan setup. Nice. Yeah, I've done that. I did that for a while. Quantum AI equals the endgame. It sure does.
That's when we get to AGI. I think when our quantum processors are good enough, they are going to be so fast. They'll be able to think at the speed of light because it is. It's thinking at the speed of light. Hey Bruss, good to see you. Welcome on in. Happy Tuesday. We're building ourselves our transformer boo finger enthusiast. You're welcome.
You're welcome.
Okay, here we go. Here we go. I want to figure out why that did not work. Okay, let's see here. One second.
Here we go.
Interesting.
Okay.
Okay.
Oh, got it. Okay.
So, I'm doing it wrong.
I think it's a healthy exercise to treat large language models with a certain level of abuse. Hey, John Madden. Uh, depends, I guess, if you're getting good results.
I always treat them nicely and with motivation.
It's not a human and should not be treated like it, right? Because it's just it's just an algorithm. You're right. It's just a bunch of numbers in a for loop.
No need to write an essay if it understands. Break that function out.
Yes, exactly. That's also very important. Minimize your input tokens.
When you're working with large language models, code generators, specifically harnesses as well, like Claude Code, Codex, and Gemini with Antigravity, I recommend this, cuz this works really well.
You use fewer words. Drop all of the formalities. Hello. Can you please do the blah blah blah? No, just say update function file add add color purple. That's it. Done. Done.
It understands it and understands what you mean. There's no need to ask it. Is speed the ultimate attribute for AGI?
I think so, because right now a model is frozen in time. After it's trained it's frozen in time, so it can't really think anymore. And if we want to have a representation of the human brain, where, you know, trillions of things are happening in your brain constantly, we need to be able to represent that in the real world with these artificial intelligences. We can't do that right now cuz we don't have enough compute. So Ally, welcome on in. Good to see you. Thank you for saying hello.
Also, the amount of time, effort you put into that could be better spent on different sessions. Oh, John Madden.
Yes. So, you just write little bit of session here, jump over to another session, a little bit more over there, jump over to another session, a little bit there. You don't need to write a giant paragraph of instructions. Keep it simple.
Is it human? We have I heard that before. Not yet. Not yet, class. We're working on it, though.
Basically, you write prompt as code in English.
Uh, yeah, that's that's what I do. Do you do that? That's what I do. Hold on one second. I need help with the thing.
Python. import torch. I say torch.tensor.
One, two.
And we'll do a multi-dimensional tensor here. Okay. .size, I think. Is that it? Like this.
And then we say function. Can I specify the number, the dimension? There we go.
Okay. So this tells me how big I need to make my... So this has to be a .size(1) for it to work. Right? Is that right, you guys?
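The REPL check being done here, asking a tensor for the length of one specific dimension, looks roughly like this. The tensor values are placeholders chosen for illustration.

```python
import torch

# A small 2-D tensor: 2 rows, 3 columns (values are arbitrary).
x = torch.tensor([[1, 2, 3], [4, 5, 6]])

print(x.size())    # the full shape as a torch.Size
print(x.size(0))   # length of dimension 0 (rows)
print(x.size(1))   # length of dimension 1 (columns)
print(x.shape[1])  # .shape is an equivalent way to read the same value
```

Which index is the right one depends on the layout of the tensor: for an unbatched sequence of embeddings shaped (tokens, embedding width), `size(0)` is the token count and `size(1)` is the embedding width.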
Hey, JB Bro7, welcome on in. Good to have you here. Thank you for saying hi.
Your brain's fried. Pseudo code in English. Oh, got it. English. English.
Yeah, I figured. I figured that's what you meant. Speed's not just about tokens per second because it scales with compute. You can throw a faster computer at the largest models, but efficiency is also a factor, too. John Madden. Yes. So there's all these attributes and all these considerations around these models that we need to take and improve it. We need faster hardware, faster algorithms, different strategies and approaches. And if we do that, we also need to add in sort of say like short-term memory and long-term memory modifications. Today, what we've done is we've built these large language models.
We've trained them with a lot of data and then we put them into production as frozen weights. And that is part of the problem. We need a way for the model to also make adjustments based on its perceptions. Otherwise, it always reverts back to its initial state. And we cannot get I believe we cannot get AGI until we have some sort of mutation during inference.
That's it right there. That's it. It's what we need. Mutation during inference.
Bro, Steven, my dog. Hey, the infamous.
Good to see you. love how I always come across your lives. Hey, that's great to hear. Yeah, we're building ourselves a transformer today. We're building a language model. Yeah, a language model.
I think I just fixed the thing that we had a problem with earlier. Let's see if that fixed it. Let me run it. Code here.
No, something else happened. We got another problem. Uh oh. Oh, wait. The problem is I Oh, wait. No, we did fix it. Hey, there we go. Okay, go do it.
We're doing good now. You're a transformer. Hey. All right. Hey, how's it going there, Dev from 2005? Hey, how's it going? What's up? Good to have you here. We're building ourselves a transformer, a large language model.
Second Android app released to production. J bro, thanks to AI, good job. Keep on pumping out those apps. Make it happen. Qwen desktop lets you clone your voice and the training process consists of saying sentences.
Oo. And then the text is needed to read contained slurs. Oh, wait. I don't know about that part. I haven't heard about that. Nice breakdown, Sergio. Thank you.
That's great to hear. Hey, Song 1865.
How's it going? It's a pleasure to join the stream. Good to have you here. AI is an it. Yes, agreed. Agreed. It's an it.
Blood scrapers, transformer models. What don't we do in here? Yeah, exactly.
We're going to do everything. We're going to do all the hard stuff. We are.
That's the plan. We've got the plan for the futures, you guys. We got the plans for the futures.
All right. So, enter size one.
And then we've got our output. I believe this is the one we want, right? Yeah.
There we go. All right. Did we get it?
It's an it. It is, Robbie. It's an it.
Yep. Mhm.
Oh, we got a problem. Okay. That seems like a problem. All right. We'll figure it out. We'll figure it out. Can you teach us kernel crypto, please? Kernel crypto. I can teach you consensus algorithms like like the Bitcoin one, right? We could do that one. That's a good one. What is the kernel part of it?
I What is the is the kernel part of the consensus?
I actually don't know that is either good or bad when I reckon recently seen more and more of that apps and websites are just less and less refined. They have like small lags and monitoring. Yes.
Uh-huh. Because it's so easy to generate this code now. And so we're creating these really inefficient systems.
Kernel crypto. Hm. Right. The infamous, like, what is the kernel part?
Is it just weird thinking that AI is a person? It is just weird. It is. Right.
It is, Robbie, cuz it's it's just a a machine that goes in a loop.
The cryptographic. Oh, Bonzupi. Now it makes sense. Okay. Yes.
Okay.
So you're talking about just cryptography.
AI is Cody. Oh, right. JPro. I remember when it was Cody for a while, right? It was the subsystems that depend on it.
Bonzupi. Got it. All right. Now it makes sense. So you're talking about the the algorithms themselves that represent signatures in cryptography system that I've been banging your head on against for 3 weeks. Ah yes. Okay.
That there's a lot to that. There's a lot to it. Okay.
And where is my problem here? Something with I generated the mask. Hold on. Return question mask.
Question mask.
Hold on a second. Question m. There we go. Let's try that. Okay. Oh, that seems like that should work.
It seems like it should work to me.
uh formerly refined by intens intensively testing and just manually programming in general where they had to intensively think about what they want to do not just vaguely with a prompt.
That sounds like Wait, I missed that. Formally refined by in Wait, Torvo, what are you talking about?
Uh, my colleagues replaced my g your generated code with human written code.
Did they? Wait, what? Maybe assisted by ChatGPT code and it broke on launch today. Oh, wait. You told them not to go for the approach and they went for it.
They just edited the AI code by hand.
Why'd they do that? John Madden, I think we have brain damage now. I'm feeling like a little bit.
Remember the good old days when I was when I was saying I remembered. I remember the good old days, you guys.
The good old days. So, this looks like a perfectly fine mask to me, right? So, why no like here? How about we do uh just the target mask and we get rid of the question mask?
Let's try that. Okay, so that's a problem. Let's do it the other way around.
Okay.
Okay. is_causal, is_causal. The shape of the 2D attention mask is... but should be a correct 2D size. The shape of the 2D attention mask is 128 x 128 but should be 6 x 6.
Are you sure?
Are you sure about that? I don't know.
Hold on.
Let's do uh print answer mask dot size and then shape.
And then we do the same with the embedding here.
All right, let's try that.
See what we got.
Uh yeah, look at that right there. has nothing to do with a 6x6.
It's 128 x 128, which is a a great square.
H although it said Yeah, it's five by Oh, is it because we need it to be um let me think about that.
Let me think about that. Maybe it shouldn't be one. It should be zero here.
Let's try that. See if that works. That might do the trick. All right. All right. So, it's a 5x5 now, which isn't correct. I think we'll have to think about that for a second.
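The shape error being debugged here is consistent with building the mask from the embedding width (128) instead of the sequence length. A minimal sketch of the idea, assuming unbatched inputs shaped (tokens, d_model): the lengths 6 and 5, the `causal_mask` helper, and the small layer counts are illustrative choices, not the stream's actual code.

```python
import torch
import torch.nn as nn

d_model = 128            # embedding width, matching the 128 seen in the error
src_len, tgt_len = 6, 5  # assumed question/answer lengths, for illustration

# Unbatched embedded sequences: (sequence length, d_model).
question = torch.randn(src_len, d_model)
answer = torch.randn(tgt_len, d_model)


def causal_mask(seq_len):
    """Square mask: -inf above the diagonal hides future positions, 0 elsewhere."""
    return torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)


# size(0) is the token count; size(1) would give 128 and the wrong mask shape.
tgt_mask = causal_mask(answer.size(0))
# A mask of zeros changes nothing; it only shows the expected (src_len, src_len) shape.
src_mask = torch.zeros(question.size(0), question.size(0))

model = nn.Transformer(
    d_model=d_model, nhead=8,
    num_encoder_layers=1, num_decoder_layers=1, dim_feedforward=256,
)
out = model(question, answer, src_mask=src_mask, tgt_mask=tgt_mask)
print(tgt_mask.shape, src_mask.shape, out.shape)
```

Both masks are square in the number of tokens of their own sequence, so the source and target masks generally have different sizes, and neither depends on `d_model`.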
Your new your new catchphrase is composed entirely of words that cannot be uttered here. Oh, hey, Bonzupi. All right, let's thank you for being cautious. Remember the good old days when I was saying yes. Oh, wait.
They didn't even edit it. Oh, wait. They completely redid the UI, chasing the tech giants' design language that we were not even close to. John Madden. Oh, they were reaching for something that they couldn't get to.
They were You even told them, John Madden, you even told them like the errors and stuff before AI coders had to intensely plan. Oh, and think about the features and design how they were just prompting an AI and launching it. Now, the prompts are a lot easier to deal with. Yes, just play with some prompts to refine it in 2 to 3 days. Now, you don't even have to do that anymore. You just do really small, tiny, quick, direct prompts and it's able to achieve its objective in a few minutes.
The multi-day prompts are not necessary anymore.
Oh. Uh, Bonzupi.
We do. We do. Yep. I see what you said there instead of the months it used to take.
Oh, got it. Okay.
Yeah, I know what you're describing. So, I do remember many years, like two, three years ago, you had to construct very large descriptions about how something was to be accomplished and be very specific about it. It's like a genie making a wish and the genie would grant your wish but with with a whole bunch of extra side effects that you didn't want to have to deal with. Now the good news is you don't need to worry about that so much because everything's faster, more direct, and has been fine-tuned with reinforcement learning to better fulfill a task.
Yeah. Yeah. And the AI refines leads to somewhat glitchy websites instead of the manual refinement of code. Oh, these AIs, you guys, we're still trying to get them good.
People are absolutely able to write very poor and unplanned code without the help of AI. Yeah, it's been happening for decades. Wait, more than decades.
Actually, no, decades counts. People need to learn how to use AI to write tests that create hard constraints that prevent regression. That's a good way to do it. Yes.
Top 1% engineer. Hey, how's it going?
Congratulations on moving forward. What should I focus on learning? Wait, have you since you're making since you're graduating, did you say uh AI? I would say like AI machine learning. Do you want to do that? That's what we're doing right now. That's what we're doing right now. Let's see. We need to figure out how to get my mask working.
Let's see here. Is my mask going to work? It worked. It worked. Hey, it worked. All right, that works for me.
Oh, perfect. All right. Good, good, good, good. Uh, it only took a little bit of a little bit. Just a smidgen.
Just a smidgen. All right, here we go.
Let's do this one. Let's hide that.
And we say return out, which we already have here.
How far are we on embedding? We did it, Peter Parkour. We did it. We got our embedding. Although, we didn't do the positional encoding yet cuz I don't think we need it yet. Though, I will be adding it. I will be adding it.
I just wanted to do the mask.
So, I think I got the mask correctly.
Yes. There we go. That's it. We got it, you guys. We got it. Mask defined.
Okay. So each of these is a word.
Yeah. Each of these giant arrays that you see here is a word.
Cool. Okay.
Cool. All right. Oh, yeah. Before I started on this kernel project, I made an AI write a massive up user space test harness for full correctness and vulnerability probes. Nice. Yes, that's actually really worthwhile because that's going to allow the AI to fulfill your objective and maintain its constraints without exiting and doing some random things on the other side.
They get a result fast a week. You can pump out a website, but back in the day, you had to actively look for how to design stuff effectively.
It was a lot more work back then. And now we've got AIS that just make it so easy.
So if Yeah, exactly. Bonzupi, if there's any regression, it's in. You can make it. You can make it work. Okay. Uh I think I got everything I wanted there.
So we did the mask maybe. There we go.
Mask complete.
Linear output for target size token. We didn't even get the output working yet.
So, we need it to be tokens. How do we I need to think about that for a second.
Hey, how's it going there? Faux comp.
Faux comp, thank you for clicking the hype button. Welcome on in. Happy Tuesday.
Good to have you here. Okay.
I'm scared by the prospect of using AI in low-level like that. Super impressed by the harnesses.
Wait, that Bonzupi wrote? Oo, Bonzupi.
Wait, you wrote some harnesses, Bonzupi?
Did I miss that? Wait, really? I feel like that is impressive.
That's actually really impressive cuz that's what has really sprung the world into a frenzy on AI, I would say. And the big spend in the harness, you guys.
It's all about the harness.
Wait, did I share my harnesses?
I I don't know. I If you did, I I missed it.
Okay, so we got our output there. I think I need to convert it into logits. I think I need to do some logits here to do logits for token output.
I think I need to do that. That's what I got to do next.
How do I translate that though? I don't remember.
We're going to have to figure that out.
Let's see here. Let me look at some notes.
Uh, I need to my forward pass.
Okay. I'm pretty sure I need to... Got to go, Luna. All right. Sounds good. Good to have you here. We'll be continuing our transformer model. Uh, I've only got maybe about 10 more minutes and then I'm going to head out. However, we've got enough time to get, I believe, a potential sentence constructed.
We might be able to get a potential sentence. Maybe.
Yes. Please send me some malware.
Uh you just think the concept is cool.
Ah, yes. My m my harness is full of malware. Not going to lie. I don't think I should share it.
Uh, I was unexpected.
Okay, Stephen. I hate to break this to you, but you're constructing a sentence right now. I know, John Madden. I see what... Yep. We're speaking to you in words that end up being sentences. Like, how's it going there, Satium? Sautium, thank you for subscribing. Appreciate it. You joined the right channel for software engineering. We build AI here, is what we're doing. We're building AI. We're building a transformer right now. We will be doing stuff that's rather challenging. I like to do that. Some challenging hard tough things. Also, as you notice, we're coding by hand with our very own keyboard. We are not using any AI, though I do use a lot of AI when I'm off stream. Good to have you here, Lind.
Literally, malware should only be on a virtual machine. Ah, yes. That's a good way to do it. Keep things isolated.
Virtual machines, you guys. All right.
So, I need a linear layer. Did I already build it? I don't think I did.
self.linear equals torch.nn.Linear.
And I need it to be the right shape for my transformers output.
So, let me understand how that looks here. I need to see it here. All right, print.
Here we go. Let's do the shape. Here we go. Okay, let's see what that shape looks like.
Okay, so it is a 5 x 128, which means that, I believe... how big is our dictionary? Um, how is that going to work if we're going to guess the next output token?
Right.
I think. Okay, here we go. Here we go. I need to think about this really quick. So, when I'm getting my target output, I'm how many I get? Five tokens, which is going to be the answer.
How big is my input? All right, target one, two, three, four, five. All right.
So, it says 128 by the length of the dictionary. I think that's what the answer is, right?
So, that's the dims. That's the dimensions right here. Okay.
Dim, uh, len of dictionary.
Let's do that. All right. Let's try that.
And then we say out equals self.linear(out).
Here we go. Okay, that should be it. I think we're good here.
All right. Uh, dimmed. Yes. Okay, there we go. Hey, there we go. Now we've got our logits. Look at that. We got our log probabilities. We can convert these into words. Now we can convert these into words. How do we want to do that? I suppose we just do argmax. Yeah. All right. So do output argmax.
torch.argmax, I think. Right. Let's see. Uh oh, wait.
No, we, uh, comma. See, I need to do comma dim equals 1.
Yes, there we go. So it thinks the answer is 3, 3, 5, 3, 3.
Uh that's funny. So if we got the dictionary, it says all all is all all.
That's what it answered with. All all is all all. I love that. I love that.
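The step just worked through, projecting each 128-wide output vector onto the vocabulary and then taking the largest score per position, can be sketched like this. The random `decoder_out` stands in for the real transformer output, and `vocab_size = 10` is an assumed placeholder for the length of the dictionary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model = 128    # width of each transformer output vector
vocab_size = 10  # assumed stand-in for len(dictionary)
tgt_len = 5      # five answer positions, as in the stream

# Stand-in for the transformer output: one d_model vector per answer position.
decoder_out = torch.randn(tgt_len, d_model)

# Linear layer mapping d_model -> vocab_size gives one score per word per position.
to_logits = nn.Linear(d_model, vocab_size)
logits = to_logits(decoder_out)          # shape (5, vocab_size)

# dim=1 picks the highest-scoring vocabulary entry for each of the 5 positions.
token_ids = torch.argmax(logits, dim=1)  # shape (5,)
print(logits.shape, token_ids)
```

Strictly speaking the raw outputs of the linear layer are unnormalized scores (logits); applying softmax or log-softmax over `dim=1` would turn them into probabilities or log probabilities. The argmax is the same either way, because those functions preserve ordering.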
Okay, so we got our argmax and then how do we want to do it? Oh, should we, when we're running this through our optimizer, do we want to give it the argmax when we're calculating the loss?
I don't remember. So, we need our criterion and then we need our targets.
If we have our target is going to be I guess it depends on how we want to do this.
I guess I'll have to think about that for a bit. That might be something that we do tomorrow.
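On the open question of whether the loss should see the argmax: in PyTorch, `nn.CrossEntropyLoss` takes the raw logits together with integer target ids, and argmax is only used when reading out a prediction, since it has no useful gradient. A minimal sketch under that assumption follows; the random logits and the target ids are made up for illustration, and this is not necessarily how the stream ends up wiring its training loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, tgt_len = 10, 5  # assumed sizes, for illustration

# Stand-in for the model's output scores: one row of vocab scores per position.
logits = torch.randn(tgt_len, vocab_size, requires_grad=True)

# Hypothetical correct token ids for the five answer positions.
targets = torch.tensor([3, 3, 5, 3, 3])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)  # logits go in directly, no argmax or softmax first
loss.backward()                    # gradients flow back through the logits

print(loss.item(), logits.grad.shape)
```

`nn.CrossEntropyLoss` applies log-softmax internally, so the model's final layer can stay a plain linear layer.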
Okay.
So, let me convert the words into a sentence. That way we can see what's happening cuz I that's the that's like the one last thing that we got to do today. YouTube censored you for saying the name of the diet that consists contains the substring all wait did it did it yeah sometimes it does so I don't see sometimes uh it doesn't show up right so I only see your message here okay let me grab this window over real quick all right let's convert those into words I believe we can do that potentially with our dictionary. Maybe we can do a dictionary comprehension.
So, how do we want to do that? We say do item.
Can we do that? Is that going to do it?
No. Items. No, probably not. Items. We could say detach.
Detach.
Although, do I want to detach it? Wait, it still says it's tensor.
Uh the item can grab that. No, it's a 5 million. Can we make... So that's a scalar. Okay, I don't want a scalar. Okay, we need to get the dictionary tokens.
How does argmax pick something? It's based on... So here. So we're at this part.
All right, John Madden. We're at the end of the transformer here. So we did the beginning, the middle, and now we're doing the end part. the probabilities here.
So the logits, the output from the model, make a decision about which is the most likely next token, right?
What's the most likely next word? It's going to assign a probability. Really, it's going to give you which number is bigger, right? You can do which number is bigger. And so we just need to capture the biggest number.
That's it. So that's what argmax is.
It's going to find the biggest number.
Hey CSE Prince, hey, thank you for subscribing.
Appreciate it. Good to have you here.
You joined the right channel for software engineering.
And so what we're doing today, we are building ourselves a transformer. So we've got which of these words? So you can let me zoom in here. So for example, so you see this token is and, period, right? Those are those are tokens. and the probability uh to select them based on how big it is. So it looks like the biggest one is a period in this case. Actually there's some the biggest one is going to be the here we 93 902 901 98. Here we go. So see how it's a negative number. So this is the largest number here. So if we do an argmax, we're going to get the word and cuz it's going to grab us the biggest one. Typically though there is a little bit of fuzziness that they apply to large language models to make them more creative.
They grab the top five, and then they do a random chance to select across these top five.
So if you if you have a lower temperature which is just it's just a constant. You don't have to worry about that. It's going to select most likely the top word or it could potentially select a comma.
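The "top five plus a random pick" idea described above is commonly implemented as top-k sampling with a temperature. A rough sketch, assuming a single position's scores as a 1-D tensor: the function name `sample_top_k`, the vocabulary size of 20, and the temperature value are illustrative choices.

```python
import torch


def sample_top_k(logits, k=5, temperature=1.0):
    """Sample one token id from the k highest-scoring entries of a 1-D logits tensor."""
    top_vals, top_idx = torch.topk(logits, k)
    # Dividing by a temperature below 1 sharpens the distribution toward the top score.
    probs = torch.softmax(top_vals / temperature, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return top_idx[choice].item()


torch.manual_seed(0)
logits = torch.randn(20)              # stand-in scores over a 20-word vocabulary

greedy = torch.argmax(logits).item()  # always the single biggest score
sampled = sample_top_k(logits, k=5, temperature=0.7)
print(greedy, sampled)
```

With `k=1` this reduces to the greedy argmax choice, which is the strategy used in the stream's own code.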
Peter Parker. Yeah, that makes sense.
Yeah. So that's what is implemented in the large language models. For me, what I did was I always pick the top item. That's it. I always pick it.
That's what argmax is doing here. And so you can see this is my dictionary output. So we've got five words, and each one, so this is one word right here, and this word is representing all the possible words, and so I picked the one that's the biggest. So it looks like this one right here is the largest. So 01 should be the output, position one, token one, and what I need is the start, that should be the start token, right? So I go up to here and I say start of sequence. So that is the beginning.
Uh major in computer science. Do you major in all as well? Yeah, we sure did.
Ally, yes, we sure did. And if you did, how long after four years did you get your career? Uh I've been doing this for many years. Post college now. Many years. So almost about we're hitting 25 years. All right, let's give you a quick answer. Here we go. So those of you new here, I am Steven Blum, CTO at PubNub. We are a communications API that allows you to build in-app experiences for communication. You look at the top 10 apps in the app store today. They all have built-in app communication where you can send data and receive data across devices for things like telemedicine where you need to talk to a doctor or if you're ordering food with on demand delivery. You see the car moving towards you. That's my technology. You also do multiplayer games where you play with other people in an arena. That's my technology as well. There's also chat like what you're doing here on YouTube right now. You've got some chat going on. How many years of work is needed for this, uh, for Ally? I guess it depends on what you're trying to accomplish, right, Ally? What is your goal?
What is your goal? Jabbro. OMG. Okay. I know, right? It's crazy. So many years, you guys. 25 years in tech. It's a lot of years. All right. I I want to as the last thing today in my code, I'm going to convert it so that way I get the words as output and then I will have successfully at least gotten the full end to end transformer. We didn't train it yet. We're just having it make guesses about the tokens.
Let's see if we can make it happen, you guys.
Okay.
La token git status. Uh, let's see. rm -rf antigravity. Don't want it. Okay, there we go. Rerun it to make sure we're good. I'm going to git commit. git status. git add. git commit.
Full full transformer. Good. Need a few to-dos tomorrow.
Okay, good enough. push to the main branch.
You're trying to decide between computer science, engineering, but you're not sure if you want to set that as your career since if AI is the amount of work we need. Hey, good news. AI is just a tool. It is a tool that you use on computer to generate code, right? Or generate images. It's like it's think of like Photoshop. We've always had Photoshop.
It made it easier to build photos and digital artwork, right? Now we have AI which is just the next level of tooling.
We still need humans to solve problems.
The AI it's just there as a tool that we can use to implement and solve problems.
We need more engineers now than ever. I think we're going to see a large increase in the amount of engineers that we need. Definitely stay in the field.
Don't leave. Let's see here. So I want to go to Stephen. Here we go. Check this out. Here we go. On our YouTube, we've got a post that I can show you really quick that shows we are on our way upward in a trend with more jobs being posted in 2026.
So, we're on the right track right now.
We're going to see this increase. We think it's going to increase 15% every single year for more jobs being posted.
And now that we have AI that can generate more code than ever, we're being very productive. And so we're seeing just a lot more productivity and we need more engineers because it's now accessible to more people.
Anan good question over there. As a CTO of a startup been running things for 3 years starting to get some decent traction. Nice. Good to hear it. How would you say the role has changed for you over the years as CTO? Any advice?
Yes. So, of course, in the early days you do everything, right? Not just technical stuff, but you also have to do sales, you have to do marketing, you have to do, you know, hiring, you have to do HR, you have to do whatever else there is, right? You're kind of doing everything in the early days. How it's changed is now in the later years we have a team of engineers that manage and maintain and own the product going forward. So from my perspective, I used to contribute a lot of code, like a lot of code. I still do a lot of code, it's just not in every single product that we have now, right.
So we have so many teams they're all managing the products. They're doing a very good job and for me I just mostly test things out. I learn how things work. I try them. I see that it's doing good. Yeah, I learn about it. I learn more about the market. I learn how to talk with, you know, customers about the current market trends. And I need to stay up to speed and I need to learn as much as possible so that way I can at least communicate with the entire with all the different teams. So, it's a lot more communication is the answer.
Thank you for the info. I'm going to check your channel right now. Hey, sounds good, Ellie. Appreciate it.
All right, let's get our dictionary going here. Great stuff. Hey, Peter Parkour, thank you.
Okay. Uh la got our words. All right, so uh tokens. tokens equals torch.argmax, and then I need to iterate over those and generate a token. So let's do so for token in tokens.
Let's go. Print.
Oh, see words equals array. Ah, I've got a better idea. Here we go. Words equals token.
Much better. We'll do list comprehension.
Okay, list comprehension. Here we go.
Print words dot join. Wait. Uh, see, space join.
There we go. There we go. We're getting We're almost there, you guys. We're almost there. We almost have it. We're going to see our AI model actually talk to us for the first time. It's getting exciting.
Thanks for the response. You're welcome.
Anon. Good to have you here. It's getting exciting, you guys. Right here.
It's about to happen. Words token. All right. So, this is going to be the this can be the letters. This will be the numbers, rather. This will be the numbers. So, let's see here. Uh, run that real quick. Oh. Uh, let's see. Oh, right. Little bit of syntax there. Okay.
Words. Words. Words. Words. Yes. Words.
Words. Right there. Words.
"sequence item 0: expected str instance, Tensor found." Okay, let's see here. So, let's just print out the words really quick.
I believe we have to say item maybe.
Good to see you back. Hey, Martian coders. Welcome on in. Yep, we're back, you guys. All right, so we have to do item to get our scalar.
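For anyone following along, here is a minimal sketch of the step above: take the argmax over the model output, hit the join error, and fix it with item. The `logits` values are made up for illustration and are not the stream's actual model output.

```python
import torch

# Hypothetical scores: 4 positions, vocabulary of 5 entries (illustrative only).
logits = torch.tensor([
    [0.1, 0.2, 3.0, 0.0, 0.1],
    [2.5, 0.1, 0.3, 0.0, 0.2],
    [0.0, 0.1, 0.2, 0.3, 4.0],
    [0.2, 1.9, 0.1, 0.0, 0.3],
])

# argmax over the last dimension picks the highest-scoring token id per position.
tokens = torch.argmax(logits, dim=-1)

# Iterating a 1-D tensor yields zero-dimensional tensors, which str.join rejects.
try:
    " ".join([token for token in tokens])
    join_error = None
except TypeError as error:
    join_error = str(error)

# .item() turns each element into a plain Python int; str() makes it joinable.
ids = [token.item() for token in tokens]
text = " ".join(str(i) for i in ids)

print(join_error)
print(ids)
print(text)
```

At this point the output is still numbers, not words; turning ids into words needs a reverse lookup on the vocabulary.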
There we go. That's what I'm looking for. And the moment of truth. Dictionary right there. Wait. Oh, wait, wait, wait, wait. Uh, can I So, wait, wait, wait. I need I I don't have a way to actually extract the dictionary here. Wait, I need the actual dictionary, the vocab.
Can I say v Can I just change this word to vocab?
self.dictionary.normalize.
Here we go. Dictionary. I want to say vocab.
Can I do that?
Oh, let's see. Wait, wait, wait. No, no, no. Transformer. No, no, no, no, no, no.
All right. All right, I need to fix I need to figure out this line right here.
"Missing bracket on the dictionary lookup you added." Yep, we are. So here, uh, it's going to be something like this, except this is incorrect.
This is the inverse.
I need to I need to build the inverse lookup. I don't have it yet.
I need to get I need to get the inverse lookup for my dictionary. So, I need a translator. I need a decoder, right? I need the decoder. I need to write the decoder first.
How about we have a decode function?
def decode, self, comma, tokens. Return tokens. All right, we need to do this here.
I'm going to take the output like this directly.
All right.
And then I want to say decode.
Decode. Here we go. Paste this in here.
Okay. So, we got tokens words.
All right. Let me return words. Okay.
So, this is what we need to fix. We need to fix this line right here. This line's broken.
"Think of AI as an extra layer of abstraction for translating human-readable language into machine code."
Bonzupi, you're saying you're talking the truth now. You figured it out.
That's exactly what we're doing. The AI has got a brain. It's got a bunch of numbers and we're using it to make its own thinking process. We need to translate the input and output so that way humans can understand it.
"We first had binary switches and punch cards." Yeah. "And then we added assembly.
Then we added compilers and interpreters and transpilers. And now we have AI." You got it. Bonzupi, you're on to something here. You're on to something.
What's my background? Your favorite domain of software, lower-level stuff? I like network coding. So, like, network coding in C, network engineering. That's what I've done.
That's what I've been doing for uh a decade and a half at my company here.
Doing all this here.
So that's been my most recent background. Full stack of course, right?
Data science, big data, analytics, research. I really like that stuff. I'm really into it. So I I do work with, you know, large query engines like Trino, um, Athena, you know, Presto. I like those.
"More towards networking, and clearly what you've had a lot of success in." Yes. Uh-huh.
And I like the low-level stuff. It's a lot of fun. I also really started out wanting to be a game developer, and then I found industry and business was a lot more profitable immediately. So then I got into it.
Hey, Martian Coders. "Hey, do you have a GitHub repo?" Yes, I do. Here you go. GitHub repo right here. "Working on... maybe you can give me some feedback." Absolutely. Sounds good.
"Basically paraphrasing Linus Torvalds here." Bonzupi, really? Really? No way.
It sounds so spot-on.
Awesome. Yes. Thank you. Thank you, Torva, for the Discord.
Let me pin your message really quick.
All right.
Thank you for the Discord link, Torva.
Okay. So, also, for those of you who are here, we do have a blocks hackathon, blocks.ai. Visit the blocks.ai website.
We have a hackathon going on right now.
Started last week, ends on June 10th.
So, you've got a couple weeks. You got a couple weeks. Well, maybe less than that now because June 10th is coming up pretty quick. Actually, no, that's two weeks. That's exactly two weeks.
Okay. So, I want to do my decoder here.
Let's do our decode.
This is where we need to fix it. I need to have the inverse dictionary.
self dot... this is going to be the inverse. So let's say decoder equals, and I'm going to do a dictionary comprehension: for k, v in self.dictionary dot... Can I do that? Hold on. Python. It's been a while since I've done this. All right. So a equals, let's see, yeah, a equals dictionary, one... no, a string, colon, one, and then comma, b, one. Okay. All right, let's see if we do this: k comma v, for k comma v in a. "Did you forget parentheses?"
Oh, all right. Yes, I did.
It needs to be a tuple.
Okay.
Dot value. No keys. I think we have to do the keys.
And then we just say K.
Then we say D K and then A K like that.
Hey, there we go. Okay. So this is the syntax we need here.
There we go.
Perfect.
Okay. This is our decoder.
All right. And a dictionary.
And then I need to invert it.
Yeah. Here we go. Inverted. Perfect. So our decoder is really easy. Perfect.
Decoder. Now I can use the decoder.
self.decoder here.
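A minimal sketch of the inverse lookup being built here, assuming the vocabulary is a plain dict from word to token id. The entries and the standalone `decode` function are hypothetical stand-ins for the stream's `self.dictionary` and `decode` method.

```python
# Hypothetical vocabulary: word -> token id (stand-in for self.dictionary).
dictionary = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "hello": 3, "world": 4}

# Dictionary comprehension that swaps keys and values: token id -> word.
decoder = {token_id: word for word, token_id in dictionary.items()}


def decode(tokens):
    """Map a sequence of integer token ids back to their words."""
    return [decoder[token_id] for token_id in tokens]


words = decode([1, 3, 4, 2])
sentence = " ".join(words)
print(sentence)
```

The swap only round-trips cleanly if every word has its own id; two words sharing an id would collapse into one decoder entry.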
Perfect. All right. I think we did it. I think we did it, you guys. This might be it. This might be it. Is our AI going to talk to us now? Is it actually going to speak to us? We might have done it. We might have done it. words equals really expect you answering chat actively. You're welcome. Thank you for the party poppers, you guys. You're celebrating early. We don't know if it's going to work yet or not. It might.
We'll definitely follow along. Sounds good. Appreciate it, Anan. Great to hear you. All right, you guys. Moment of truth. Here we go. So, I need my dictionary.
And I need my output.
Decode output. Thanks for the party poppers, you guys. Output. That'll be my words. print words.
Did we do it? Did we do it? Uh, we might have done it. I think we have to say join first. Here we go. Here we go.
It We're We're right there. We're right there. Only one more keystroke away. All right. Here we go, you guys. See if we did it. Hey. Oh, no. We have an error.
We have an error. All right. So, I know what that's from.
That's That's easy. keys self dot. There we go. Oh, hey. Okay, so it needs to be uh self.dictionary again.
There we go. And I'm going to move this onto a new line cuz it's a little easier to read. There we'll Whoops. Whoops.
There we go. Like that. Okay. Secrets of the universe coming in. Jay, bro, it's going to happen. Here we go.
Oh no. Okay, we got another problem. Oh, token, token, token. Okay. Uh, token.item: "'set' object is not subscriptable."
Let's see. Oh, because because we needed our token. So, where was it? Uh token token decoder. And then we've got our decode here. Got our tokens.
For token in tokens.
All right. So, we got a problem right here. We're almost there. We're almost there. Why it no likey that? Why not that? So words, dictionary decode, output, and then we pass in argmax, right? It's an argmax, so that part should be fine, right? Tokens, right there. All right, what I'm going to do is I'm going to return tokens for a quick little second. That shouldn't fail, except for right there. Yeah, hide that. Okay. So, yeah, there's our tokens.
Looking good now. Yep. All right. Good.
Now, we just need to convert them to words. We just need to decode them.
We're right there.
Okay. All right. Perfect.
Decode. We're so close, you guys. We're so close.
So, why is this not working right here?
See here, we'll have to split it up into pieces. token.item, "'set' object is not subscriptable." Why though? Why? It's right here. It was right here.
Is it still a tensor? It's still a tensor. Well, it's not supposed to be a tensor when it's here though, right?
So, where's our decoder? Oh, okay. I need to see what the decoder is.
return self.decoder. I might have done this part wrong.
Okay.
Okay.
It is wrong.
Okay. So, the part that I'm doing wrong, I I just had I just had weird syntaxes. All There we go. That's what it was. That's what it was. We're typing too fast. We're typing too fast. Or I I just made a mistake is what I did. I I messed it up.
I messed it up. All right.
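The transcript doesn't show the exact faulty line, so treat this as one plausible way to reproduce the "'set' object is not subscriptable" error, not the stream's actual code. Curly braces around tuples build a set; a colon between key and value builds a dict. The `dictionary` contents are hypothetical.

```python
# Hypothetical vocabulary (stand-in for self.dictionary).
dictionary = {"hello": 3, "world": 4}

# Braces around (id, word) tuples create a set of pairs, not a mapping.
wrong = {(token_id, word) for word, token_id in dictionary.items()}

# Indexing a set raises TypeError, matching the message seen on stream.
try:
    wrong[3]
    lookup_error = None
except TypeError as error:
    lookup_error = str(error)

# A colon between key and value makes it a dict comprehension instead.
right = {token_id: word for word, token_id in dictionary.items()}

print(type(wrong).__name__, type(right).__name__)
print(lookup_error)
print(right[3])
```

Printing the type of the decoder, as done on stream with print debugging, is a quick way to spot this kind of slip.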
There we go. Hey, although our index needs to also include Shouldn't it Shouldn't Shouldn't it include Why is it Why is uh It's missing It's missing this. It's missing that part.
Why is it not including that?
Do we Hold on.
This that's a problem. H print dictionary.
There's There's something wrong right here.
Okay. I need to find that little... Here we go. Yeah. See, it's right here. It's in that dictionary. Why is it not in the other one? What happened? You see the padding, start-of-sequence, and end-of-sequence tokens are there, but they're not in the decoder.
That's not fair. "Print debugs always wins." Right, Anon, right? We see what's wrong now. We see what's wrong. It's just, uh... What happened? These are valid tokens. They need to be in there, too.
They've got to be in there, too.
So this part is uh see that's the dictionary and let let me print the decoder as well.
Here we go. Decoder.
All right. So it should be the inverse.
Uh let's see here.
Yeah, it's just missing. Although it's the same number.
It goes zero.
This is weird.
Hold on. Why am I seeing this weirdly?
Zero. Zero is not all. Zero should be padding. Hello is two.
Oh, all is zero. Oh, I made a mistake. I see where the I see where the mistake is. Ah, do you guys see it? You see it? I see. Okay.
Plus, so we need to say the length of this.
Here we go.
Whoops. Whoops. Self.dictionary.
Okay. Now we should be good.
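A hedged reconstruction of the bug just fixed: if word ids restart at zero, they collide with the special-token ids, and the inverted decoder silently loses the special tokens. Offsetting each new id by the current length of the dictionary avoids the collision. The token names and word list here are assumptions for illustration.

```python
# Hypothetical special tokens and corpus words (illustrative only).
specials = ["<pad>", "<sos>", "<eos>"]
corpus_words = ["all", "data", "hello"]

# Buggy build: word ids restart at 0 and collide with the special-token ids.
buggy = {token: i for i, token in enumerate(specials)}
buggy.update({word: i for i, word in enumerate(corpus_words)})
buggy_decoder = {i: token for token, i in buggy.items()}

# Fixed build: each new word gets the next free id, len(dictionary).
fixed = {token: i for i, token in enumerate(specials)}
for word in corpus_words:
    fixed[word] = len(fixed)
fixed_decoder = {i: token for token, i in fixed.items()}

print(buggy_decoder)   # special tokens were overwritten by the colliding words
print(fixed_decoder)   # every id maps to exactly one token
```

In the buggy version the dictionary has six keys but the decoder has only three entries, which is the mismatch the print debugging surfaced.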
All right. Hey, there we go. We've got it. All right. All 12. We're there. We get it. All right. Secrets of the universe incoming. "I could 100% figure this out in TypeScript. Just not understanding Python quirks." Yeah, we got some very specific Python syntaxes here, right? All right, we did it, you guys. We did it. All right, now we should be able to get our words. We're really close now. All right, print words. Yeah. Uh-huh. We're so close, you guys. We're so close. It's almost there.
Here we go. All right, let me hide that.
Okay, we're almost there. Hey, we did it. Look at that. Okay, it returned the tokens. It's speaking to us. We did it. We got our transformer.
Oh, all right. Now, we just got to train it. It talked to us. We asked it a question. It gave us an answer. What did it say? What did it say, you guys? It said It said pad data here. Pad here. It doesn't make sense. It's okay cuz we didn't train it yet, but it's talking to us. You're going to beat Elon Musk.
Well, not quite though. Not quite. I know. He's like a few a few years ahead of me right now. You got a storm rolling in. All right, Bonzupi. Thank you guys so much for joining. Approximately six frames per minute now. Oh, yeah. You got some uh frames dropped there. You want it data padded? Yep. It thinks it thinks it does. It thinks it knows.
All right, you guys. Thank you so much for joining today. We succeeded. We did it. We did it, you guys. Let me say git add, git status, git commit. We want to say, uh, "made the LLM speak back." All right. Push. Done. Success. Party poppers. "When are you going to build Cursor?" Oh yeah. All right, Ashish. I don't think we might build an IDE. Do I want to build an IDE? That would be a lot. That would be a lot. I don't even use an IDE. I use Vim. I guess Vim technically is an IDE. Thank you for the party poppers, you guys. All right, we succeeded. We did our objective. We achieved our goal. Appreciate everyone for joining in today. We'll be continuing tomorrow. We'll be starting to train the model with optimization.
We'll get the loss and the criterion, and we will make the model start actually giving us real, coherent responses, not just random words like this. Python. It says hello. Start of sequence. Start of sequence. Hello. See, it doesn't know what's going on. Sergio, thank you. Have a great day. It's been a pleasure, everyone. Hope you have a great rest of your day, and see you guys tomorrow.
All right. Bye, everybody. Tell me your GPU. It is embedded GPU on MPS. So, this is the metal, right? This is Apple silicon. Cheers, Stephen. Thank you, Anan. All right. Bye, everybody. See you tomorrow.
Take care. Great stream. Thank you, guys. Thank you, John Madden. See you J bro. See you Torva. Sergio Ashish. Good to see you guys. Bye everybody.