Positional encoding is a mechanism that adds positional information to token embeddings in transformer models, since transformers process tokens in parallel without inherent sequential information. The implementation uses sine and cosine functions at different frequencies to create unique position vectors for each token position, which are then added to the token embeddings. This allows the model to understand the order of tokens in a sequence, which is essential for tasks like language understanding where word order matters.
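For reference, the sine and cosine scheme described above is usually written as in the original Transformer paper, "Attention Is All You Need". The symbols below (pos, i, d_model) and the base 10000 come from that paper rather than from this summary, so treat this as a sketch of the standard formulation:

```latex
\begin{aligned}
PE_{(pos,\,2i)}   &= \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right) \\
PE_{(pos,\,2i+1)} &= \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right) \\
x_{pos} &= e_{pos} + PE_{pos}
\end{aligned}
```

Here pos is the token position, i indexes the sine/cosine pairs, e_pos is the token embedding at that position, and x_pos is the vector the first transformer layer receives.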
Deep Dive
PyTorch Transformer: Part 7
Oh, hey >> Sergio. Hey. How's it going, Sergio?
>> Hey, you made you made it in. You made it before I clicked the start button.
Wow, that was amazing. How's it going?
Happy Sunday. Hope you're having a great great weekend here. Uh, we I was just getting all things set up and you showed up before I even clicked the start button. Yeah, there's like a little start button right here. Well, it says end stream now, but it's there's a start stream. And you said you commented before I even started. That's great.
Good to see you. All right, I'm excited.
We're going to be doing some what is it?
Positional encoding today. We've finished out all the other transformer things that we needed to do. Now we're going to do positional encoding. PubNub CTO. You got it, Sergio. You got it.
Absolutely.
That's fantastic. All right. So, did we put in Yeah, we have a positional encoding.py file here. Hey, Dreamer.
Good to see you. It's been a it's been a little bit. Good to have you back.
Welcome on in. Hope you're doing good with your MVP.
Hope you're doing good over there. I'm really excited about this syntax right here, which is really neat. I think this is the step function. Like it describes how things are going to step, right?
Every other, every even, every odd, something like that. Apply cosine to odd. Yep. Right there. Yep. Odd and then even.
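The "step" syntax being described is most likely Python's extended slice, start:stop:step. A minimal sketch with plain lists, assuming the reference file uses slices such as 0::2 and 1::2 to pick even and odd embedding dimensions (the variable names here are illustrative):

```python
# Eight embedding dimensions, indexed 0..7.
dims = list(range(8))

# start::step -> begin at `start`, then take every `step`-th element.
even_dims = dims[0::2]  # every even index
odd_dims = dims[1::2]   # every odd index

# The same slices can be used as assignment targets, which is how
# sine values land on even indices and cosine values on odd ones.
row = [None] * 8
row[0::2] = ["sin"] * len(even_dims)
row[1::2] = ["cos"] * len(odd_dims)

print(even_dims)  # [0, 2, 4, 6]
print(odd_dims)   # [1, 3, 5, 7]
print(row)        # ['sin', 'cos', 'sin', 'cos', 'sin', 'cos', 'sin', 'cos']
```

The same slicing works on the last axis of a tensor, which is why it reads as "every even, every odd".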
First like, hey, very good. Hey, Bonupi. What did I miss yesterday? We successfully converted our transformer model, up here at the top of the screen, to work on batches. So now we can do batch training, which means we can train a lot faster, and we can train with our accelerator. Without batching, acceleration is not as worthwhile because of how long it takes to transfer data from system memory to GPU memory. That part is time-consuming. However, if you transfer in one big burst and then have the GPU spin up and do a lot of parallel work all at once, you get enough benefit to actually use a GPU.
Creating a new MVP on solving major problems. That sounds great. Batching is crucial. It is. Batching is really good.
It also helps with regularization, because when you do your loss calculation, you typically do it on the entire batch. You calculate the loss over the whole batch, which smooths out each update and can help the model generalize. So batching helps a lot. Our current batch size is 10, right there. So if I run this right now, I will see some output here. Yep. And we're going to see the loss dropping.
We see a lot of loss drop. So it's below one. Makes me happy. Basically, it prevents thrashing. Yeah. You in extremely oversimplified terms.
Thrashing. Yes. Because you're just throwing data from system memory over to GPU memory, and that is time-consuming. If you're doing a batch size of one and it's a really small matrix multiplication, it computes almost immediately on the GPU or accelerator, and then when you pipe the result back to system memory, all your time is spent on transfer latency. So you're not getting any benefit. You're thrashing. You got it.
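The batching idea can be sketched without any GPU at all. This is a minimal illustration, assuming a batch size of 10 as mentioned in the stream; iter_batches and mean_batch_loss are hypothetical helper names, and mean squared error stands in for whatever loss the actual model uses:

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive chunks of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def mean_batch_loss(predictions: List[float], targets: List[float]) -> float:
    """Average a per-example squared error over the whole batch."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


examples = list(range(25))
# One transfer per batch instead of one per example: 3 instead of 25.
batch_sizes = [len(batch) for batch in iter_batches(examples, batch_size=10)]
print(batch_sizes)  # [10, 10, 5]

print(mean_batch_loss([1.0, 2.0], [1.0, 4.0]))  # 2.0
```

With a batch size of one, every example would pay the host-to-device transfer cost on its own, which is the "thrashing" described above.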
All right. So, we're going to do positional encoding today. And that'll be the one major goal. And then we might start one of two things. We might start reading about JEPA, or we will do a transformer from scratch. We want to do one of the two things. I'm not sure which one specifically, but we will find out.
Actually, that's actually a really good idea. What should we what what would be better to do?
Probably ask this question here. All right. So, what's better? Do we want to what next?
What to start next? All right. So, we do the uh transformer transformer from scratch.
A GPT, right? Or JEPA, which is pretty cool. Yeah, exactly. And because JEPA requires a lot of data, we'll see. We'll see how far we get. Hey, Ankit, how's it going there? May I come in, sir? Absolutely.
Welcome on in. Happy Sunday. Good to have you here. JEPA, which is pretty neat. Yann LeCun thinks LLMs are a dead end.
Oh, well, we need another model. We need the next model to go even bigger. The scaling factor.
Okay. Oh, too big. Too big. From scratch. The GPT/LLM. There we go. Or JEPA. Which one would be the best one to do? I don't know. We're going to find out.
Hey, Kunwar. Good to see you. Welcome on in. I am not aware of the JEPA architecture right now. I am implementing a BART encoder. You mean BERT? You mean BERT, right, Kunwar? BERT, what is it? It's Bidirectional Encoder Representations from Transformers. Is that what that is? Bros here. What about you? It is 9:46.
9:46 a.m. in Seattle.
No, Bart from Facebook. Wait, that's a different thing. Wait, that's different.
Wait, wait, wait, wait, wait. What do you mean? I'm actually curious about that. Let's take a look. Let's take a look. Let's see. Bart from Facebook. You said Bart model Facebook.
Bart large.
Whoa, whoa, whoa, whoa, whoa. Wait.
Transformer encoder-decoder. It's BERT, but BART. How is it different?
Let's zoom in on this, you guys. It's a sequence-to-sequence model. Isn't that what BERT is too? It says the BART model is pre-trained on English. It was introduced in the paper "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension." Yeah, transformer encoder-decoder. It's like BERT, right? So BERT-like and autoregressive. So what's the A stand for? Hey, SQ lighter. Good to see you. Welcome on in. I hope your weekend has been good so far, because you're getting the week started here pretty soon.
Bro, you create 68 patents. You're so genius. Wow. Thank you. Hey, you know what? It's actually not that hard. It's actually pretty easy to do patents. You just need to create some sort of business process that hasn't been invented yet, which sounds like it's rare, but it's not. You just create a new process, and it's called a method patent. The method patent will have a series of steps and requirements that create value, and then you hand it off to a legal team that can help convert it into the right format, and then you've got a patent. That's it right there. You got a patent. BERT is for text understanding while BART is for generating text. Oh, really? Hey, Nea. Hey, you guys are here. SQ lighter and Nea are here. You guys, it's BART, RoBERTa, not BERT.
BERT is encoder-only. Oh, hey Tova. Good to see you. Welcome on in. Hope you had a great weekend so far. We got Nea and SQ lighter here. I'm in class 10th, right? And at this age, can I really do it? Yes, Ankit, you will. Don't worry.
All this AI technology that's coming out right now is just new tools. Just think of them as tools. They're tools that humans are using to achieve some sort of objective. They're just a new kind of tool. The same thing happened when new, higher-level programming languages came out. They're like, "Oh, wow. Nobody's going to need to write assembly anymore." And they were right. Don't worry, we don't. Or even before assembly there were punch cards, like, well, I guess we don't need punch card machines anymore, right? And the next thing kept coming out, the new revolutions. It's fine. We just got new tools now. BART is particularly effective when fine-tuned for text generation, such as summarization and translation, but also works well for comprehension tasks. Okay. All right, we might do a BART model. That sounds pretty neat. I mean, if we're writing our own, we might as well, right? Hey, Mark Lemon. Good to see you.
Welcome on in. Happy Sunday. We're still writing assembly, by the way. Yeah, we are. We're still writing assembly. And there are still humans writing code to this day. We We're one of them on you guys in chat as well. When it comes to patents, you have to either prove with what distinguishes you from other processes.
If it's too Yes, a good point. have a clear definition of what you are patenting. Yes, exactly. Yeah, Torva, you got it. You got it, Torva. If you want to make your own patent, you can do it right now. If you want to just have sort of like a unique business process method patent where it has a specific recipe and it clearly defines how things are different. For the most part, the last part isn't fully necessary close to the original transformer architecture. Oh, really? Okay, Kunoir.
Then more we know. Hey, how's it going?
Good to see you. Hope you're having a wonderful day. You, too. You, too. I'm doing pretty good over here. When did the stream start? Uh, about 10 minutes ago. You got here immediately. You got here immediately.
The easiest to do would be the GPT style. A BART-style thing is actually not really popular today. Oh, really?
So, we should do GPT.
That's what Hey, Dena. Good to see you.
Welcome on in. Happy Sunday. We've got Derek and Nea and Kunwar and SQ lighter. BERT is only an encoder. GPT has only a decoder. BART has both and is very close to the original transformer. Okay, got it. GPT is only the decoder. Got it. So, if we were looking at this right here, and this might be difficult for you to see or to represent. I've got a question for you guys. All right. Help me out here. I need your help.
Where's the encoder and decoder?
Which one? What part of this is the decoder? Which one's the decoder?
I think I only see encoding and then decoding potentially in the output, right? Cuz you need to convert your logits to to right your tokens, right?
So that way you can convert them to words.
The left is the encoder. Oh, okay. So you've got the encoder, and you don't need the encoder. You just use the decoder, which is the right. Is that how it works? All right. Hey, do you ever want to visit India? Oh yeah, Ankit. I got really close. I've been over to that side of the world. I have not been to Asia. Have I been to Asia? I've been very close to Asia. JEPA looks fun. Something I have wanted to try out. Oh yeah, Derek. Yeah, I would think that would be really awesome.
Left side is encoder for understanding.
Okay, so you've got left side is encoder and the right is the decoder. All right, so if we wanted to keep things simple, we just throw away, right? We throw away this other side. So let's So here's what we do. Here's what we do. We copy that.
Let me go over to our or here. Paste it in. And then you're saying that essentially we eliminate we eliminate this part right here. It's like get rid of this, right? So that's gone. No more of that, right? So this this part goes away. And then essentially the rest is going to be the decoder. And you only need this. Is that right? GPT only need decoder. Well, that sounds so much more simple, doesn't it?
All right, let's just do a GPT then.
Right side decoder and left side's encoder. All right, perfect. Oh, wait.
Yes, yes, yes. Got it. Derek, good to know.
Decoder and generate text. war.
Thank you guys. Appreciate it. For the classic decoder only model, we remove the second cross attention layer. Wait, let let me let's read that again. Let's read that again. GPT only needs the decoder. All right, perfect. For a classic decoder only, we remove that second cross attention layer. So, let me zoom in on this. The second cross attention layer. Uh you mean so the second what what's the second in this? What counts as the second?
So why are we just using AI for all these? Oh, Ankit, you mean writing our own writing our own code for AI? Well, because we're learning. This is a learning channel. We want to write everything ourselves and memorize it and become an expert. Of course, at that point, you could just have the AI do it all for you anyway. You don't have to learn anything. Do you really? I don't know. I feel like it's still valuable. I still enjoy it. So, this is fun.
Then we won't have to deal with cross attention. Oh. Oh, got it. Okay. So just reintroducing the key and value. Okay.
Hey codelen, thanks. Good to see you.
Welcome on in. Happy Sunday. Good to have you here. The layer that makes it from the encoder. The layer that takes from the encoder right here. Oh, okay. So you're saying we don't need this anymore.
We can just eliminate this.
This takes from the encoder, right? So we can also eliminate this. So we can get rid of that here. So we also get rid of this here is what you're saying. So then the only parts that are left will be this part here, right? And then this part up here.
So we're we're we've essentially simplified is what you're saying. We've simplified it.
Nice. There's some inputs from Okay.
There are some input go from encoder, right? Yes. That was what we removed.
Got it. Okay. Yeah. So, we eliminate this or I mean technically you're still stacking it. So, uh there's multiple layers of these attention mechanisms, right?
All right. Hey, Cardinal. Good to see you. Welcome on in. Happy Sunday. Hey, Stephen. Erdom. All right. I texted you on LinkedIn and said my final exams have started. All right. That sounds good. Very much looking forward to attending the next live streams. Have a good one. All right. Hey, we're going to be building our own from-scratch GPT. I think we can do it. I think we can. You've been learning GraphQL. Hey, very nice. GraphQL, let's see. GraphQL is a request format that allows you to send essentially, let's say, JSON with empty fields to the server, right? So you submit name and profession, not filled in, to the server. The server finds what's missing, fills in the rest, and then responds back. Is that GraphQL? I think that is. For actually doing GPT, there are a few other changes, but they aren't that big of a deal. All right, Neva, sounds good.
In a decoder-only model, the decoder doesn't look back at an encoder. In the original architecture, encoder outputs are injected into the cross-attention layer of the decoder. Okay, got it. Yeah. So we get to eliminate a lot of stuff here, which is great. So, we get rid of this.
We get rid of this and we just have feed forward there. Well, not only feed forward. The feed-forward blocks here are implemented as linear layers, right? And the multi-head attention has Q, K, and V layers, which are more like projections or filters.
Right?
So, I think we got it. I think we got it. That'll be the plan.
A trend you would see with transformers is that they all take stuff from "Attention Is All You Need" and cut and modify. Ah, got it.
Yeah. Okay. So, they take the original document, the original, which is what we've got here on screen and then they just they just make it their own. They make it their own.
I learned GraphQL to sell GPU subscriptions. Wait, what? Hold on.
Wait, wait, that doesn't make sense.
GraphQL isn't a GPU thing, right? Wait, to sell GPU subscriptions. So maybe if you're using GraphQL as a way to transmit data between the front end and back end, right? Hey, how's it going there? Hey Stephen, how are you doing today? We're doing good. Thank you for asking. Good to have you in here.
Jun uh Jun Aid uh Juned. I welcome on in Natik.
Hey, how's it going? Hey matchick, good to see you Stephen. I'm trying to brew accidentally hit enter. All right, you brow use no worse. What are you trying to do?
Cardinal: how does a web search agent work? I cannot understand. How does it click the buttons? Oh, yeah. So, it does OCR. It finds out what's on the screen, and it has a coordinate system that's able to parse the screen, and then it feeds that information into an LLM. The LLM will use tool calling to respond with the next action to take, either type into a text field or press a button. Right? It's really simple. All right.
Computer use. Computer use. All right.
So, you've got a screen. Let's say, let me see here. I don't know. Let's do we have any examples? All right. So, let's say we've got this screen right here.
And then we want the AI to search for something, right? So, the AI will see the screen. It will convert it into a bunch of coordinates, right? It'll say, like, input box at, well, in this case it's like 100 by 300 or something, and then there is a link, .gitignore, at 100 by 400, right? And then it does OCR and feeds that information into an LLM using function calling. So, it's actually pretty easy. It's pretty straightforward.
And then you've got this LLM over here with function calling, which just means that it's going to require a response that's in JSON format, right? Really easy. So the function-calling prompt will say: respond with the next action based on the goal. There's usually some sort of goal that's defined. The LLM will respond with instructions on how to proceed from there. It says, for example, type in the search box, like, file.c or something, whatever it's searching for.
Then press enter. Press enter. That's it. Does that kind of make sense? It's pretty straightforward. Hey, good night.
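The loop described above (screen to coordinates, coordinates to prompt, JSON action back) can be sketched as follows. Everything here is hypothetical: ScreenElement, describe_screen, parse_action, and the JSON field names are illustrative, and the LLM reply is a hard-coded stand-in rather than a call to any real model or API.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ScreenElement:
    kind: str   # e.g. "input" or "link"
    label: str  # text recovered by OCR or from the HTML
    x: int
    y: int


def describe_screen(elements: List[ScreenElement]) -> str:
    """Flatten detected elements into text an LLM can read."""
    return "\n".join(
        f"{e.kind} '{e.label}' at ({e.x}, {e.y})" for e in elements
    )


def parse_action(llm_reply: str) -> dict:
    """Parse and validate the JSON action returned by the model."""
    action = json.loads(llm_reply)
    if action.get("action") not in {"type", "press", "click"}:
        raise ValueError(f"unsupported action: {action!r}")
    return action


elements = [
    ScreenElement("input", "search box", 100, 300),
    ScreenElement("link", "some link", 100, 400),
]
prompt = (
    "Goal: search for file.c\n"
    "Screen:\n" + describe_screen(elements) + "\n"
    "Respond with the next action as JSON."
)

# Stand-in for the model's function-calling response.
llm_reply = '{"action": "type", "target": "search box", "text": "file.c"}'
action = parse_action(llm_reply)
print(action["action"], action["text"])  # type file.c
```

A real agent would run this in a loop: execute the action, capture the screen again, and ask for the next action until the goal is met.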
All right. Sounds good. Hey, how's it going there, Venant? Venge, good to see you. Welcome on in. Happy Sunday. Hope you're having a great weekend so far.
Some search engines to just extract from HTML source. Oh, that's a good idea, Derek. Yeah, you don't even need to do OCR. You just get the HTML.
That's amazing. Trying to make a mailbox cleaner. So, the plan is to load your mailbox using IMAP and extract the features. Hey, wait. What is TF-IDF? I remember hearing about that. What is that?
Term frequency-inverse document frequency. Oh, so this is like an index. You're creating an index. Natural language processing, search algorithms. Yeah.
Measures how important a word is for a specific document. Nice. Matchick. That sounds like a good project. Got to keep us up to date. Let us know how it's going.
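To make "how important a word is for a specific document" concrete, here is a minimal sketch of one common TF-IDF variant: term frequency as count divided by document length, and inverse document frequency as log(N / document frequency). Libraries often use smoothed variants, and the sample "emails" are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Score every term of every tokenized document."""
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq: Counter = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


emails = [
    "limited offer buy now".split(),
    "meeting notes for monday".split(),
    "buy tickets for monday".split(),
]
scores = tf_idf(emails)

# "limited" appears in one email, "buy" in two, so "limited" is the
# more distinctive feature of the first email.
print(scores[0]["limited"] > scores[0]["buy"])  # True
```

For a mailbox cleaner, these per-email scores could serve as the feature vector fed to a classifier.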
Would this project classify as an agentic AI since it's technically making modifications to your mailbox? Yes, I think so. Yeah, I I think so. Yeah, it it's it's there. It counts. You're back.
You're back. We're trying on our own AI personal data privacy laws for banks, governments, and agencies. Nice. Code lengthens.
Good to hear it. GraphQL.
It also got access to HTML DOM and press buttons to input data. Yes, code. Yes, that's right. You got it. What are you doing today? We're going to do positional encodings today. So, we've got help from y'all here on the stream.
You gave us a file that we're going to reimplement ourselves. This is where we have to do it. I have a placeholder right here.
We're going to implement this. That's the plan for today. And then once we have positional encodings done and we train our model, we're going to maybe mess around with it and have it learn some things. And then tomorrow we're going to start our own from-scratch GPT. I think I have it all in my head. I think we can do it.
That's the plan for today.
Running it through SBERT is probably a good way to get a higher-quality filter. What is... oh, SBERT.
There's so many Berts, you guys. I don't How are you supposed to keep up? I haven't. That's amazing. That's so cool.
Like a month ago, I didn't know anything about Thanks, Stephen. Hey, video you suggest was a banger. Hey, great to hear. Matchick, that's fantastic.
Amazing. Cardinal Erdom. Cardinal Erdom.
I thought it was inspect elements on website. I mean, right click, inspect, and take a look at the scripts. But I understand computer vision agent sees screen like uploading an image. Yes, it needs to see everything including images. And there's sometimes text in images, right? And so if an AI can't see inside the image, there's going to be a problem. Do you have GPU? I sure do. I sure do. It's a It's a It's not like Well, what do you say? It's a Mac. It's a Mac GPU, right? It's pretty good. It's a pretty good one.
It's been almost a decade since BERT, so I would hope the technology has advanced. Yeah, it's a good point. Yeah, that's fair. Never.
You bring a really good point.
Why don't you just fine-tune Qwen models? Oh, we did. Qwen, we did that. We actually did it, 100%. We did it a week or two ago now.
We fine-tuned. We used LoRA fine-tuning. We built it and we trained it to learn new things.
Lots of BERT, lots of BART. Then there's RoBERTa. And DeBERTa. Oh, wait. There is DeBERTa. All right.
I didn't know.
Integrate GPU. You mean? Yes.
Integrated. It's an integrated Mac GPU, right? So, it's still an accelerator, and it's all-in-one. It's a system on a chip. So everything is on one giant chip, right? It's a Mac. It's a Mac processor.
So we do, let's see. Can we see uh uh Mac M5 processor? What does that look like?
Images.
I want to see here. Here you go. Here you go. This is what we're looking at.
So you see like everything's in baked in on one one chip. So you've got like performance cores and GPU cores and memory and a whole bunch of other things. Everything's all baked in.
Qwen rocks. Yeah, we did. That was really great.
You know, DMA. Hey, upp.
What is DMA? What is DMA? DMA acronym. I might Oh, direct memory access.
I know of it. I know about it. Is it bypassing the virtual memory addresses? Is that what that is? A little bit of bypassing virtual memory. Does your company offer any internships or job opportunities, perhaps remotely? Yes, we do. I'd like to gain experience. All right. Hey, Cardinal Erdom. Yes. So, I do have a page up here. We do have some open roles under careers, and I'll let you know on Discord if we have an internship opportunity. I'll let you know. Hey, Donier. Good to see you.
Welcome on in. Happy Sunday. Good to have you here. DMA can mean two different stuff. I thought so. Can be an IO thing for how an IO device handles things. Oh, so it could be IO, not just direct memory access. Got it. Yeah. So, we've got software engineer for AI and solutions architect.
You know ACID properties? ACID transactions, right? You've got atomicity, consistency, isolation, and durability.
You've been learning AI and machine learning because of you. No way. That's great to hear. Isn't it amazing? You essentially get to say, "Hey, computer, learn this for me and then do the work for me in the future."
Isn't that amazing? I think that's great.
Normal from also. Okay.
DMA can also just use physical addresses and not virtual ones. Steven said, "Hey." All right.
RoPE. Hey. Yes, that's right. A linear positional encoding.
So, RoPE encoding.
Let's see. RoPE. Let's take a look at this really quick. Going beyond the math, build intuition, RoPE.
Rotary positional embedding. Oh, that sounds like it would be good. We're going to do sinusoidal positional encoding today. Maybe we could do some RoPE. Maybe. I don't know.
Maybe. Maybe.
What's this channel about, bro? Hey R.J.
Thank you for joining in. We are a software engineering channel. We are currently focusing on Transformers. Hey, you know the thing that's like chat GPT?
You've used the AI before, right? That's what we're building. We're building that here. We actually have our code up on the screen. We're ready to go. We are ready to go, you guys. We're learning all sorts of new things. Stephen, you're already doing rope. Oh, wait. Am I really? Wait, wait, wait. What? Really?
Wait. Oh, this is rope. Oh, okay.
Accept all. I didn't realize it. So, wait. This is the sine and cosine. Oh, it is. We already are doing RoPE. Okay. So that's the plan for today.
I didn't notice. Thank you for Thank you, Nea.
Have you studied BTech with CS?
Wait, BTech. What is BTech?
What you done in study? Btech. CS. I' I've gone to college. I went for 3 years and then I dropped out because I wanted to start I wanted to I had business and I was doing business and college at the same time. And then I asked myself, easy question. Are you guys ready? Easy question. What would I do if I graduated? My answer was work on the business. All right. Well, here's the next question. What would you do if you dropped out? Well, I'd be working on my business. It was the same answer. Either way, I already had the credentials. I was already making money. Why not just do that? And I did that. That's what I did. Bachelor's of Technology. Ronnie, thank you. Thank you so much. Ronnie rocks. Appreciate it. Oh, also R.J., thank you. Bachelor's in technology.
Yes, we need to attend to the original attention paper. And attention is all we need.
Yeah, we got it up right here. We got that paper. It's right here, you guys.
What is BTech, though? Oh wait, you already said. The other option is GPT-style, although they probably don't use it for GPT-5 and stuff? Oh, because they're keeping it secret, right? Got some secrets there. You're a good person with knowledge. Hey, thank you, PPCO. G. I appreciate it. That's really nice of you to say. All right. So, let's get into some positional encoding. Now, I understand it.
Let me see, before we cheat too much and refer to the reference file that you all shared with me. Thank you. Let's see if I can write it myself.
All right. This is hard mode. All right.
So, we also need a forward. So, def forward. We don't really need a forward, but we're going to have one.
And then let's see here. Uh, where's it at? Wait, wait, wait. Yep. There we go.
Okay.
Undo, redo. Okay. And then we're going to do super().__init__().
There we go. Save that. All right. Your positional encoding is not RoPE. Oh. All right. I was wondering.
I'm like, I think we're just doing normal sinusoidal positional encoding. I thought that's what we were doing. Our man here is a real scientist. I dropped out. I've been an intern since the dawn of time. Hey, code lengthens.
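Since the chat had to clear up the mix-up, here is a small sketch of what makes RoPE different. Sinusoidal encoding adds a sine/cosine vector to the embedding, while RoPE rotates each pair of dimensions of the query and key vectors by a position-dependent angle. This sketch assumes the common formulation with base 10000 and an even dimension; rope_rotate and dot are illustrative names, not the stream's code.

```python
import math
from typing import List


def rope_rotate(vector: List[float], position: int,
                base: float = 10000.0) -> List[float]:
    """Rotate each (even, odd) pair of `vector` by position * theta."""
    d = len(vector)
    if d % 2 != 0:
        raise ValueError("this sketch assumes an even dimension")
    rotated = []
    for i in range(0, d, 2):
        # i is the even index of the pair, so theta = base^(-i / d).
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vector[i], vector[i + 1]
        rotated.extend([x * c - y * s, x * s + y * c])
    return rotated


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.0, 0.5, 2.0]
k = [0.3, 0.7, 1.0, -1.0]

# Same relative offset (2) at different absolute positions.
near = dot(rope_rotate(q, 5), rope_rotate(k, 3))
far = dot(rope_rotate(q, 12), rope_rotate(k, 10))
print(abs(near - far) < 1e-9)  # True
```

The attention score depends only on the distance between the two positions, which is the property that distinguishes RoPE from the additive sinusoidal table being built in this stream.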
You're moving. You're moving. You're moving. All you got to do is keep on going. Can this help with a manufacturing business? AI. Absolutely.
AI is heavily assisting in manufacturing, right? Optimization. It's all about optimization and prediction. What's really important is inventory prediction because you have component management systems. These component management systems track your needed inventory for building parts, right? So you when you pick components, you need to make sure that there's some components in the component bin to pick.
So you need to be able to order ahead of time. Oh, plastic injection molding.
I've seen that. It's really neat, right?
What type of business, program, developer, any other? What is my business? All right, I'll let you guys know. Here we go. All right, so those of you who are new here, my name is Stephen Blum. I'm the CTO at PubNub.
Started this company more than a decade ago now. We raised 134 million; we're a Series E business. A billion devices on my network today. 68 patent claims that have been granted already. Been at this for 25 years. If you look, this is my company here, PubNub. We provide in-app communication. So, you look at all the apps today on the app store, right? Top 10 apps in the app store today all have built-in app communication. This is what we provide. Things like chat, multi-user experiences. We can walk through some of them here, right? We've got sports and media, live updates of scores, gaming. You ever play a game on the internet? Well, you might have used my technology to do it.
Telemedicine, digital healthcare, live auctions and bidding. When you place a bid on an item, that can update someone else's screen in real time. Fintech, when you see securities and quotes update on the screen, right? The market changes the price. And you've got, of course, chat. And you've got ordering and delivery and things like that.
Plus 1 billion devices. Uh yeah, plus at least one. Hey, Ronnie, that's cool, man. Yeah, better start with sinusoidal for now. Okay, Kunwar, we will start with sinusoidal. Thank you.
Just found your channel today. Nice.
Thank you. Good channel where I can learn. All right, let's learn together.
We're going to be building AI. That's what we're doing now. We're building AI.
Had 100 million in wood, shower, and gold. Yeah, I know, right? The bank account is full of money.
That means you can put your system in ERP systems. Wait, what is that? ERP.
ERP, right? Is that what you meant? Well, I know EMR, electronic medical record. What is an ERP system? What is this? Enterprise resource planning. Oh, right. Yes. Yes.
Yes. Yes. Mhm. ERP. Yes. That's it.
You're far beyond my thoughts. Oh, you PPC. Hey, we're we're still learning.
You know what? Every single day got to keep learning. There's too much knowledge in the world to learn. So, the best way to at least get as much as possible is to keep learning something new every single day. You guys are helping me with that. As much as as much as you get. If you get anything from this stream, at least know that I am having a good time learning as well. I'm learning as well.
Hire me now. Oh, yeah. For your AI role.
Yeah. Kunwart, you'll never regret it.
All right. Well, we're still we're still looking. We're still looking. Just uh go to our do our Discord and I will paste this link again really quick on the link share. There you go. Right there on the link share.
Are you updating those companies with AI inference now, Ronnie? Absolutely.
Absolutely we are. You mean with specifically? I guess it depends on what companies you're describing. However, you can see some of the businesses that we list here. Some of them that we list.
Which language do you know? Hey, good question. All right, let's do a quick little language list. All right, so I'll give you my favorites. Rust, Python, JavaScript/Typescript, and then I like this is a DSL so it doesn't really count. Then I use SQL.
These are what I use on the most regular basis, followed by Golang, and what else? What do I use less frequently? I don't use Scala often. And there's also Clojure, which looks weird, doesn't it? It's a bunch of parentheses.
And I guess assembly, we know that a smidge. We don't use it often. What else do we do? I guess some other things like Ruby, PHP.
Uh I guess C. Oh, wait, wait, wait, wait, wait, wait. I got to go up here.
Languages. I like C. I like C and C++. I would throw down here. I'm fine with it.
I much prefer C. I want simple. I like simple. Yeah, video games. Good to see you. Welcome on back. Hope you're doing good with your magazine. What did you guys do for Swiggy? So, when you order food, when you order food, let's see.
Where are we? Here. Let's see. Reload.
Swiggy. Here we go. Right here. When you order food, you'll see updates on your on your map where the where the the transportation driver is, right? You'll see that on a map on on on a map. So, you'll be able to track that. That update comes from us. That comes from us. Also, if you chat with your driver or your delivery driver, right? That comes from us.
Yes. Yeah. There you go. I also have SQL. Nice. Simple. Yep. I know, right?
Truncate. Yep. Exactly. Truncate table.
That's very impressive. Thank you, R.J.
I appreciate it. So, good to have you guys here. Yeah. Uh, so I like these are my these are my top favorite languages here. And then I've got some other languages that I was I'd throw on the list. I know I'm missing some things.
These are just, you know, the regulars up here. And then everything else down here. Obviously, I know something that starts with the letter J. Oh, C#. I forgot to add that one. I really like C#. That's a good one. I don't use it often, but I like that one a lot.
There's another one that starts with J that we don't mention here. It's It's a bad word. It's a bad word, you guys.
It's a bad word.
All right. Do you guys know what J is?
What does J stand for?
I bet you do. I bet you know.
What does this stand for right here?
Hey, Drawing Brothers. Good to see you.
Welcome on in. Happy Sunday. Hope you have a great great weekend so far. We're going to do positional encodings today.
Let's see if I can get things understanding here. All right. So, I'm going to Here we go. Comment that out. Okay.
Let's get Let's get a different Let's go.
Here we go. There we go. Changing up the music.
Okay. So what do we need here? I know there is a register_buffer. We can call it PE, and then we pass in the PE, whatever that's going to be. So we say self.register_buffer. Okay.
And then I know we need a, what is it?
We need to create an odd and even cosine and sine. So odd, which is going to be, uh, cosine, and I guess even, which will be sine.
There we go. Cosine it.
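For reference, the even/odd split described here matches the standard sinusoidal formulation from the original transformer paper. Assuming `pos` is the token position, `i` indexes pairs of embedding dimensions, and `d_model` is the embedding width (called dims on stream), it is usually written as:

```latex
\begin{aligned}
PE_{(pos,\,2i)}   &= \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right) \\
PE_{(pos,\,2i+1)} &= \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
\end{aligned}
```

Even dimensions get sine and odd dimensions get cosine, each pair at its own frequency. The code written on stream may scale things differently; this is only the textbook version.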
There we go. Cosine. "A food delivery app should have an online customer chat."
Yeah, they do. They do, Ronnie, so that they can discuss the food.
Yeah I want to talk about food you guys.
Oh, PHP, HTML, JavaScript, Python, SQL, C, C++, Java. Ah, Cascading Stylesheets. I forgot that counts. It's a DSL. It counts.
Good knowledge set. That is necessary foundation that you need to be a successful engineer in the industry.
"All right, Stephen. I have to go sleep right now." All right, sounds good. "What time do you start stream?" Um, usually around 9. However, I am going to see if I can get going a little bit earlier. I'm going to try to go live when I get up as much as possible.
However, it depends on the morning and my schedule. Otherwise, I'll go live at 9:30. If I can't go live at 9:30, then I'll go live at like 11:00, which will be really late.
Do you know the data structure? I sure do. DSA, data structures and algorithms.
How PubNub works. Drawing brothers. Oh, you asked the right question. You sure did. All right, let's go to this here.
So, let's go platform where we are.
Let's see. I think I wonder if I can here. Here I maybe I'll just draw it.
Let's draw it. We'll do a quick little draw. See how does PubNub work. All right. How PubNub works. Well, I'll show you our core messaging bus. We've got a lot of APIs for data. We got a lot of APIs for data. 9:30. What time zone? Uh, it is 10:20 here and we are in Pacific Daylight Time in Seattle, Washington.
Hey, how's it going there, Pillar? Good to see you, Pillara. Good. Welcome on in. All right, you guys asked the question. We're going to do some engineering here. A little bit of engineering. All right, how PubNub works.
So, you've got your device here, which is going to be a phone, right? You've got your phone. Let's see if I can make this a little bit bigger here. So, say you've got your phone, and then we've got the PubNub network. And this is PubNub, right? PubNub. What's inside of it, though? How does it work? We've got a lot of things in here. So, we're going to create a stack, a layer stack here. So, let me copy and paste these things here. No, no, no, no. Yep. Paste. Paste. What do we got? So, it starts off with GeoDNS: connect to the closest region, right? Because we have data centers, uh, all over the planet. So this is the Earth, right? We've got data centers everywhere. We run on all of Amazon's availability zones. So basically data centers everywhere. You connect to the closest data center. Each data center has all these things. So that's GeoDNS.
That's that's stage one, right?
So we got stage one here. Uh, you know, actu Wait, actu What is that? Actuator means actuator. Actuator.
No.
Hey, Lion Labs. Look at how long it took YouTube to add functionality to the live chat, like the plus one poll counting.
Oh, hey. I'm actually I'm not sure. I'm not sure. Wait, what do you mean one plus zero poll counter? What is that? Lions for lambs. What is that? I'm curious. Do you use React or Angular? All right.
React or Angular. Good frameworks, right? You've got Vue.js. Svelte. Yes, we use React. We use React. "Can you teach hardware, part two?" Hey, Secular Content. Welcome on in. Good to have you here. Happy Sunday. We could do some hardware discussions as well. Yep. I've got a general understanding of the frequency changes and the electrons in the transistors, and how those can be combined to create logic gates, which can be combined to create, you know, different math operations. So maybe we can do a little bit of that too. Let's see what else we got here.
GeoDNS, and then we've got, well, network load balancers, NLBs, right? We've got some network load balancers. Then we've got the application load balancer. The network load balancers route to the best application load balancer. So ALB, application load balancer, followed by, what do you call it? We call it the proxy, but it doesn't make sense to call it a proxy. It's more of a mesh router, to connect to the best endpoint. And then we have our broadcast system, which we call the subscription system, right? Our subscription system. Then it goes a step further: once a device has been connected across all these layers, we have a way to send data in, right? And so there's another stack over here. A little simple stack here. Let's do, uh, let me see. Can I move this? Here we go. Simple stack.
So when you send data into the system, it also goes through the network load balancer, application load balancer, and mesh. We have sort of a publish replication layer, and then we have a broadcast replication layer, and then we connect into the main broadcast layer from there. So it kind of goes from this route, one here, and then it goes over here, number five, over to the broadcast layer, and then data is being sent back to the phone at that point. So if you want to send data, you publish it, right? This is the publish side. And then if you want to subscribe to data, you subscribe on the other side over here. This is the subscribe side. So that's publish and subscribe, send and receive. Super easy.
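The publish and subscribe sides described above can be illustrated with a toy in-memory bus. This is only a sketch of the pattern: the `Broker` class and the `"inbound"` channel name are hypothetical, and none of this is the PubNub SDK or its real architecture.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Broker:
    """Toy in-memory message bus (hypothetical; not the PubNub SDK)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[Any], None]) -> None:
        # Subscribe side: remember who wants messages for this channel.
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: Any) -> int:
        # Publish side: fan the message out to every subscriber.
        # .get avoids creating an entry for channels nobody listens to.
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


inbox: List[Any] = []
bus = Broker()
bus.subscribe("inbound", inbox.append)
delivered = bus.publish("inbound", {"command": "play", "target": "playlist"})
print(delivered, inbox)
```

A real service adds the routing, replication, and security layers drawn on stream; the sketch keeps only the send and receive halves.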
At this rate, we won't have any transformers today. I know, Dina. Don't worry. We will. We will. We're actually we're we're we're good now. We're good now. We will get we'll get to the transformer. We will. We will.
What is actuator? Or you learn new knowledge from where?
Oh, wait. Actually, I don't know. I don't know. I don't get that one.
Uh, search what actuator is. All right, we will search it real quick.
You can ask a question and have the chat type zero. OH.
OH, YEAH. All right. Lions for lambs.
All right, I got you now. Yes. Yes. Yes.
And it dynamically populates the poll at the same time. Really?
Really? Maybe we can try it, you guys.
Let's try it. Maybe we should try it.
Okay, let's try. All right, let me let me close the poll. Let me close the poll.
So, we know what the answer is. We're going to be doing Transformers on Monday from scratch.
Okay, which is what ChatGPT is, right?
And then, all right, so what's a really good one or zero question?
If I use PubNub for an agent, what capabilities would it give it? The ability to transmit information, right?
You can you can transmit information.
You can do push notifications. You can do, uh, data delivery for event-driven systems, right? You can also add support for chat. So you can talk with your agent if you wanted to. And you don't have to use any third party, right? You don't have to use Telegram. You don't have to use WhatsApp. You could use your own. It's basically your own custom messaging system.
Transformers from scratch. No libs. Oo.
Hey, Daler. Well, I mean, we could. So, here's what we're going to do. We're going to use PyTorch. We're just not going to use the transformer layer directly, right?
So, this is what we're doing right now.
We're using a transformer layer. We're going to write that ourselves. That's what we're going to do. We're going to write it ourselves.
That's the plan.
What's the security level? What does it improve?
"If this is your architecture, you must be rich. Send me a mil." Oh, hey. Yeah, I forgot the security layers. So, around the entire thing, there's a whole layer of security. We call it our authorization layer. It's a subordinate request that checks that you have the correct cryptographically signed token.
Essentially, it's like a web token.
It's a JWT, except instead of using JSON, we use CBOR. So, it's a CWT.
This is our CWT, uh, auth token essentially, right? So, we validate through security. So, we have an extra layer of security around this.
I always forget the security part, but we got it. We got it. "Developing a tool for automating sensor driver generation from data sets." Ooh, JEPA. Yeah.
Okay. Where we left off here you guys.
Awesome serial. It's pretty good. "Oh, like WebSockets." Yes. Exactly, Drawing Brothers. Exactly. Send data in real time. "But what does that try to solve, and why should I use it?" When you need multi-user communication, right?
You've got multi-user communication. Or if you want to communicate with a device that's on your local network from outside of your network. So if you've got your phone and you want to connect to your laptop, you use PubNub because you don't need to expose any ports on your system. You don't need to use a DMZ. You don't need to use NAT traversal. You just use PubNub and you can communicate directly from outside of your network to your home network. So it's like a layer over the internet, you can think of it. And if your IP address changes, that's perfectly fine. You don't need to worry about that. We take care of that for you.
"Ciphertext." Yes, exactly. "Ciphertext. Automate sensor driver generation from data sets, hardware level. It's an agentic system for driver generation for any type of sensor." Whoa, Secular. That sounds pretty interesting.
All right, let me see. Did I miss any messages? Hey, you guys, let me know if I miss any of your messages and I'll get back to them. "That is decent." Nice. "JEPA is for the world model." Yes, exactly. JEPA is a world model, an understanding of the world around us.
Okay, let's get to our positional encoding. Let's do this. All right, so today we're going to do positional encoding. I'm going to see if I can implement it myself. And then we're going to refer to a reference, which is right here. This is the reference file.
If we need it, we'll pull it up. And I think we will. I think we will.
So, torch sine. Sine. I think that's a function. Python.
Let's go. Import torch. torch.sine. Is it a function?
Oh, it's not. Maybe it's under nn. Is it under functional?
No. Where's sine at?
Sine... sin. It's called sin.
Okay.
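A minimal sketch of the call being looked up here: the function lives at the top level as `torch.sin`, and it expects a tensor rather than a plain Python number. The sample values are arbitrary.

```python
import torch

# torch.sin works element-wise on a tensor, so wrap the numbers first.
values = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])
sines = torch.sin(values)

# Every output stays within [-1, 1]; the value for 4.0 is negative.
print(sines)
```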
All right.
"PubNub. What is the nub? And how is it different from subscriber?" Oh, good question, Definity. You have asked a good question. All right. So, PubNub is just one letter different.
So, watch this. I mean, you're going to recognize it immediately, right? PubNub, PubSub. All right. So, as you can see here, there's just one letter different, right? And so, from a business marketing standpoint, the word PubNub didn't exist on the internet. And so, it was really good for SEO. That's what we did. We did search engine optimization, because if you search for PubSub, uh, that domain was already taken. That's what I'd say. The domain was already taken. Neit, good to see you. Welcome on in. Happy Sunday.
"No tunnel like ngrok." Exactly. You don't need to do it. Exactly. Just use PubNub. Precisely. "It's kind of like how WebRTC works." Yep. Uh-huh. Uh-huh. You got it. Exactly.
Yes. Exactly. You PC. That's a very common That's very common. I We see that very common. Hello. Hello. Hello. Hello.
Peace Lord. Good to see you. Welcome on in. Stephen, can you tell me what s g what? So g this is wait.
Nea what? What sogg this is? Like so paneer. You're probably that's probably not you. You probably are not talking about that. What is s?
Happy Sunday. All right. How is the transformer going? We did it. We did it.
All right. We did the batching. We got batching working. And now we're going to positional encoding. And then once we finish that, which is today's objective, we will build our own transformer from scratch. We will be doing it the GPT style and we will be building the decoder. We'll be building the decoder.
If you do the encoder as well, it's just like another layer. It's not songs like the previous song. The kind of bad ass song. All right.
Marketing thing. Yeah, exactly. It was a marketing thing. Definity, you know it.
It was a marketing thing. All right. So, we're currently doing the positional encoding here.
Let's see if I can get it to work. All right. So, where are we here? Here we go. All right. So, torch.sin. Let's see. Do I need to pass some input? One.
Must be a tensor. All right. Uh, torch.tensor. Tensor.
Uh, here. Wait, wait, wait, wait.
Here we go. Okay.
Two. Uh. Three. Uh. Four. Okay. Oh, it goes negative. Ooh. We're going to have such good positions. They're going to be so good. "Mixture of experts." Hey, that would be a good one. GPT, generative pre-trained transformer, with mixture of experts. What book is that?
Cornell. Wait, wait. Corn. No. Con.
Connor. Hey, Connor Nelson. All right, Connor. The book. Wait, this is just the, uh, arXiv paper, right? Thank you for the hearts. Appreciate it. You guys see this floaty heart icon? "Memorize this stuff?" Well, I have to memorize. I know enough of it to be able to know what to do for the next step, right?
So, let's see here. OSI, right? You've got hardware, you've got the MAC, you've got, um, does it go IP? Maybe it goes IP. Then you've got TCP or UDP, right? Then you've got, uh, security, TLS maybe, right? Security layer, session. Then you've got application session.
Let's see how how far did I get? How far did I get?
One, two, three, four, five, six. And then you've got protocol.
Something like that. That's about That's about as far as I remember.
Thanks, Steve. Definity. Well, thank you. Good to have you here. What are the applications that PubNub ex uh wait that use PubNub? Oh, the applications. We got a lot of those, guys. We got quite a few. Where are we? We've got customers.
Here we go. Okay. So, if you've ever been on an experience where you see live video and chat next to it, that's what we do. If you've ever been in a collaboration experience where you drag and drop things.
Uh if you've played video games, right?
Quickly engaging interactive games.
Climate control with well not climate. Oh, okay. So, this one's neat. This is farming. So, agriculture.
That's a good one. More game. Oh, classroom multi-user classroom. So, you can see ex uh management. All right. This one mostly communication with parents and teachers.
See where your bus is currently located.
have live audio and reactions and being able to have other users join into the conversation.
Uh I don't reliable messaging. All right, so inapp messaging sports scores and updates. Oh, it keeps going. You could go forever.
There's so many.
Yes, very good. Presentation session.
Hey. Yeah, presentation and session. Oh, we did it. Yes.
Presentation in session. Yep. All right.
So, we almost got it. We got pretty close, right? We got pretty close. "Bro, how can I send data to my NAS, which is connected via Tailscale, fastest? GPT answers." Oh, wait. Hey, Jude Thomas. How's it going, Jude Thomas?
Welcome on in. Happy happy Sunday. So, you want to send data to your your network storage system. You've got an app. If you have an app, you can use PubNub to do this. You can use PubNub.
You will have the app subscribe to a data channel. You name the channel, call it your inbound channel, and then that will run on your NAS, right? So, you can do Python, JavaScript, Node. Those are good options, right? Good scripting options. And then if you want, you can write a web app, and then you can publish data from your web app. So that way you can, say, start a playlist, right?
Or play this video, or download this file, right? You can remotely control it. So that will be running on your NAS, and then you've got a web app on your phone with HTML and JavaScript.
Really easy. Just say build it with PubNub. The AI already knows how to do it.
Oh, physically data link. All right. So we've got physical physical.
You've got data link, which might be the MAC, and you've got network. You've got transport, which is basically the transport protocol. Then you've got session, and then, between these two, right, you've got security, presentation, and application, which is typically going to be like HTTP here, and then this will be like your protocol on top of HTTP, something like that. All right, we're just doing the OSI thing. Yeah, Definity.
Yes. "Application, presentation, session, transport, network, data link, MAC." Oh, wait. Wait. Data link and MAC are that high up? "And physical layers." Oh, wait, wait, wait. Okay. Yeah. Yeah. All right. I see. I see what you're doing. You started from the top to the bottom.
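The seven OSI layers being recalled here, written out in order. This is just a reference sketch: the `OsiLayer` enum and the example strings are illustrative, with the examples taken from what gets mentioned on stream.

```python
from enum import IntEnum


class OsiLayer(IntEnum):
    """The seven OSI layers, numbered bottom (1) to top (7)."""
    PHYSICAL = 1
    DATA_LINK = 2
    NETWORK = 3
    TRANSPORT = 4
    SESSION = 5
    PRESENTATION = 6
    APPLICATION = 7


# Illustrative examples for the layers discussed on stream.
EXAMPLES = {
    OsiLayer.PHYSICAL: "voltage in and out of the Ethernet controller",
    OsiLayer.DATA_LINK: "Ethernet frames, MAC addresses",
    OsiLayer.NETWORK: "IP addresses",
    OsiLayer.TRANSPORT: "TCP, UDP",
    OsiLayer.APPLICATION: "HTTP",
}

# Chat listed them top to bottom, so reverse the numeric order.
top_down = [layer.name for layer in sorted(OsiLayer, reverse=True)]
print(top_down)
```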
Very nice. Proxy server. Yeah, we can do some proxies, but the data is like a drive that needs to be updated.
So you could do a cron. Maybe; it depends on what your data is. Are you updating this manually, or do you need it to update on a regular basis? You can do something like a cron, right? You could do a cron. "SIP." Oh, right. SIP for telephony. I forgot about that. Telephony.
All right. So this is going to generate... So I need a tensor here. I need a tensor that's going to be the dimensions. So we got our dimensions here. So I need to have a linear range.
So I need a linear range. So torch dot... I think we can do a range. Can we do a range?
So I've got that around here somewhere in my batch. Here we go. Get batch.
So we could do rand. We don't want rand. I want it to be linear.
Okay, so we need an arange, which is going to create our range, and this has to be the size of dims. "Today BitLocker got stuck in your drive." Oh, was it difficult to do? So if your BitLocker is stuck, I mean, that seems like a problem.
It seems like a problem to me.
"What I mean is a NAS which is in a different place from where I live. Hence I connect via Tailscale. But what happens is I made a domain and send via Cloudflare, but it's really slow because of 100 megabytes." Oh, you want to transfer the data?
Okay. Well, I mean 100 100 megabytes you're really going to be at the whim of your upload speed, right?
So, are you trying to Is this like a backup?
I guess it really depends on what your ultimate goal is. So, you want to do a Google Drive but without the Google it's local network stuff but running in the network. Oh, you want to make your own Google Drive.
Oh, okay. I see what you're doing there. All right.
to upload upload data to your own you want to make your own Google Drive.
So I mean my recommendation is you definitely probably want to so you've got a domain and you've got an IP address and you can just transfer directly to your home system. The problem is your IP address will change on a regular basis. So you need to have a way to get your IP address of your current home system. Then you can set up maybe a DMZ or port forwarding, right?
So you can forward data to your port and then it allows you to upload to your own system.
So there are a lot of open stacks that will guide you on this. All right. So how to make my own Google Drive at home.
There's a lot of systems to do this.
Plug-and-play NAS. These. All right.
Synology Drive. Custom: repurposing an old PC.
Install your cloud software, which is what you need. So, there's Nextcloud and ownCloud. Those might be better.
Those might be good choices.
Yes, it's backup. Exactly. Your own Google Drive. Private. Nice. Yep. So, there's already there's already off-the-shelf software that allows you to do that and you can access from anywhere. Yeah. See, I said DNS. You got your own DNS there for your home internet address. and then a reverse proxy or VPN set up so that you can connect securely like a tunnel.
A VPN will require a third party.
So yeah, I have not done this myself because I just use cloud vendors myself. "Do Nextcloud or ownCloud, mini PC with Tailscale." Hey, Frog Color. Tailscale.
What is this here? Uh connectivity for multicloud connectivity platform for devs. Zero trust identity across minutes deploys immediately.
All right.
Tailscale, that should also work. I'm not familiar with these, so I've only given you the information that I know of, and I can give you recommendations.
I've not used them myself though.
"Tunneling with Cloudflare, can it help?"
Yeah, you could do tunneling with Cloudflare.
It will make it easy, but it won't necessarily make it faster. I don't know. Actually, I don't know. I couldn't tell you for sure.
Could not tell you for sure.
So, good to have you guys here.
Uh, Kunar, good idea, though. Good idea.
I like that. "Don't need to worry about ports and DMZ with Tailscale." Oh, nice.
It creates a private network. All automatic. Hey, very nice. Which company's computer do you have? I have an Acer. I have a Mac. I have a Mac.
Virtual private network. You got it.
Provides native file transfer feature.
So, it sounds like you've already got Tailscale. "Tailscale is the GOAT."
The greatest of all time. "Let's do web transmission tech."
Yeah, we could. We could do it. Physical voltage in and out of Ethernet controller. Data link Ethernet frames.
MAC address. Hey, definity. Yeah, you're speaking my language. Network does packet addressing aka IP addresses.
Transport. Yep. The TCP and UDP format, right? Which is just a header inside the Ethernet frame. It's got like a header section. I mean, in your Ethernet frame you can define it however you want, right? In TCP/IP you've got your addresses, your 5-tuple, right? Source and destination port, source and destination IP address, and the protocol. The 5-tuple, right? TCP or UDP, IP address and IP address, port and port. So yeah, and then you've got your data layer. After the addresses, you also have flags to determine what the actual packet is for, because it could be a SYN, a SYN-ACK, or an ACK. It could be data, or it could be a FIN or a reset.
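The 5-tuple and the TCP flags mentioned here can be sketched as plain data types. The flag bit values follow the TCP header layout; the `FiveTuple` and `TcpFlags` names, the `describe` helper, and the example addresses (from the reserved documentation ranges) are illustrative assumptions.

```python
from enum import IntFlag
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Identifies one connection: protocol plus both endpoints."""
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class TcpFlags(IntFlag):
    """Subset of TCP control bits, with their header bit values."""
    FIN = 0x01
    SYN = 0x02
    RST = 0x04
    PSH = 0x08
    ACK = 0x10


def describe(flags: TcpFlags) -> str:
    """Name the role of a segment based on its control bits."""
    if TcpFlags.RST in flags:
        return "RST"
    if TcpFlags.SYN in flags and TcpFlags.ACK in flags:
        return "SYN-ACK"
    if TcpFlags.SYN in flags:
        return "SYN"
    if TcpFlags.FIN in flags:
        return "FIN"
    if TcpFlags.ACK in flags:
        return "ACK"
    return "other"


conn = FiveTuple("tcp", "192.0.2.10", 50000, "198.51.100.7", 443)
handshake = [TcpFlags.SYN, TcpFlags.SYN | TcpFlags.ACK, TcpFlags.ACK]
print(conn, [describe(f) for f in handshake])
```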
Hey loons, good to see you very late.
Oh, welcome on in. Good to have you here. You had to finish revisiting your exams. All right, good to hear it. "Tailscale. Even though I made a reverse proxy to overhead, the Cloudflare sending is really slow because it's like a parallelism efficiency problem."
Oh, you want it to go faster. Okay.
Yeah, you're at the whim of your your ISP as far as I know.
And then with Cloudflare, whatever their limitations are as well. So, you've got two bottlenecks. You've got your home internet speed and you've got Cloudflare.
If you can do direct to your home, it'll be faster.
Some kind of system which have different instances which work in parallel together helps for LLM for good usage of GPU for networking send. Wait, yeah, there won't be any GPU involved. There won't and you won't have any large language model involved if you're just doing data transfer.
Hello, Steven. Oto, how's it going there Octo? Good to see you. Happy Sunday.
First time saw channel. You're awesome.
Well, thank you. That's very nice for you to say. Good to have you here.
"Have you tried Syncthing? I used it a long time ago." Good recommendation.
"If your IP address is changing, you can use a dynamic DNS resolver, DDNS. It updates the DNS A record when your web server's IP address changes." Oh, Definity.
There you go. Just do that and you're good to go. You don't even need Cloudflare anymore. You're good to go.
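The dynamic DNS idea from chat boils down to "compare the current public IP with the A record and push an update only when it changed." A minimal sketch, assuming a hypothetical `push_update` callback that stands in for whatever API a DNS provider offers; the host name and addresses are made up.

```python
from typing import Callable, Dict, List, Tuple


def sync_a_record(
    record: Dict[str, str],
    current_ip: str,
    push_update: Callable[[str, str], None],
) -> bool:
    """Update the A record only when the public IP has changed."""
    if record["ip"] == current_ip:
        return False  # nothing to do
    push_update(record["name"], current_ip)  # hypothetical provider call
    record["ip"] = current_ip
    return True


# Fake provider that just records the calls it receives.
calls: List[Tuple[str, str]] = []
record = {"name": "home.example.com", "ip": "203.0.113.5"}

unchanged = sync_a_record(record, "203.0.113.5", lambda n, ip: calls.append((n, ip)))
changed = sync_a_record(record, "203.0.113.9", lambda n, ip: calls.append((n, ip)))
print(unchanged, changed, calls)
```

In practice this check runs on a schedule (the cron idea from earlier would fit), and the way the current public IP is discovered depends on the setup.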
It's pretty direct. Nice. "Tailscale also provides an assigned IP, so it will not change." Oh, bonus. Is that a paid feature, though? "Today's episode of Steven University is here." Yay. Yeah, we're doing it. We're going to be doing positional encoding, you guys. We got ourselves some positional encoding.
All right. So, I think this is going to work. All right. I want to do only my positions.
So, let's do this. Guess we'll say break.
Here we go. And then we're going to build some positional encodings here.
Cloudflare offers dynamic DNS loons. All right.
"Vim or Neovim you're using there?" Hey, good eye.
Just, uh, Vim. We're using Vim and tmux and Ghostty, right? Ghostty plus tmux plus Vim is the best.
IPv6 would end all the problems. It would because everyone gets their own IP address. Not only does everyone get their own IP address, every device because it's got something like 400 trillion addresses in the address space.
LAN LLC Ethernet token bus token ring oh broadband optical fiber no voice security Wi-Fi demand reverse cable modem pan wait upco you're saying a whole bunch of internet technologies there a whole a bunch of internet technologies where are we on our list here okay let's go back up here Okay. My name is Leba.
I'm from India. All right. Liba. Is it Laiba? Leba. Welcome on in. Good to have you here. Happy Sunday. Hope you're having a great weekend.
You need a kind of system for making sending better. It's like limiting the factor right now. It's the physical straw. So, we need more straw. But at last, we need a structured way, a new method of networking. Jude, I think we have a pretty good structure of networking. To get faster, you just need to get your ISP to give you more bandwidth. That's the ultimate bottleneck.
Remember when we did the IPv4, we said the exact same thing. 4 billion was enough. That's right. We were like, 4 billion was way more than we need. We don't need any more than 4 billion. And then it turns out we do.
We need way more. Love you. Well, thank you. That's really nice for you to say.
Appreciate it.
Which country? I am in Seattle, Washington, USA.
"So we went to 400 trillion?" I know, right? Are we really? That's right.
"That's less than a thousand devices per person. That seems too low, right?" I know. It's only less than a thousand devices per person. It's not even nearly enough.
It really is. I know. I We're being uh sarcastic.
Yes, you exactly.
128 bit for IPv6, which gives you a lot of addresses.
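For scale, the address counts can be checked with Python's standard `ipaddress` module. A 128-bit space holds 2^128 addresses, about 3.4 × 10^38, which is far beyond the "400 trillion" figure tossed out earlier. The world population number below is a rough assumption.

```python
import ipaddress

# Every possible address in each family.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

world_population = 8_000_000_000  # rough figure, an assumption

print(ipv4_total)                     # 4294967296
print(ipv4_total / world_population)  # under one IPv4 address per person
print(f"{ipv6_total:.2e}")            # 3.40e+38
```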
You get more than you need. A lot more than you need. All right. So, where are we at here, you guys? Let's go into our positional encoding. I want to return this. We're going to build it step by step. Here we go.
And then if we need to refer.
So arange dims. Let's see here. Torch.
arange. Let's do like 10. Yeah, that's what we want. Perfect. This is exactly what we want. So if we're at 128, we're going to get 128. There we go. Looking good. We could also do linspace, right? So we've got arange and linspace, and we might want to do linspace, right? Thank you for the hearts.
You guys, do you know what the difference between linspace and arange is? linspace is going to give you linear floats, which I think we need instead. I think we need to do linspace. linespace, like that. Right.
Line... Maybe it's linspace.
There we go. linspace.
We go zero comma 10 comma 1 10.
Oh, here we go. So what is this? Five.
All right. So this is the range and this is the number of elements. So 128 would be the number of elements and the range we could do what should our range be?
10. I think 10 is good.
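A small side-by-side of the two calls being compared, using the values from the stream (128 dims, a 0 to 10 range). `torch.arange` counts in integer steps with an exclusive end, while `torch.linspace(start, end, steps)` returns `steps` evenly spaced floats with both ends included.

```python
import torch

dims = 128  # embedding width used on stream

# arange: integers 0, 1, ..., dims - 1 (end is exclusive).
steps = torch.arange(dims)

# linspace: `dims` evenly spaced floats from 0 to 10, both ends included.
ramp = torch.linspace(0, 10, dims)

# The five-element version tried on stream.
five = torch.linspace(0, 10, 5)

print(steps.dtype, steps[:3], steps[-1])
print(ramp.dtype, ramp[:3], ramp[-1])
print(five)
```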
So this will be our input. This will be our range. Here we go. "So, IPv4: unicast, multicast, broadcast. IPv6: anycast, multicast, or unicast, but not broadcast."
That's right. I remember that. I remember the the the promise of broadcast.
I remember that. Uh and we don't see it being used. I think it's mostly blocked by most IP most ISPs.
I remember that much cuz I wanted to use it. I wanted to, but they're like, "No, no, no. You can't use it." I'm like, "But I want to." No. Nope.
You can use it at home, but you cannot use it on the wide area network, you guys.
All right. So, this is our... One second here. I want to do torch.sin.
See if this gets me what I want. Okay.
See how it goes positive to negative, positive to negative? Can we do this with a five, right? Okay.
So, that's going to give us... and we need to adjust this on a per-token basis. So, how many tokens do we have? We need max positions, right? So, we'll say, uh, max tokens equals, we'll do like 5,000, right? And then we need to create 5,000 elements.
So, we're going to do linspace. Got a linspace there.
And then for every single token, we can increment this by one.
So, this will start at one. So we say current token. All right. So for token in range max tokens.
Do we or oh wait wait wait. There's a better way to do this.
Oh, you just realized your ISP changed your IP address. Oh, it did. Yeah, they do that on a regular basis, right? It's crazy for you to think that some ISPs in the US want you to pay to have a static IP. They do. Yeah, they they they do.
They make you pay for one. They do.
They make you pay you guys. All right. I suppose we could do this. The only problem though is we're going to need to concatenate.
And there's a better way to do it. Like there's a better approach.
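One common way to avoid the per-token loop and the concatenation is broadcasting: give the positions a column shape and the per-dimension values a row shape, then multiply. This is a sketch of that idea using the stream's numbers (5,000 tokens, 128 dims, a 0 to 10 linspace); it is not necessarily the approach the stream ends up with, and it is not the standard frequency schedule.

```python
import torch

max_tokens = 5000  # maximum number of positions, as chosen on stream
dims = 128

# Column of positions: shape (5000, 1).
positions = torch.arange(max_tokens, dtype=torch.float32).unsqueeze(1)
# Row of per-dimension multipliers: shape (1, 128).
ramp = torch.linspace(0, 10, dims).unsqueeze(0)

# Broadcasting pairs every position with every multiplier: shape (5000, 128).
angles = positions * ramp
table = torch.sin(angles)

print(table.shape)
```

Each row is the same sine curve "stretched" by its position, built in one shot with no Python loop. Position 0 comes out as all zeros here because sin(0) is 0.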
Uh-oh. All right, you guys. You mean static IPv4 is a very limited resource?
Yeah, that's true. It is, right? It's limited, so you have to pay for it. You got to pay for it.
The fact that you can even buy it is crazy. I know, right? It's cuz there's so few of them. I mean, there's a there's 4 billion, which is a lot still.
You need you need a home address.
Every human on the internet needs a home address. Okay.
My broccoli timer, my Echo, my Amazon Echo is telling me my broccoli timer is ready. So, I have to go get the broccoli out of the oven because you all know what happens to broccoli if you leave it in the oven for too long, right? 4 billion means we can't give it to everyone. Yeah, we can't. That's right. Cuz we got more people on the planet. So, that means only every other person. But then that means we also need them for servers and devices. And when you're out on the road, you need IP address there as well.
And we can never make more. Yep. They're gone. They're gone. There's a finite resource. You got it. What happens to broccoli when you leave it? It gets sul.
It tastes sulfury and we don't want that. So, I'll be right back. It only take about 30 seconds. I'm just going to pull broccoli out of the oven real quick. All right. BRB, you guys. BRB.
All right, we did it. Broccoli secured.
Are you learning about the internals of large language models? Yeah, we are.
Cornell. Yep, we got it. Uh-huh.
Connor. Connor.
Prof. Yep. Positional encoding. That's what we're doing right now. We're going to try to create this in our own mind.
We're going to try to create it in our mind. I'm not going to do any helps or tips unless you guys give them to me.
It's backseat driving is okay.
Text away. Chat away. You guys, if they did this in France, they would have a big backlash, I think. Oh, really?
Loons? Really?
All right. So, there's a linspace here. So, I want to do a linspace and then torch.sin. There we go. So, this will be the dims, 128. We've got our max tokens. The only thing is I don't know how to represent... well, actually, technically we do this, right? So wait, wait, no, no. This is one here, and then the current token needs to be here. So that way we stretch it out. It's got to be stretched out.
So this part works.
You know ASCII? I know ASCII. Yep. A capital A is 65. Lowercase a is 97. You beat. Yeah. Uh-huh. It is. It is. Yep.
We got some fun ASCII codes in here.
If you accept the fact that peanuts are in fact peas and not nuts, then it makes perfect sense to boil them. Oh, right.
Do you eat anything other than broccoli, Stephen? Yes. I've got carrots. Uh peas.
got peas. I used to eat peanuts and cashews and almonds, but turns out they can create kidney stones.
Whoops. I didn't think that. And I was trying to gain weight, so I was like eating them constantly and I wasn't drinking enough water.
So, I was eating all those. And then I got kidney stones. How is my computer?
How is your computer? I don't know. Oh, wait. I don't know. I don't know. Oh, what do you mean? I I have no idea what your computer is. Wait, did you tell me earlier? I don't think you did.
Your computer knowledge is great. If you're talking about computer knowledge, okay, so there is a way to create a positional encoding array, a PyTorch array. I I don't know the syntax for it, though.
We need to we need to build it. So I don't remember how. I'm very curious about it. However, I do believe this is going to give us the correct output that we're looking for right here.
So let's say print to print uh here uh PE equals we'll say PE.
Okay, we'll just say return zero.
Let's try let's try running this Python transformer.
I should see a lot of output. Whoa, wait. Oh, I've never executed it. Okay, so I need to create a positional encoding and I'm going to do that in my model. Where's my model at here? Okay, so we will do self.positional equals positional encoding, and we pass dims like that. Okay, now if I run it, I should see a whole bunch of output. There we go. Look at that.
int array equals... ah yeah, C++. I see what you did there. You know some C++ right there. Oh, hey Kunoir. torch.arange, sequence length, long, device, input types.
Yes. All right. So, we're using light.
We're going a step ahead. We're making some changes, you guys. Yep. So, I did start with arange, Kunoir. I did start with arange.
Then I realized I want to go even deeper. I want to go further. I want to go a little bit further. Index start with zero and excess. Uh with Oh, I did. Did you read a book or anything?
I'm looking for resources on these topics. Uh, Connor, we referred to the internet. We have read about positional encoding. I've read GPT.
I've read all the things. And so I'm trying to rebuild this with my mind. I'm going to do some struggling. Right. So I basically this this paper right here right here right here.
So we are using PE using this approach. So I'm trying to write this in I'm trying to write this formula but doing this with a different approach.
Hi Stephen. Hey Bros. Good to see you.
Welcome on in. Happy Sunday. We are currently building a positional encoder.
We're doing this pattern here except we're making it a little different.
Kind of works. Interesting. Yeah, that int array thing was C. Oh, it wasn't C++. It was C. So, I've never instantiated an array like that before. I've never needed to. In C++ you need std::array.
Got it. Okay.
From scratch by Sebastian was a great resource. Hey, Kunoir. Hey, very nice.
Yes. So, we are basically doing that.
That's what we're doing right now. Andrej Karpathy videos are good as well. Andrej Karpathy is like the paramount, right?
The top AI scientists on the planet as far as we know.
Time complexity in this page side up page upside the time complexity.
Well, the good news is it's a one-time computation, so we'll be pretty good here.
In C++ you would use either, depending on setup. Yeah, it should be, right? Because C++ is mostly a superset of C, so most of what you write in C can sit in C++ as well. So there is a way to represent a tensor like this. It's like colon colon. How do we do this here? Oh, all right. So, there's a way to do it. So, PE, we need to do something here, comma, 1::2, which will give you the step. And then I forgot, there's something here. I don't remember how to do it because I don't usually do this. Maybe it's like colon. I think it's just colon. Maybe like that.
So then you do zero something like that. Maybe it's two. It would be it's one of these two. It's between that.
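For reference, the step syntax being reached for here is ordinary Python slice notation, `start:stop:step`, applied to the last axis of a tensor. A minimal sketch on a tiny tensor (the name `pe` and the 2 x 6 shape are only for illustration, not the code on screen):

```python
import torch

# A tiny 2 x 6 tensor so the columns are easy to read: values 0..11.
pe = torch.arange(12).reshape(2, 6)

even_cols = pe[:, 0::2]  # every row, columns 0, 2, 4
odd_cols = pe[:, 1::2]   # every row, columns 1, 3, 5

print(even_cols)  # rows [0, 2, 4] and [6, 8, 10]
print(odd_cols)   # rows [1, 3, 5] and [7, 9, 11]
```

So "zero" versus "one" before the double colon picks whether the slice starts on the even or the odd columns, and the trailing 2 is the step.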
Why did you disable the channel points?
Oh, wait. I Oh, I can't do channel points until I'm a partner on Twitch. I I didn't mean to disable them. I just don't have access to them. I only YouTube. I'm only a partner on YouTube.
Uh part uh channel partner on YouTube.
So, that might answer the question for you.
Some stuff around implicit casting. Oh, yeah. I forgot about casting.
Even channel points. This sucks. I know, right? I know. Okay, so I don't know how to do this. Um, I might cheat for this. I might cheat.
Actually, we we might Here, let's cheat.
Just a Let's do a little a smidgen of cheating. Yeah, just a little bit of cheating.
All right, Google.
Uh, see torch. How do we How do we assign torch assign?
No, no, not that. Uh, paste. Yes.
Using this uh tensor annotation.
Oh, see I think I need to put that in quotes.
Here we go.
Okay.
Uh yeah, yeah, yeah, yeah, yeah, yeah, yeah, yeah. Okay. Oh. Oh, okay.
Got it. We figured it out. We're learning now, you guys. I only wanted a quick little... So, we've got our PE equals torch.zeros. We don't really need zeros, but we'll just go. Hey, Nea, do you remember what it is when you instantiate a tensor and you don't have any of the data inside the tensor prepopulated?
It's like more efficient. We only need the structure of the tensor. We don't need any data to be pre-initialized.
Like we don't need zeros to be overwritten in it. Do you remember what that's called?
Empty. Thank you.
Hey, there we go. We got some empty.
Look at that. right there. All right.
Max tokens, dims. I think that will do it. I think that gives us the answer we're looking for. Print PE.shape.
We do return zero.
Let's make sure that we got the right shape there. 10. Sir, there we go.
Thank you. Appreciate it. Nea, how much time do you spend on one seat? You mean this chair?
Are you talking about this chair?
Probably like between 8 and 12 hours per day.
Does that sound about right? Yeah, it sounds about right.
I believe you should put a tuple in, but might be a variable. All right, got it.
Thank you. Python Python import torch. We'll we'll validate that really quick. Torch.
Okay. And then we say help. We learned this yesterday.
size, out, dtype. Okay. Size. Oh, so this must be a tuple. Okay. And then this is like a step or something. All right. So, let's try this. Torch.
One comma 1 uh two 2 by two. Okay. Nea. Thank you.
You're right.
Tuple, thank you, Nea. I appreciate it. This time has been so long. It's been a long... it's a lot of time in the chair, you guys. Yep. A lot of time in the chair.
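As a quick illustration of the two points above (an uninitialized tensor, and the size passed as a tuple), here is a minimal sketch. The sizes 5,000 and 128 are the ones mentioned on stream; the variable names are assumptions:

```python
import torch

max_tokens, dims = 5000, 128  # sizes mentioned on stream

# torch.empty allocates storage without initializing it, so the values are
# arbitrary until they are overwritten.
pe = torch.empty((max_tokens, dims))    # size given as a tuple
pe_alt = torch.empty(max_tokens, dims)  # size given as separate ints also works

print(pe.shape)      # torch.Size([5000, 128])
print(pe_alt.shape)  # torch.Size([5000, 128])
```

Because the contents are arbitrary, `torch.empty` only makes sense when every element is going to be assigned afterwards.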
Okay. So, let me confirm that this is the right shape I want, right? So, it's got to be 5,000 elements of 128 what I want. Python transformer.
Uh, where was that? Why is that a problem?
What? That should work perfectly fine.
Should return None, not int. Wait, that's the... should return None. Oh, all right. Fine.
There we go. Sure.
Hey, there we go. All right. So, now that that's the right shape, we can use this annotation here.
So now we don't need to do any for loops.
I think we can say equals like this.
And then we can do the same on the bottom. But we will use cosine.
Is it cosine? Python, import torch, torch.cos. Yeah, it is cosine. Okay, cool.
So we've got that. Now let's print out the PE.
Let's just directly print it.
See if it works. Okay. I'm enjoying your channel, but it has been a long time.
Oh, hey. I happy to have you here. Thank you for staying for so long. I appreciate it.
Transformer. Do we mess that up? Oh, we go. We don't need this anymore. Okay, here we go. Okay.
Uh oh. Oh. Wait, wait, wait, wait. Uh, dim max tokens. All right. So, how do we do this? We need the total length needs to be Oh, I see what I'm doing. Okay, that's why I did that.
All right. So, I've got a problem. This right here needs to be fixed. We got to fix that right there.
How do we do that? Hm.
Hey, Snot. Good to see you. Welcome on in. Happy Sunday. Good to have you here.
We're doing positional encoding. We're trying to do it from memory and not getting any help as much as possible. As much as possible. So, I do have a problem here, though. Got a current problem. This right here is the problem.
I think I think I might want to take a peek at our cheat.
I see issue. It's in Python. Hey, yeah, I know, right? That's the whole thing is broken. It's all in Python.
Is this side sound for us or is it happening to you, too?
Do you hear? Can you hear? Okay.
Everything looks good on my side. We're all green over here.
So, it can't be token. I want it to stretch out though.
So, I might just I might just do it differently.
I might do this different. I might instead of using this approach, I might go a little bit different here. What I want to do is see how it's being done in PE. All right. So, for/p There we go.
Uh PC_y. Here we go.
Oh, interesting.
Oh, wait. How does how does that work?
So, there's nothing dynamic in here.
This is all... how does this work here? I see there's an i right here.
Yes.
So, you've got this i right here. I'm not familiar with this strategy right here.
Also, it's interesting.
That is interesting. Okay.
H I'm trying to figure it out. I mean, I I suppose I could just copy and paste it, but then I'm not going to learn anything. Specifically, I'm trying to create a multi-dimensional tensor that has all the positional encoding applied across the dimensions for each token.
Hello. Hey. Ah, we're good, Lu. We're good. Can you hear us? Okay.
Testing. Yes. Okay.
Okay, where are we at here?
Uh, the only thing is that I don't understand. Obviously, this is going to be when you when were you able to just use the Google command? Oh, yeah. We made an alias. We made an alias command for Google. Yeah.
No. So, the audio was working good then. Sergio, thank you for clicking the hi button over there. For memory, make the tensor just (T, D), then 1 for batch, and it's projected.
Yeah. Yeah. Yeah. Yeah. I I that part that part's good. That part I get I get that part. I understand that part. See the thing is this right around here.
Position, arange. So, I was using arange before. Position... oh, it makes sense now. Here we go. Okay, that is what we needed. It's a lot clearer now. So, you can put a multi-dimensional tensor inside of this. And then you take the position and then you've got essentially a full 2D matrix. And then you just apply sine across the entire thing.
Okay, I think I can do that. All right, I've got the idea now. I figured it out.
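The idea described here (a column of positions, broadcast against a row of frequencies, then sine and cosine applied to the whole matrix) can be sketched as follows. This is the common formulation from the original Transformer paper, with the usual base of 10000; it is not necessarily the exact reference code on screen, and the function name is hypothetical:

```python
import math

import torch


def sinusoidal_encoding(max_tokens: int, dims: int) -> torch.Tensor:
    """Build a (max_tokens, dims) table; dims is assumed to be even."""
    # Column vector of positions: shape (max_tokens, 1).
    position = torch.arange(max_tokens, dtype=torch.float32).unsqueeze(1)
    # One frequency per pair of dimensions: shape (dims / 2,).
    div_term = torch.exp(
        torch.arange(0, dims, 2, dtype=torch.float32) * (-math.log(10000.0) / dims)
    )
    pe = torch.empty(max_tokens, dims)
    # Broadcasting (max_tokens, 1) * (dims / 2,) gives a full 2D matrix of angles.
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe


pe = sinusoidal_encoding(5000, 128)
print(pe.shape)  # torch.Size([5000, 128])
```

No Python loop is needed because the multiplication broadcasts the position column across every frequency at once.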
I see some slicing there. Yeah. Have you ever seen this kind of slicing? This kind of slicing. Pretty interesting.
Pretty interesting, right? So now we just need to apply.
So I was just going to do... yeah, the position. So we've got a linspace, we've got empty, right? So we got empty, which is going to be this here. Let's see here. Yeah.
Ctrl +V.
Then I need to generate two dimensions in here. So not just a linspace, I need two dimensions, right?
I'm pretty sure. I'm pretty sure. Let's double check here. So, you've got one line here, a single line, and then you've got it looks like another line here, which I'm going to do something different than this. I know we could do this. We're just going to do something different.
Bye, guys. Going to get the Norway chess tournament. Sounds good. Derek, good to have you here. Thank you for stopping on in.
Okay.
So, okay. Where are we at now?
Let me think about how to do this. All right. So, we're going to have a sine. I want some spaces there. And then I need to multiply it by, yes, another linspace.
So we could do two linspaces.
Could we do that?
Because I need So if we're going to get here, so our PE is already going to be the right shape.
Python transformer.
Right, right, right, right. We don't need this right now.
Here we go. Okay, so it's 5,000. four, five, up to 5,000 tokens, 128 dimensions.
I need each of these dimensions to be updated with a sine value that's going to extend to 128. So, I can do a linspace.
The thing is though, I need it to project across the entire array.
So, let me see here.
So, yeah, I think we could just do max tokens here again, right?
So, let me see what this looks like.
Um, we'll just say sin equals, and then we'll print sin. Okay, print sin and sin.shape.
All right. Where are we at? All right.
So, it's not it's not the right. It's not enough. So, we need more.
We need an extra. So, it says 128.
Oh, right, right, right. Okay, got it.
So, with the linspace, this is only going to... So, we need another one of these guys here is what we got to do.
I mean, there's a more efficient way to do it. I'm just trying to do it in a way where I'm going to make it unique in my own.
And this is kind of like how I'm thinking about it. So, maybe we can do... oh, we could do a linspace times ones.
Okay, we'll do this times ones.
Ones, like that.
Okay.
So then this will be the linspace. The only thing is, though, when I do it this way it's not going to scale.
You upgraded your Windows 11 Home Pro and totally didn't activate Windows with an exploit. Oh, okay. Yeah, I see what you're saying there. You didn't do it the Yep. Mhm. Everything's fine.
Wouldn't want to do anything being ironic. Yeah. Just for Windows sandboxing, by the way. Oh, yeah. All right. Hey, I mean, if you're just going to test things out, that seems like it's fair game. It seems like it's fair.
Seems perfectly fair to me. The only thing is this will not scale it, right?
This is not going to scale. So, what I'd prefer to do is just throw this into a for loop like I was going to do before. So, for token in range(max_tokens), and then we're going to do this pattern here.
Okay. And this will be token token.
Okay.
There we go. And then I need to concatenate this. So, this is going to create me a 128, which I like. All good to go. And then it will scale it here.
No, not testing my main install. Get scanned Microsoft. All right. You're just going to straight up. Okay. I see where you Yeah. Okay.
I used to I in the early days many years ago. How's it going there, Mario? Shiny, welcome on in. Good to see you. Happy Sunday. We're currently doing positional encoding. I'm going to do it my way first and then we're going to stick with it. This is what we're going to do. How much time do you have given to learning programming? What is your age now? I'm in my 40s, so I'm I'm old. I've been doing this since I was 11. So that would be roughly what is it? 30 years. More than 30 years, right? So a little bit more than 30 years.
That would be worth of just practicing software.
I've been professionally doing this for uh getting paid actually making money and selling my software skills for about 25 years. So that gives you a hint at how old I am. It gives you a hint status. You got it exactly status, you guys.
So this will get me my torch. And then I will scale it here with my linspace.
And then it still would be nice.
You know what? I'm just going to go with this. All right. So, we don't need this PE. How do we concatenate? All right. Let's see. Concat tensor in for loop.
Quick little quick quick little look up on uh rand and append. So it's called app. Oh. Oh. And then we just create a new tensor afterward.
Yeah, I suppose we could do that.
Concatenate once along the specified dimension. So this will create a whole tensor. So we just create we create an empty list. All right.
PE equals empty list. And we'll just append as we go. PE.append. If token mod 2 equals equals zero... actually, we could just say if token mod 2, and then else we do cosine.
Okay, that should do very similar to what I'm looking for.
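The loop being described can be sketched roughly like this. The variable names, the small sizes, and the 0-to-1 linspace range are assumptions pieced together from what is said on stream. Note that this variant alternates sine and cosine by token position, whereas the paper's scheme alternates by dimension:

```python
import torch

max_tokens, dims = 6, 4  # small sizes so the output is readable

rows = []  # plain Python list, one 1-D tensor per token position
for token in range(max_tokens):
    # Stretch the 0..1 ramp by the current position.
    angles = torch.linspace(0, 1, dims) * token
    if token % 2:   # odd positions
        rows.append(torch.sin(angles))
    else:           # even positions
        rows.append(torch.cos(angles))

print(len(rows), rows[0].shape)  # 6 torch.Size([4])
```

At this point `rows` is still a Python list of tensors, not a single tensor, so it has to be combined before it can be added to embeddings.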
only you work in your company. Uh, no.
Oh, wait. I I've got a I've got a lot of We've got a lot of people. Yeah. So, let me go into Let's see. I'll do company.
There's a company link here. Uh, we've got quite a few people at my company.
Let me see here. Where are we at here?
Pum. See? Let's do Let's see. There's people. Where's Where is it at? Is there people? People. Here we go.
Yeah, let's get about 100 according to LinkedIn. We got about 120 120 engineers. Well, 120 people, sorry.
Yeah, 120 people.
So, we got quite a few.
Yeah, about 120.
You Exactly. Exactly. Snot. 1 2 3.
Exactly. Of course not. Of course not.
Okay. So, I want to print. AI mode is fast. It takes like 15 seconds to get an answer from Copilot on Bing. Oh, yeah.
Isn't I like AI mode in Google? Google.
That's why I like Google because it's really fast. Inaccurate. This is your company. Yes, it is. This is my company, you guys.
You got it.
See, I even got the shirt on to prove it.
I got the shirt. Also, I've got something in the background that I could share with you guys. For those of you who are here and there's those of you who stuck around, you get a special surprise. Those of you interesting. Oh, okay. Yeah, that's probably fine. Okay. Are you ready for the special surprise? Here we go. I'll show you.
Yes. Are you going too much ahead of my thoughts? All right. Special surprise ready to go you guys.
I don't know if you can see this. I don't know if you can hear me. Yeah. See right there.
Nobody in the background.
Can you see that? Okay. Here. I don't know. Let me see. Where is that? Let's see. Right over here. Okay. Where is the camera at? Right here. Okay.
See? Yeah. Yeah. I I You can recognize what that is, right?
Cool. I know, right? Well, it looks cool. I like it. Nice.
It's pretty big. It's pretty big.
Oh, no. It goes this way. Okay. Here we go. There. Now you can see more of it.
Okay.
Perfect. Perfect. Right there. All right. So, it's going to There we go.
There we go. Okay.
Is it going to stay up?
Okay, we're just going to leave it.
We're going to leave it here. We're going to leave it here for now. Okay.
All right. Those just for those of you who are staying around for that long, you get to see the little background there. That's in my That's in my office.
We'll leave it up there for now and I'll move it back after stream.
All right.
Okay, so here we go. Let's get back into our positional encoding. Yeah, I think we're getting really close here. So, let me print my transformer. Uh, it's called PE.
Uh, all right. A little bit of syntax there.
We're the Ooh. Oh, it's not a it's not a right. So, this is a whole bunch of tensors. Now, are they the shape that I want them to be?
That's my question.
That's my question. I think they are. I think they are. So, this needs to be PE equals tor uh torch.tensor.
I think we just do PE like that, right?
I think that's the answer right there.
Right. That should do the trick.
Can we do that?
Uh, why is that not happy? That should be good, right?
Maybe we just need to concat. Maybe we do need to do concatenation. All right, let's try it.
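One way to turn a list of equally sized 1-D tensors into a single 2-D tensor is `torch.stack`, sketched below with made-up rows. The exact error shown on screen is not in the transcript, so this only illustrates the alternative, not a diagnosis of that error:

```python
import torch

# Three 1-D tensors of length 4; row i is filled with the value i.
rows = [torch.full((4,), float(i)) for i in range(3)]

# torch.stack adds a new leading axis: 3 tensors of shape (4,) -> one (3, 4) tensor.
pe = torch.stack(rows, dim=0)
print(pe.shape)  # torch.Size([3, 4])
```

`torch.stack` keeps each list entry as its own row, which avoids a separate reshape step afterwards.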
Where is the thing that we just had up there? Torch concatenate. It was around there somewhere.
Where we at? Where are we at?
It's around here.
No, here it is. Is this it? No. Is it? No, this is not it.
Hey, loons, it's your website. Had it up there.
This is GAN stim pietorch scaler.
If I want to work at your company, how much experience and what language should I know? The languages that I recommend are, I would say, Rust, C, maybe Python, Golang, JavaScript, and TypeScript. Then I'd also recommend SQL, and then systems like Kubernetes. You would have Prometheus, although you don't really need to know Prometheus.
Kubernetes, maybe some Docker stuff.
Uh Terraform.
What else? So these kinds of things here, the this is a good like foundation. That's a pretty good foundation right there. I like that.
These are pretty good options.
I think that's a good starting point.
There's probably a few other things that would be really helpful to learn. We also do React, so we've got some React in there. Then system design, right? Like distributed and parallel compute. There we go.
A little bit of parallel computing in there.
So yeah, these kinds of things. I think this is a good start. Also on our Discord, you guys. Hey, Scarlet Fire Time. Yep, you got it, Lun. It has to play at least once per day. At least once.
Yeah. So, I've got some positions. Let's see here. So, you scroll down to the bottom of the page and click on careers. And then you'll see we've got some positions open. Here we go. A few positions. So, we've got AI engineer and solutions architect. Both of these are currently open and they are engineering-level roles, right?
Corporate counsel, that is legal, and that's probably not what you want, right? I'm pretty sure, unless you want that.
Okay.
So, torch concat see Python torch got to go. All right, Lun sounds good. import torch torch.concat.
Is it also there's also a cat method, right?
There's two. Are they the same? They look the same. Are they the They're referring to the same object ID.
torch.sin.
Oh, no. They're all They're all Okay.
This the same either way. All right.
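For what it is worth, `torch.concat` is documented as an alias of `torch.cat`, so the two calls should behave identically. A small check, with throwaway tensors as the only assumption:

```python
import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([3.0, 4.0])

# Both names join the inputs along an existing dimension.
via_cat = torch.cat([a, b], dim=0)
via_concat = torch.concat([a, b], dim=0)

print(via_cat)                           # tensor([1., 2., 3., 4.])
print(torch.equal(via_cat, via_concat))  # True
```

The address shown when printing a built-in like `torch.cat` in the REPL belongs to the object the method is bound to, which is probably why `torch.sin` showed the same number too.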
Very good. Nice. Hey, like it. All right. Good to hear.
You're going much uh ahead of my thoughts. Oh, nice. All right. Well, hey, there we go. There we go. Good to hear. Good to hear. All right, you guys.
So, where are we left here? Where do we want to go from here? I want to concat. torch.concat. We'll do help.
See what we got here.
Tensors the dimensions. Okay. Let's try it. Let's try it.
So, where are we? Here positional encoding PE equals concat.
We've got our tensors and then dim equals zero. Let's try that.
Okay.
See if it worked. Okay. Now, that's not quite what I was looking for. However, what if we did one? It's almost correct.
It's almost correct. Okay, we can only do zero. So, I'm going to need to reshape that. I need to reshape it.
Reshape.
I need it to be this dimension here. You know what would be very interesting? You know, this is fine. It's fine. It's fine. Fine. So, I'm going to do dims and tokens.
there. Let's try that. Let's see what what we get there. That is not the right. So, we just need to invert that dims.
Okay, there we go.
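A small sketch of what happens in this step, with made-up rows so the result is easy to read: concatenating 1-D tensors along dim 0 gives one long 1-D tensor, and the order of the sizes passed to `reshape` decides whether each token keeps its own row.

```python
import torch

max_tokens, dims = 3, 4
# Row i is filled with the value i so it is easy to see where each row ends up.
rows = [torch.full((dims,), float(i)) for i in range(max_tokens)]

flat = torch.cat(rows, dim=0)             # shape (12,): 1-D inputs have no dim=1
pe = flat.reshape(max_tokens, dims)       # shape (3, 4): one row per token
swapped = flat.reshape(dims, max_tokens)  # shape (4, 3): same numbers, rows scrambled

print(pe[1])       # tensor([1., 1., 1., 1.])
print(swapped[1])  # tensor([0., 1., 1.])
```

Both reshapes succeed because the element count matches, so a wrong order shows up only in the values, not as an error.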
We did it. Although, this is a problem.
That's not correct.
OSI did it. You did it. Bon Zupi, congratulations. That sounds like you've done something amazing just now. It'll take another 5 years to learn all this.
Hey, don't worry. You can learn it a lot faster than 5 years if you focus on it.
You can direct to target buffer decryption with no plain text staging buffers and kernel memory for the for the RXCAD protocol. Bunupy.
Hey, very nice. That's great to hear.
Good job, Bunzie. That sounds like that was a big challenge. You did. You did good. You did good. You did good.
Very nice, Bonzupi.
So, this looks incorrect here. This is incorrect.
So, how do we do this?
This is incorrect.
Cuz it couldn't all be ones. That looks wrong to me.
H interesting.
Oh, no. I know why. Okay, it makes sense now. I don't want to I'm doing this. I did this wrong.
So, this is going to be 0 to one. And if I do 0 to one, the sine is always going to be one.
Is that Hold on. Let's try this.
Import torch.
See, torch.sin. What if we did torch.tensor of one?
All right. So that's right. So 0.5.
Yeah. See it? That's a fraction. That's a fraction right there. Hey, Shadow King. How's it going? Engineering principles versus fast shipping. What to focus on in startup bugs. Bugs everywhere. Bugs everywhere. Always bugs. Don't worry. Here's the good news.
There will always be bugs. Always be bugs. either something's not working properly or something's crashing. Don't worry, there's going to be bugs. The idea is you got to ship fast and then you need to figure out what didn't work because even if you ship a feature, regardless of the bugs, is the feature worthwhile? Is it even worthwhile?
Because your users might not care. They might not care at all. So, you need to make sure to ship fast. So that way when you ship your code, you'll be able to have users use your new code more regularly.
That's more important. That's a lot more important. Remember when you were reading about dirty frag a few weeks ago? Yes. Yes. Oh, you solved it with the dirty frag. Well said. Hey, thank you. Appreciate it. That's good to hear.
Nice.
Got a little bit of engineering and startup philosophy in there for you.
That's so awesome. I appreciate it so much. Let me see here. Channel one second. William, hey, William. William Richard, thank you for saying well said and thank you for clicking the hi button. Good to have you here.
Okay, so this is very odd. So the token is going to be So this just it's just very interesting. It's just very interesting.
Maybe my concat is doing it wrong.
So what I want to do is say print PE[0] because I'm feeling like something's broken there. Hey brother. Hey William.
Good to have you here. How do they explain a logic flaw that corrupts the shared page cache? They were poisoning the page cache because they were directly mutating the skbs by decrypting them in place.
No more plain text in kernel memory.
Hey, there we go. No more plain text.
Now try to manipulate encrypted. Right.
Have a good time with that. Hey Chicken SMP. Good to see you. Welcome on in.
Happy Sunday. Good to have you here.
Welcome on in. We're doing some positional encoding. The plan for today is that we get positional encoding added to our AI model and then we are success. See, it's all ones. That's wrong. It's wrong. It shouldn't all be ones.
It's weird. Is it because of linspace?
Uh, let's see here.
It see it seems this is the problem here. All right.
Token one. Okay. Torch append. All right. Python import torch.
And then we're going to say 1, 128.
Hey, VibHab. Thank you for subscribing, Vibhab. Appreciate it. Welcome on in.
Good to have you here. You joined the right channel for software engineering.
We're currently working on a transformer. We're building positional encoding using our own approach.
Obviously, there are more efficient ways to go about what we're currently doing. We're struggling a little bit so that way we understand and learn the language better. How do you learn? Well, you learn through failure. It's the best kind because you learn what doesn't work and then you remember that better because failure has a stronger emotional connection in your mind. Does it feel good to fail? Not always, right? It usually doesn't feel good to fail. So you have emotional responses which trigger a lot more neural synapse connections that will allow you to memorize what you're doing better and you also get to learn what doesn't work.
If you only learn what does work, I feel like you're missing out on like 80 to 90% of the most important critical things. So you have to fail. You have to fail to win.
You have created the chess game. But do you know which country game is chess?
I've played it. We did the chess. We did the chess experience. And what where does chess come from? I don't know.
Uppo, tell me where it comes from.
It's slow though, but finally have a working proof of concept. Bun zupi.
That's so good to hear. You just optimized the performance. Nice. India.
Does chess really come from India?
Really? That's where it came from?
What about like precursors to chess? I mean, I know Asia, right? like Shogi.
Isn't Shogi a game that's chesslike?
It's in Asia. So, these are these are Asian games and they made their way across the world because they're so good. They're so good.
What did I just do here? See, look at that right there. Right there.
See the numbers?
See, I told you.
1 to 128 token. So what if we did oh because it's zero. Ah and then we do sum but it's all zeros not ones.
This is very weird.
So what if we said token plus one. Oh, because it's cosine.
Watch this. It's going to be all ones.
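The reason the first row comes out constant is that every angle is zero at position 0, and cos(0) = 1 while sin(0) = 0. A minimal check, assuming the row is built as a 0-to-1 linspace multiplied by the token index (that construction is inferred from the stream, not shown in the transcript):

```python
import torch

dims = 8
token = 0
angles = torch.linspace(0, 1, dims) * token  # every angle is 0 at position 0

print(torch.cos(angles))  # all ones
print(torch.sin(angles))  # all zeros
```

In the standard scheme the row for position 0 is likewise constant per function: zeros in the sine slots and ones in the cosine slots.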
Where did it there? That's why search it. All right.
We'll search it. We'll search it. Google where chess originated.
No, not chase. Chess. There we go. India around 600 CE during the Gupta Empire.
Hey, there we go. Learning new things every day, you guys. Learning new things.
The game evolved over 1500 years though.
trade and travel. So, they added some rules.
Modern rules. The game we play today was finalized in Spain and Italy around 1475 when dramatic new powers were given to the queen and bishop. Whoa.
Which did make the game a little interesting, didn't it?
Okay, so we need token plus one, I think.
There we go.
Now we solved our issue. Python transformer.
Yeah, that's what I'm looking for. There we go. There we go.
I want to Yep. Okay. Looks good. Looks good to me. Let's print out the whole business now. Here we go. Yes. Okay. Is that good enough? Is it good enough?
It needs to kind of go the opposite direction.
So I think I need to invert it.
So let's do max tokens minus token + one. Yeah, I think we can do that. Okay. So I think this needs to be the other way around.
Yeah.
Okay.
Okay. Let's see if that works. Is that better? Is that a little better? Yes, it is. Okay. Now, we got a solid positional encoding. Solid, you guys. It's solid.
This is This is Steven style. Steven style. Beast Hunter, good to see you.
Welcome on in. Happy Sunday. Good to have you here.
All right. Now that we've got our positional encodings, all we have to do is append.
We have to append.
All right.
On our forward pass, I think we're good to go.
Let me just There we go. I think this is it right here.
Okay. Is this is this sufficient?
Okay. Just organizing it a smidge to make it a little bit nicer. Okay, let's see if that gets me what I want.
We'll do this real quick. Print PE. Just to double check. Just to double check.
Uh, did I type that wrong? Maybe it's register_buffer.
Maybe it's... is it torch.register_buffer?
No, it's self.
I had it right the first time. Hey, there we go. All right. Happy Sunday, though. Now it's Monday in my place. Oh, yeah. It's Monday now. It's Monday for 17 minutes, right? No.
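As a reminder of how the call is spelled, `register_buffer` is a method on `nn.Module`, so it is invoked on `self` inside the module. A minimal sketch; the class name and the zero-filled table are placeholders, not the stream's code:

```python
import torch
from torch import nn


class PositionalEncodingSketch(nn.Module):  # hypothetical name
    def __init__(self, max_tokens: int, dims: int) -> None:
        super().__init__()
        pe = torch.zeros(max_tokens, dims)  # placeholder for the real table
        # A buffer follows the module across devices and is saved in state_dict,
        # but it is not a trainable parameter.
        self.register_buffer("pe", pe)


module = PositionalEncodingSketch(16, 8)
print(module.pe.shape)                 # torch.Size([16, 8])
print(len(list(module.parameters())))  # 0
```

Registering the table as a buffer fits the point made earlier that it is a one-time computation rather than something to learn.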
Yeah, 17 minutes. I think it's about right. 17 minutes.
Okay.
So, let's apply our positional encoding to our model in the forward pass.
All right. So, I think we just say return X plus PE.
Let's double check.
I think that's it, right?
All right, we're going to find out, you guys. We're going to find out.
Yes, correct. I'm surprised you guessed the exact time. I know, right? Beast entry. We've been we're aware of time zone math.
How is it happening at the same time in India also actually calculated?
Uhhuh. I looked at I looked at the clock and I did a little bit of math. We did a little bit of math. It's good to have you guys here. All right. So I think all we have to do is our forward now, right?
And that will do the trick. Although if we if X is a batch should still work.
That should still work maybe. All right, let's find out you guys. Let's find out.
So, we got our positional.
So, right around the embedding. Here we go. So, let's do positional on our questions.
Question embedding, I think. Right. So, question embedding equals self.positional of question embedding.
There we go.
Okay.
My evening ASMR. Hey, how's it going there, Seventh Son? Good to see you.
Welcome on in.
Thanks for clicking the high button.
See here. Where are we at here?
Good to have you here. Okay.
Seventh Son, I feel like I've seen your name before. Like a month ago or two months ago. You've been here before, right? I feel like you've been here before.
Okay. Question embedding. See, this is the line that I'm not sure it's going to work or not. Let's find out.
Python transformer.
There we go. Here we go. Wait. Wait. Did that Did it work? No, I don't know if it worked or not. I got really excited. I'm like, it worked. I don't know if it worked or not. Hey, quick fix. Good to see you. Welcome on in. Good to have you here. Happy Sunday. Hope you had a good weekend so far.
All right, let's see if it worked, you guys. Let's see if it worked. Oh, I got a problem. Um, size of tensor 10 must match size 5,000. Okay.
Right.
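A small sketch of how this kind of error comes about, under assumed shapes. The transcript only gives the numbers 10, 128, and 5,000, so the layout of `x` below (with 10 on the second-to-last axis) is an assumption chosen to reproduce a "10 must match 5000" mismatch.

```python
import torch

max_tokens, dims = 5000, 128
pe = torch.zeros(max_tokens, dims)  # full table: one row per possible position
x = torch.zeros(6, 10, dims)        # assumed embedding output; layout is a guess

# Broadcasting compares shapes from the right: 128 vs 128 is fine,
# but 10 vs 5000 cannot be matched, so the addition raises.
try:
    x + pe
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```

The data in the table is fine; only the number of rows being added is wrong for this input.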
So now we need to figure out how.
So that's the batch size. So this is the last thing that we have to do. We have to change this layer here. We have to change this. So when we're applying the... do we want to add them? I think so.
So X is a batch.
Do we need to do a reshape?
So this is the part that I'm not fully sure about. We need to change the shape of this right here.
This is what we got to do. Right here, you guys.
Do you know how to, like, reshape?
Do we do x.shape? Let's see here. Like, for example, zero or something? I don't think this is going to work particularly well.
We need I suppose what we could also do I I had a thought. I had a thought. I have a a pretty interesting thought.
What if we converted it to a What if we didn't reshape it at all?
What if we up here instead of reshaping it, we just do this? So now it's a single dimension, right?
The size of tensor 128 must match the size of tensor 640,000.
Uh, so you say 128. Yeah. Size of tensor 128.
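Where 640,000 comes from can be sketched as below, assuming the "single dimension" attempt was a flatten of a 5000 by 128 table (the exact call used on stream is not visible in the transcript).

```python
import torch

max_tokens, dims = 5000, 128
pe = torch.zeros(max_tokens, dims)

# Flattening keeps every value but leaves a single axis of 5000 * 128 entries.
flat = pe.flatten()
print(flat.shape)  # torch.Size([640000])

x = torch.zeros(6, 10, dims)  # assumed embedding output; layout is a guess

# The last axis of x (128) is now compared against 640000, which also fails.
try:
    x + flat
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```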
So how do we do that? How do we do that?
Dims, max tokens. Okay. How about that? No.
128 must match tensor 5000.
Do we do... "The alchemist is conjuring over the code again." Yeah, Alexander. It's what we're doing right now. We have to get our positional encodings into the right shape, so that when we add them to our post-embedding output, it works. We need to see where we are right here.
We're doing this spot right here. So, we just did the embeddings and we need to apply our positional encoding.
That's what we got to do. We can apply the positional encoding. So, we've got the correct data. We just need to make sure it's the right shape so that way it can transmute.
And Alexander, OMG, I know, right? It's things that we got to take care of.
We're getting close, though. We're getting really close.
See here. So, let me zoom up here really quick.
So, what would be really nice is let's see. So, we got 128, but the embedding is also 128, right?
So, we only need to add in the slice. We need to slice it, is what we got to do.
So, that's what we got to do. We got to slice it.
Okay. So, let's go the opposite direction again, I believe.
No, wait. Is that correct? Let's do this real quick. Shape.
Okay. Rerun. Get the error. And then our shape is 128. See, that's incorrect. I believe it needs to be the other way around.
There we go. Slice. Yep. Exactly, Sergio. We got to slice it. So in this part right here, we have to slice.
So we want, let's see, colon, colon, to x.shape[0].
I think that's it.
Maybe something like that.
Let's try it.
Print x.shape.
Print shape. All right, let's see. Let's see.
Let's see what happens.
All right, here we go. Here we go.
Teach me how to be a good slicer. Hey, we're we're making our way. We're trying it out. Okay, here we go.
So that's the output size.
We going to slice code with this one.
Ah, we are, Beast Hunter. Yep, we are, Beast Hunter. We absolutely are. So this is our output. That's correct. And I just need it to be this.
Actually, I think this is the answer here.
So it would be this and then the rest of them. Okay, let's try that. Let's try that shape.
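One way the slicing idea can work, as a sketch rather than the code from the stream. It assumes a batch-first input of shape `(batch, seq_len, dims)`; the stream's actual layout and the exact slice it settled on are not visible in the transcript, and the class and argument names are hypothetical.

```python
import math

import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Hypothetical module; assumes x has shape (batch, seq_len, dims)."""

    def __init__(self, dims: int = 128, max_tokens: int = 5000):
        super().__init__()
        position = torch.arange(max_tokens).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, dims, 2) * (-math.log(10000.0) / dims))
        pe = torch.zeros(max_tokens, dims)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.shape[1]
        # Keep only the first seq_len rows: (seq_len, dims).
        # That slice broadcasts across the batch axis when added to x.
        return x + self.pe[:seq_len]


positional = PositionalEncoding()
x = torch.zeros(10, 7, 128)  # 10 sequences, 7 tokens each, 128-dim embeddings
out = positional(x)
print(out.shape)  # torch.Size([10, 7, 128])
```

With a sequence-first layout `(seq_len, batch, dims)`, the slice would instead use `x.shape[0]` and need an extra axis, for example `self.pe[:x.shape[0]].unsqueeze(1)`.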
See if we did it. Let's see if we did it. No, not yet. It's really close, though. We're getting closer. Look at that. We're getting closer.
Right right here. Oh, wait. Oh, wait.
Maybe we did it. Maybe we actually did it cuz it's it's transmutable. Oh, we might have done it, you guys. We might have done it. Here we go. Here we go.
Here we go.
What are you coding? Oh, we're adding positional encoding to our transformer.
We're building, right? Check title. Yep. Quick. Exactly.
Quick fix. We added positional encoding.
This is the last day. Tomorrow, what we're going to do is we're going to write our own transformer layer from scratch. We're going to start on it.
We'll see how close we get. We might get pretty close.
Here we go. Here we go. Hey, there we go. We did it. We added positional encoding. Well, look at that. We did it our own way. I know we could have copied and pasted the code, but we built it ourselves using a different approach. We did it. Check that out. All right, I'm really happy now. I can now feel happy.
I can feel happy.
This is good. This This output right here is good. This very good. See? Yep.
That's what I'm looking for. Right there. We did it. All right, let's get rid of the prints. All right, we don't need the print anymore. Yeah, right there. Right there. Look at that. Oh, it's so cool. We figured it out. We figured it out. We struggled. We failed.
And then we succeeded.
That's the way to do it, you guys.
Okay, let's run it one more time. What are we doing today? Oh, we just succeeded at adding positional encoding to our transformer. We wrote the algorithm. We did get inspired by you, chat. You You told us about it. Thank you for the party poppers, you guys.
Congratulations. Thank you.
"GPD Prime is now closer, but I have to go now." All right, sounds good, Beast Hunter. Thank you. See you tomorrow.
Yep, we are also wrapping up as well, you guys. We're going to wrap up now.
Let me commit this to Git here really quick. Let me make sure that we are good to go. We did it. Oh, it's so nice. It's so nice. Thank you for the hearts, you guys. Appreciate it. Success. Isn't that great? See, now I'm going to have a great rest of my day because we succeeded. We made it better. Here, let's run it again. And we should see, yep, a reduction in our loss. See that right there?
Yep. Looking good to me.
All right, you guys. Thank you so much for joining. Let me do a git status, git add transformer, git commit, dash message. GPT, generative pre-trained transformer. "What can your GPT do?" Oh, what can it do? Well, it can do a little bit of math. That's about all it can do right now. It can add two numbers together. That's all it can do. It can add two numbers together.
But we trained it from scratch. You're exceptional programmer and even more exceptional man. Thank you, JB.
Appreciate it. That's so nice for you to say. Thank you, you guys. Appreciate it, JB. That's so nice for you to say.
All right, "we added positional encoding," party popper. There we go. Push to main. Boom. Done. Easy. All right, you guys. Thank you so much for joining me on Sunday. We'll be back tomorrow. We'll be building our own transformer from scratch. We will be using the PyTorch framework to do it. We just won't be using the embedded transformer. So, we're going to write our own transformer. So, we did this layer right here. We're going to build our own.
We'll build our own, you guys. All right. Thank you for joining. Join me back again tomorrow. We'll be back. Have a good evening, brother. You, too, Mark Lemon. Thank you, everyone, for joining.
Have a good rest of your Sunday or Monday, wherever you are. All right. Thank you guys. Have a happy day. You too, Sergio. "Think about creating financial GPT." Ooh, there we go. Yeah, making AI that is real good at finance.
See you guys. Thank you for joining. Bye everybody. Have a good rest of your day.
See you.