A lexer (lexical analyzer) is the first phase of a compiler that converts raw source code characters into meaningful tokens by recognizing keywords, data types, operators, and punctuation. The lexer design involves creating an enumeration of token types (such as keywords like 'if', 'else', 'while', data types like 'int', 'float', 'void', and operators like '+', '-', '*', '/'), defining a token structure that includes the token type, the actual lexeme (word), any literal value, and the line number for error reporting, and implementing helper functions like advance() to move through the source code, peek() to examine the current character, and peekNext() to look ahead for multi-character tokens like '==' or '//'.
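The three helpers named in the summary can be sketched as a small cursor over the source string. This is only an illustration under assumptions: the class name `Cursor`, the `isAtEnd()` helper, and the choice of returning `'\0'` at the end of input are not taken from the stream.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical cursor over the source text. Only advance(), peek() and
// peekNext() are named in the summary; everything else is an assumption.
class Cursor {
public:
    explicit Cursor(std::string source) : source_(std::move(source)) {}

    // True once every character has been consumed.
    bool isAtEnd() const { return current_ >= source_.size(); }

    // Consume the current character and return it ('\0' at end of input).
    char advance() { return isAtEnd() ? '\0' : source_[current_++]; }

    // Look at the current character without consuming it.
    char peek() const { return isAtEnd() ? '\0' : source_[current_]; }

    // Look one character past the current one, for tokens like "==" or "//".
    char peekNext() const {
        return current_ + 1 >= source_.size() ? '\0' : source_[current_ + 1];
    }

private:
    std::string source_;
    std::size_t current_ = 0;
};
```

With the source `==x`, `peek()` and `peekNext()` both see `=` before anything is consumed, which is how a lexer can tell `==` apart from a single `=`.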
Deep Dive
Lexer | Writing a Compiler in C++ | risc-v toolchain | Day 1
All right.
All right. Let me just test the audio real quick.
All right. Yeah. I hope everything's good.
Let's see.
Yeah, I think everything's looking good here. Uh, let me just finish setting up stuff.
Music's good. Yeah, I think everything's good.
Just start.
One second.
Yo, what's up, EXO? How you doing, bro?
Or RNG, is that like your new game? Is that your new game thing that you mentioned?
Okay, I think the audio is fine.
Actually, the audio is good, right, for the stream. All right. Uh yeah. So we're going to start making a compiler. Making a good compiler for our RISC-V emulator.
So the emulator is already done. We finished that last stream.
Playing games on my Roku TV. Well, games I remember playing games on the Roku TV.
Mostly like Angry Birds. I remember Angry Birds was on there. The OG Angry Birds that was on Roku TV. That's what I played.
back then.
All right, audio's good. All right, perfect. All right. Okay. Uh, so the emulator is done from last time. Now we got to start on the compiler and the project's on GitHub. I actually made the project more structured. I did CMake. I learned CMake and all that stuff to actually make the build properly. So yeah, it's on GitHub.
I'll link in the description. Actually, is the link even updated properly? I think it should still redirect to the main one. I think it still should redirect properly.
So that's fine. All right. Uh yeah, guess we can start by making a file. So we're working on the lexer today. So let's just call it lexer.cpp.
And also the header file lexer.h.
There's a lot of knockoff games. So mostly I got Pac-Man, Pixel Dash, and Fruit Ninja. Nice. Nice.
I remember Fruit Ninja.
Okay.
Yeah, I guess we could start. Okay.
So, the main point of the lexer is to basically take these raw lexemes, which are the individual words, and parse them into tokens. Actually make them into tokens.
So that's what we need to do.
So like for example, if you have like int x = 3, it splits that up into different lexemes and then identifies them as the proper tokens that they are. All right. So first we need to get our enumeration here. So basically setting up our keywords basically.
Uh, I'm going to say pragma once. That's usually what I do for every header file.
Yeah, pragma once.
And I have this goofy game called Mr. Snake 2. What's What's that about? Mr. Snake 2.
Yeah. All right. So, we have lexer.h.
I'll add header files as I need as I go through it.
Uh, I'm going to close these ones. We don't need these for a long time later. So, yeah, we'll just have these set up like this. All right. So, lexer.h.
Basically, we need to set our different enum. So, let's just call this like enum tokens essentially or yeah, tokens or token something like that. And then we can set our different type. I'm just going to say token.
This is only a single one. So let's just call, you know, token.
And we want to set like our different types of tokens. I know the the book that I was reading did not only like the different keywords but also like literals or any any punctuation that was involved in there.
So I guess yeah we can kind of just kind of just set it up.
I guess let's just start with, uh, let's start with keywords first.
So let's think some keywords that we might include in this compiler to the language. Uh well I do plan on Okay, so let's just kind of map it out real quick. Let me just like map it up. So I kind of do want to cover uh like control flow loops uh basic data types.
Yeah, probably basic data types. What else? Um like logical operators.
Huh. Um what else? Oh yeah, maybe probably like a probably functions as well. Probably functions. So, I could probably just have these to find um yeah, I can't really think of anything else. So, for now, we'll just define a couple. So, let's just start with some keywords.
Uh so some keywords might be like if else like else if uh you can't make it like that. See like this. Okay.
Possibly something like while, for, probably like print as well.
Well, for functions, we don't really need to we don't really type function unless it's like yeah, unless some specific language, but for ours, I don't think we need to type function.
Yeah, let's just um Okay, so that's for control flow loops.
Uh some other keywords I guess we can say like int float void char uh I'll say double I'm going to just specify now I'm watching Paddington. Isn't Isn't that like that one like raccoon or whatever that something like that? I forgot.
Bear or something. You don't know that one animated thing.
Uh this can also be keywords but this can be like control slash uh types as well, right?
Uh you can say like literals and then have I don't know like string that's like a number itself uh identifier like the identifier of the thing.
Uh what else?
I guess uh I don't know what do they call like punctuation uh punctuation slash I just call like single single characters.
And this would be stuff like a comma, semicolon, plus, minus, the asterisk (star), slash, the modulus operator, and uh left parenthesis, right parenthesis, left brace, and right brace. Stuff like that. It's very simple. Just a set of these.
What else am I missing?
Let's call this one ops or just call it operations. I guess operations are also there as well. Uh so this is kind of it's kind of hard to like separate. This is just for like comments, but I guess people understand anyway. That's fine. I'm just going to bring these over. Then I'm going to call this operations and just move these over.
All right. Yeah, that's good.
And we could also say like, uh, exclamation equal sign.
Maybe equal equal sign.
Greater than sign, less than sign, greater than or equal to, less than or equal to.
Yeah, I think that's good.
Yeah, I think that's pretty good.
Did you know there's a developer mode in Roku? But if you use it in TV micro cuz it's not for regular users. What is the developer mode? Like what does that even mean?
Like the code on it or like just in general for like Roku developers?
And I guess we also want our end-of-file token.
And I guess that that's pretty good.
We'll probably add some more that I'm not remembering. But I guess this is like a good starting point for the enums.
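The groups listed so far could be collected into one enumeration roughly like this. It is a sketch, not the code from the stream: the enumerator names are assumptions, and only the tokens mentioned up to this point are included (a usable lexer would need more, such as an assignment operator).

```cpp
// Sketch of the token categories named so far. All enumerator names are
// assumptions; the stream only lists the categories out loud.
enum class TokenType {
    // Control flow and other keywords
    If, Else, While, For, Print,
    // Type keywords
    Int, Float, Void, Char, Double,
    // Literals and names
    String, Number, Identifier,
    // Punctuation
    Comma, Semicolon, LeftParen, RightParen, LeftBrace, RightBrace,
    // Operators
    Plus, Minus, Star, Slash, Percent,
    BangEqual, EqualEqual, Greater, Less, GreaterEqual, LessEqual,
    // End of input. EOF itself is a macro in <cstdio>, so that name is avoided.
    EndOfFile
};
```

Using `enum class` rather than a plain `enum` keeps the names scoped (`TokenType::If`), so short names like `If` or `Int` do not leak into the surrounding namespace.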
Okay, let's let's keep going.
So, we did get our enumerations where we type in enumeration.
need to be able to track errors by telling users where it occurred. Okay.
Yeah, that's true. Uh what else?
Yeah. So, this is where we need to actually set the token. So, I think uh I'm going to call this token type cuz it's not exactly a token itself. Token type. Let's call it token type.
And oops. Um, each token is grouped into four different things. So we have the token type as we said earlier. So it would be one of these things. Uh, the actual lexeme, which is like the actual word that we're using. The literal value, so this is like any optional value. So you have like, I don't know, three or like any random string literal. And then the line number where you had the token at. And the line number is mainly just for like error handling. So when we get to the error handling part, we can do that. Um, yeah, we can set up the struct right now. So we're going to need a struct that has four of those.
So now this would be the actual token.
Okay. So let me see exactly what.
Okay. So token type, right? So we already have token type. So, we just need to pass in a token type and call that type.
It gives you access to other things. I don't want to try cuz I don't know what will happen to my TV. So, I might have to search a photo of what it looks like.
Seems interesting. I've never heard about that. Seems interesting, Exo. All right. So, after token type it will just be a string. So, just a std::string, lexeme. So that's like the actual word of it. Oh, we need to include string at the top so we can use that. All right. So std::string lexeme. What else? A literal value. So there's like a couple values that we can use. So uh based on the book, there's like a couple values you could need. Usually what people do for very early compilers is that they pass in double for any type of integer literal or any double literal, float, or anything like that. So we want to pass in a couple things. So we want std::variant. We need to include it at the top. That's why.
All right. So std::variant.
And we need a couple things. So we can get um so we can get a double value.
We could do a double value or we could separate them into floats. Uh what people usually do is just spread it out into double and any type of number will just be passed as double. But let me see uh let me just write out the way. So we can either do so we can pass in like this.
This is essentially just a nil value.
I'll send you a photo of what it looks like. I don't know if it's real or not.
Just saying. I'll DM. Be right back. All right. Bet. All right. Bet. See you soon.
All right. Uh yeah. So no value or we could do like we can do stuff like this. So your float and this will be like our literal value.
Something like this. I would want to say that this is fine. We could also probably do something like a std::variant, same thing with the monostate, but we'll pass a double instead and then have string and then have that be.
Uh, it's just a matter of design choice.
I guess we can do the first one though.
Um, well, hold up. I'm going to comment stuff out for now. I'll ask Claude actually what is a better design pattern to use there. I've seen many different ways to do that. Okay, then we have of course the line number. That's just going to be, so just call that like int line. Let's uh let's write our constructor. So let's say the type, uh, it's going to be token type. Oh, not token type. It's going to be Token. TokenType type.
Uh, it's just going to be the same.
Lexeme, the variant, I'm just going to copy this. Okay, it's going to be literal value, and we're essentially setting up the constructor here. So, I'm going to say uh type, uh lexeme, and uh literal, and then line. So, this just basically sets up the constructor for how this should look each time.
So, it should be like that. Oh, yeah, that's fine.
Okay, so the constructor's filled and that's basically the struct we need. So now we have the enumeration for the token type and then we have the specific struct that gets that type and then actually sets up how the token will be.
Uh yeah, we'll check the design choice so far. Uh I'll check in DMs.
That's interesting.
Development application allows developers to test their own applications. Oh, you can like actually add your own applications into here and test it out.
That's interesting. Pretty interesting.
Okay. Uh, we'll check. Um, have this.
I was wondering about variants. I am aware some developers use double to represent any number.
Uh I'm not exactly sure but uh give me some insight. Just give me some insight.
Oh yeah. Uh I'm forgetting something.
Uh for our literal value and also our variant, we want to call std::move on them so we don't actually copy them.
We can move them instead of copying them. It'll be much better.
Uh not the lexeme. Sorry.
Oh, wait. No. Yeah. Yeah, it is the lexeme.
Sorry. The string is the lexeme and the variant is the literal. Yeah. Yeah.
Okay. So, that's good. So, now our constructor is properly built.
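Put together, the four-field token described above might look like the following sketch. The exact set of variant alternatives is an assumption (the stream weighs an int/float version against a double-only version), the `Literal` alias is hypothetical, and the enum here is a reduced stand-in for the full one.

```cpp
#include <string>
#include <utility>
#include <variant>

// Reduced stand-in for the full token-type enumeration.
enum class TokenType { Identifier, Number, String, EndOfFile };

// std::monostate is the "no value" alternative for tokens without a literal.
// The alias name and the int/float/string alternatives are assumptions.
using Literal = std::variant<std::monostate, int, float, std::string>;

struct Token {
    TokenType type;      // which kind of token this is
    std::string lexeme;  // the raw text from the source
    Literal literal;     // optional value carried by literal tokens
    int line;            // source line, kept for error reporting

    // Parameters are taken by value so the caller can copy or move in;
    // std::move then avoids a second copy into the members.
    Token(TokenType type, std::string lexeme, Literal literal, int line)
        : type(type),
          lexeme(std::move(lexeme)),
          literal(std::move(literal)),
          line(line) {}
};
```

For example, `Token(TokenType::Number, "3", 3, 1)` stores the text `"3"` as the lexeme and the integer 3 as the literal, while a token with no value can pass `std::monostate{}`.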
Uh oh, he did find an error here. Oh, that's accidentally a lower case. Good catch. I didn't even notice that. Okay.
Uh yeah. Okay.
See double maps to that which you haven't implemented.
Uh that's probably better choice for mine. I'm going to remove double. We haven't done double. I don't want to do double. Maybe in the future. Who knows?
That means I'd have to go back to the emulator and implement double, which shouldn't be that long, but it's like it's kind of tedious. So, I'm just going to skip that for now cuz I'm already familiar with that whole process. So, I just want to kind of learn some new stuff like this. I'll get back to it somewhat. Maybe though.
Maybe. Who knows? We'll see. All right.
So, in the struct, and yeah. So, I guess we'll keep it like this. Uh, Claude thinks it's a good idea to just keep it like this. Uh so, we don't need this.
Let's see. C doesn't actually have an else-if keyword. "else if" parses as two separate tokens.
Oh, is that the case? Okay.
I was actually thinking, like when I wrote else if here, I didn't know if they combine these two or if it's separate, but I guess they do combine it. Is that actually like a thing? Like is that okay if it works?
Exactly. Oh, it's not a keyword. Okay. I thought it was just a keyword in general. Like it's just one full keyword.
Else takes one statement, and with else if that happens to be another if. To be honest, I did not even know this, but I guess that's interesting. I guess we don't need else if then. We'll just keep it as if, else, while, and for. Probably not going to add like do while or anything right now. We'll just keep it like that. I don't want to overcomplicate. We're just going to take a subset of C and compile a subset of C.
That should be that should be fine.
Uh let's also give him my new struct.
I wanted to know. I also modified my struct to use move semantics, and so we don't need catch. Yeah, I'm aware. That's good.
What else?
Uh what else? Uh, one small thing: it should also be taken by value so the callers can decide to move in.
Yeah, they are. It's already correct like that. It's already correct.
Yeah.
Um, yeah, I guess that's fine.
Uh, if anything comes up, we'll move on.
Um, let's do our main file now. I believe that's next. I'm going to do the scanning later to actually set up the scanner class and everything.
But for now, we can.
We can kind of just set it up.
Set up the program itself. All right.
So, I do have some notes here that I wrote.
It's probably going to help me get through all this. So, okay. So, setting up our main.
This is going to be quite interesting to see how we set this up. So, um, yeah, I don't know if we should include Let's see.
We need to set uh set up some functions in here. So let's see.
Yeah, so we need to handle a couple things here. So we need command line handling, file reading. We need to set up a REPL and also error codes.
Okay, we can do that.
We can do that. So command line handling So we need to uh find a way to handle these separate files itself.
Okay. I have a feeling we're going to need so many. So let's just do the normal includes we used to do since we're handling files. I believe we're going to need um fstream.
And I believe when I was writing the notes, there's also one other include, std::ostringstream. What does that include again? Let me see. Uh std::ostringstream.
Let me go check that out again.
Defined in header, uh, sstream. Okay, I believe we also need that. That's what was written in my notes.
All right, so we have that.
All right, let's just go through this slowly by looking through my notes here.
Let's see.
All right, so we need to cover these four tasks: command line, file, REPL, and error codes. All right, so we need to know when errors occur. So having a shared state that can flag the error as true. So I guess we'll just set a global variable here, just call it bool error, equals, uh, we should set it as false for now.
Uh what else am I not? So yeah, we need to okay handle file. We need to handle files.
This is kind of similar to how we did in our, I believe it was interpreter.cpp. I wrote this to actually open a file, read it properly. So yeah, we can kind of do something similar to that in here.
Um, let's first set up, uh, since we're reading the individual characters of a source, the source will actually be the whole text that we're taking the lexemes from.
So let's just set up like a little, we're not going to actually implement the function but actually just like a forward declaration.
So basically just running to get that file and we don't uh we don't want to make a full we don't want to make a full copy. So I'm just going to say constant string reference to the source.
So whatever source the person passes in, whether it is the command line or the REPL or anything, that should be there.
Uh we need to set up a couple of couple of things. So okay first handling files.
So we want to actually have a way to run these files.
So, let's call it run file.
And the path is usually just the std::string. Uh, let me see how I did that in here. Believe it's just a constant string reference. Yeah. And then the file name. So, we're probably just going to do the same thing here.
So constant string reference to uh file.
Um okay.
So we need to first read the file, right? So we're going to be using std::ifstream. Uh, we can use the ifstream to read that file.
Yes. So we want to we want to read that file uh going to call this file name or yeah I think that's fine. Well it's not really a file name.
path is maybe a better name for that.
So we're opening the file from the file path that we got and we need to also check for errors, check to make sure the file opened properly. So if not file, so if it didn't open properly, uh we can just put in like an error handling thing here. So just say cannot open file and pass in the file name. So the file path.
So the specific file path that was entered in itself and then we can give like an exit code as well.
Okay, that's good.
Uh what else is there? So handle the files. We also need to read the whole file and that's where uh std::ostringstream comes in. Okay. Uh, like a string builder, it stores output in a string. You need to copy the entire content of the file into the buffer.
Then read then return that build string.
Let's also check for errors.
Okay, so we have that. So std::ostringstream, let's see.
I believe we should use this It's actually construct that value itself.
So call it buffer.
Uh file. Yeah. File is the right one.
file and then calling that um the member function there and that should essentially copy the entire content of the file into the buffer and then we need to return it as well after uh let me see returns the underlying raw string device object.
Okay, it's a function, right? So, it should be like that.
Uh, then we can also, this is kind of just a source. So we need to run it as well. And the source that we're taking will be uh the buffer, uh we want to call str on the buffer, or wait, uh got to get the content. So yeah, uh buffer.str().
Uh what else? It must also check for any errors. So okay, we already have the error flag that we set here. So if, um, I mean I guess if not error, well no, you need it to be if it is an error. So if it's an error it just kind of won't work here. So let's just call like std::exit, or let's just give it a different exit code than the top one.
Okay, that should be reading that should be reading the file properly now. So that's the way to handle files. If I do believe in the book, that's how they kind of handle it. The book is the book is Java. So it's not going to apply directly.
So I had to do some research on how to read these files and open these files properly. But yeah, the book is Java, so it's kind of difficult to read for me to be honest. But yeah, we're getting by. All right, so we have the run file. Uh let's see what else is in my notes.
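The file path described above (open with `std::ifstream`, copy everything into a `std::ostringstream` through `rdbuf()`, run the result, then check the error flag) can be sketched like this. Several things are assumptions made for illustration: the function returns a code instead of calling `std::exit`, the values 66 and 65 follow the BSD sysexits convention rather than anything stated in the stream, and `run` is a stub that only records what it received in a hypothetical `lastSource` variable.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

bool hadError = false;   // shared error flag, a global at this point in the stream
std::string lastSource;  // hypothetical, only so the stub has a visible effect

// Stub for the real entry point, which would hand the text to the lexer.
void run(const std::string& source) { lastSource = source; }

// Reads the whole file at `path` and runs it. Returns an exit code
// (assumption: 66 = cannot open input, 65 = error in the source, 0 = ok).
int runFile(const std::string& path) {
    std::ifstream file(path);
    if (!file) {
        std::cerr << "Could not open file: " << path << '\n';
        return 66;
    }

    std::ostringstream buffer;
    buffer << file.rdbuf();  // copy the entire file contents into the buffer
    run(buffer.str());       // str() returns the accumulated text

    return hadError ? 65 : 0;
}
```

Taking the path as `const std::string&` avoids copying the string, which matches the reasoning given for the `run` declaration.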
Yeah, we need to do the REPL now.
So the REPL is just going to be running prompts. It's not going to be taking any files in. So, we don't need to open any files and read. We just need to allow the user to actually pass in what they want to pass in for the REPL. Kind of just like how a Python REPL would work.
So I mean kind of just something like this where it has like the ability for you to actually write what you want to implement here.
So we need to somehow make that as well in here, just allow a basic REPL. So we're going to need a separate function here for that. So, let's call it run prompt.
Okay. So, we're not taking a file path.
What What do we take? Do we take anything in at all?
Uh, I mean, not really. If we're just running the prompt, we if we're running the prompt, then we just allow the user to type. We don't need to pass anything from the user itself. We don't need to get anything from the user.
Uh let me read this. So let me reread my notes real quick. So the user can enter the REPL and type. So yeah, if they type print three, then it allows them to print that value. And there needs to be an infinite loop. So unless the user actually exits the REPL, it will just be an infinite loop. So kind of like a while (true) loop there.
Uh then you can run the line typed by the user.
Okay.
End of file. See when the input ends.
Okay. That's like in the command prompt, I believe, if I was reading that right. Like in command prompts when you type in Control-Z, you exit out. So then you can run the line typed by the user, then reset errors.
Okay.
So, let's see. Uh, I close the thing.
I just want to kind of like see as a reference real quick.
Where was that one thing that we had?
Well, I guess it doesn't have that. So, okay.
Well, I guess we want the user the user input. So, let's say um let's just call user input. So that's the input that the user gives to us for that.
And then kind of just like a while (true) loop.
Uh, I believe like the REPL there had like, is there like anything like a C REPL or something? Let me see.
Is that is that even a thing? Let me see.
So I guess yeah. Okay. So like that.
So, uh, so kind of just like this extraction just this sign.
So, we can start off like that.
Um, well, we don't want to make a new line there since it's going to be on the same line. Okay. So, uh to allow one full line from keyboard input to be read.
So kind of like a std::cin I believe.
Wait. Um, isn't there one where you can just get just one line and to process a stream line by line? I guess that's what we need. Uh, damn.
What was it like? Let me go back to LearnCpp for a second. I believe it was in here somewhere.
Uh, I don't even know where it would be to be honest.
Uh, did I write it anywhere in my notes at all?
I mean, it should be in like s string stuff like that.
Uh, let me let me check. Maybe it's in here.
Getline. No, it's not in here.
If I can't find it, I'm just going to go to the the reference page.
Okay, here it is. Yeah. To read a full line of input into a string, you're better off using this. Okay.
Uh so I guess, okay. So you pass in std::getline and then you pass in, uh, what is std::ws again?
Oh white space. Okay.
Yeah, we probably want that too.
So, actually, let me go back down here.
Uh, hopefully audio on stream is okay.
Okay. So, reading one full line of keyboard input.
Okay. So, I mean I guess if it's not a line, if there's no user input there... Uh yeah, I mean that's okay. That's also fine. We can keep that. So it was like this. So std::ws, and then I believe it was in the parentheses, or no, it's not in the parentheses. It's just, um, that's the first argument and then the second argument is the actual user input. So the user input that we have. So for not getting any user input, it's going to basically be the end of the input there.
So, we kind of just break out break out of this loop.
But if if we don't break out, we can just run the line. So basically the same thing. We're running the source here.
Kind of the same thing as we did up here. It's actually run the source. Uh not line run user input.
We'll be running the user input. Uh it said something about resetting errors as well.
So I guess uh that's just setting the error back to false, right?
Okay.
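The REPL loop described above (prompt, read one full line with `std::getline` and `std::ws`, stop at end of input, run the line, reset the error flag) could look roughly like this. The stream parameters are an assumption added so the loop can be driven from a string as well as the keyboard; the version in the stream takes no arguments. `run` is a stub and `linesRun` is a hypothetical counter.

```cpp
#include <iostream>
#include <sstream>
#include <string>

bool hadError = false;  // shared error flag
int linesRun = 0;       // hypothetical counter so the stub has a visible effect

// Stub for the real entry point, which would lex the line.
void run(const std::string& source) {
    (void)source;
    ++linesRun;
}

// Reads lines until end of input (Ctrl-D, or Ctrl-Z on Windows).
// Assumption: the streams are parameters, defaulting to std::cin/std::cout.
void runPrompt(std::istream& in = std::cin, std::ostream& out = std::cout) {
    std::string userInput;
    while (true) {
        out << "> ";
        // std::ws skips leading whitespace; getline fails at end of input.
        if (!std::getline(in >> std::ws, userInput)) break;
        run(userInput);
        hadError = false;  // one bad line should not poison the next one
    }
}

// Example: std::istringstream in("print 3;\n"); runPrompt(in);
```

Resetting the flag inside the loop is what lets a user keep typing after a mistake, whereas the file path treats the same flag as a reason to stop.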
Yeah. So, we handle files, handle the REPL here.
What else?
We need to do error handling after too.
Um, yeah, I'm going to check before we write in uh main the main. I also have the notes straight into main. I'm going to check with Claude though to see what I wrote is actually good enough.
Let's let's uh let's take a quick second.
Uh this is for file handling and REPL handling.
Uh see if I implement it correctly. I believe this is the correct implementation. Let's see if uh I maybe messed up something.
Uh let's see.
Yeah, I figured that was right. Uh, global error. Let's see.
It's going to get harder to manage as the compiler grows. You'll eventually have a lexer error, parser error.
Okay.
The standard pattern is an error reporter object or even a simple struct you pass around.
Okay.
Yeah, we can write that. We can we can write something like that.
Oh, let's Yeah. Okay. If that's the only problem, we can we can reset that. Make it not a global. Uh let's do in main though.
All right. This is the important part.
So, for something like this, having main, you need, uh, I was really confused at this when I first saw this, but you're going to need argc, and you need a char pointer array for argv. This is your argument count and this is the array of the arguments that you have. So essentially if someone passed in like C, uh, test.c, the argument count is two. So C and then test.c are different arguments. And then the array holds these two arguments right here. So argument zero is C. And then your actual file that you're testing will be the second element. Sorry, not the first element, the second element.
And you can't have any more arguments passed in. And if it's just one argument, then we're not passing in a file. It's probably going to be like the REPL prompt that we had up here.
The file handling's for two arguments.
And then for the REPL, we don't need any files. So we're just passing in one argument. So all right.
So let's see.
So yeah. So uh any argument pass? So we need to we need to validate this. So if our argument count is over two.
I guess we can just error error check that.
I'm just going to type this for now. Too many arguments.
I said too many arguments here. Um, what else?
Oh, dang. I realized we have a good amount of people watching.
Uh, so arguments is greater than two is that if our argument is just one, it'll be the run prompt. If our arguments is equal to two, then it'll be the file handling. So let's uh let's write that. So else uh what's going on here?
So else, uh, else if, this will check if argc is equal to two.
So we want that and then we also want else for any other type of argument. So one argument zero arguments.
So yeah so the argument count is two.
Uh, so I guess we just call the function run file, right?
And we want to pass the second element of the argument array because that's going to be the file we pass in. So not the first element, the second element. Sorry, it's, yeah, you know what I mean.
Uh so we want to pass in argv[1]. I believe that should work.
Yeah cuz running the file path and that should handle that.
Else, uh, this will be run prompt. So it doesn't need anything passed. It should just be run prompt. We don't need to pass anything in. So yeah, run prompt. Uh oh, semicolon. Missed that. Uh yeah.
Think that's it for main.
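The argument handling walked through above comes down to a three-way branch on `argc`. The sketch below puts that branch in a hypothetical `dispatch` function so it can be called directly; in the real program the same body would sit in `main`. The stubs `runFile` and `runPrompt`, the recording variables, and the exit code 64 (the sysexits usage-error value) are assumptions.

```cpp
#include <iostream>
#include <string>

std::string lastPath;        // hypothetical, records what the stub was given
bool promptStarted = false;  // hypothetical, records that the REPL was chosen

// Stubs standing in for the real file runner and REPL.
int runFile(const std::string& path) { lastPath = path; return 0; }
void runPrompt() { promptStarted = true; }

// argv[0] is the program name, so argc == 2 means one real argument: the file.
int dispatch(int argc, char* argv[]) {
    if (argc > 2) {
        std::cerr << "Too many arguments\n";
        return 64;  // assumption: usage-error exit code
    } else if (argc == 2) {
        return runFile(argv[1]);  // second element is the file path
    } else {
        runPrompt();              // no file given, start the REPL
        return 0;
    }
}

// In the real program: int main(int argc, char* argv[]) { return dispatch(argc, argv); }
```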
I think that is it. So let's just recap on what we have before.
Uh so I started off by setting up this lexer header file. So we got the token type. So whatever the type of the token is, and then we set up the struct to actually set up what the token would be.
So it's type the actual raw word of it, the variant like the literal what it could be and the line number it occurs on. We set the constructor. Uh, oh yeah, we need to revamp this.
We set the run, we set the run function.
We haven't actually implemented it yet, but we will.
This is for file handling. This is for the REPL. And this is, um, for main to actually handle everything.
So yeah, that's a good amount we've done so far. I think I'm I'm still good to go for a bit more.
I think we can still see. Yeah, we need to fix this. Uh let's not make it a global.
Let's actually make it within a struct.
Uh yeah. So, let's make a struct.
So, we're going to put that in here.
So, the actual value and then we can set up like a function here.
uh we can have like a generic way of handling files. I believe like in the book in the book it had like a generic way of handling these errors and then also like a more sophisticated version.
So we could run something like void and call it error.
Let's call this hadError just in case.
And let's change something here.
Uh, what should I call this struct? I guess just error handling, I guess, for now. Names can change. That's fine. Uh so void error. So what exactly do we want to pass through in the error?
Uh, I believe in the book. I want to check the book real quick. Uh, where was it that they did that?
Yeah, in the book. Okay, so they pass in a line like the actual implementation like what was wrong and the message and the actual string message. I believe that I wrote that in my notes too. Yeah.
So it would be something like that. So line error at whatever here. So like whatever you mistyped or miswrote and then the explanation.
So yeah. Okay, that's good.
Yeah.
What's up, Lucy? How you doing?
All right. Line, um, the actual like where part. So, I'm going to kind of follow them. So, they made this report an error. So, that's how they handled it.
I'm going to kind of follow them on that real quick for this part.
That's great. I'm doing good. I'm doing good. Just writing this compiler.
I finished the emulator the other day.
So now I'm working on the second part of the project which is the compiler to put it all together. Okay. Uh line, uh the where would just be like a string, right? So uh let's make it a string reference. A constant string reference.
Same thing with the message. So const string reference message, and I believe, uh yeah, kind of just setting up. So in Java they have System.err. I don't know about that, but uh in C++ we have std::cerr, so we can just use cerr and kind of just do the same as that. Uh let me just see the formatting I wrote, how I wanted it to be. So line, uh okay, so line is going to be like that. Uh line.
I'm not going to lie, C++ like the way to write out these strings are kind of annoying.
They confuse you a bit, but yeah. Okay.
So, that, put a space, and then the specific error. So, where. So, we want the where here.
Where, at that specific one will be the, um, wait, hold up. Um, I wrote it wrong.
Error at, and then this is going to be the where.
Okay.
All right. Error at.
Okay. So, error at where I actually hate strings, bro. I actually hate output in C++. All right. Okay. Uh the where. So, that's there. And then whatever the value is, this will be the message, the message of our error. So, message and yeah that should be good.
Yeah. So this will now handle errors and we can also have like a more simplified version of that kind of like how the book did. So we can call that error and that will just be like a line and the message.
I believe that's how they did. Yeah.
Okay.
To be honest, I don't see a reason of splitting, but maybe I'll figure that out later on why to have a more lessened version of that.
Uh yeah, same thing pretty much.
Or I guess we can just call report.
We can call it report and then say the line. So so this will basically call the message.
Okay. Now I understand why now. I understand why. Yeah. This will basically call that function. Yeah.
That's why they did that in the book. Yeah, that makes sense now.
So passing in those three arguments.
All right. Error handling should be good.
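A minimal sketch of the two error functions described above, assuming a "[line N] Error<where>: <message>" layout. The exact format string is not visible in the stream, and `formatError` is a hypothetical helper added here only so the text can be checked without reading the error stream.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Builds the diagnostic text. The layout is an assumption of this sketch.
std::string formatError(int line, const std::string& where, const std::string& message) {
    assert(line >= 1);  // line numbers are counted from 1 in this sketch
    std::ostringstream out;
    out << "[line " << line << "] Error" << where << ": " << message;
    return out.str();
}

// Full version: the caller says where in the line the problem is.
void report(int line, const std::string& where, const std::string& message) {
    std::cerr << formatError(line, where, message) << '\n';
}

// Simplified version: forwards to report() with an empty `where`.
void error(int line, const std::string& message) {
    report(line, "", message);
}
```

Keeping `error` as a thin wrapper means call sites that have no location detail only pass two arguments, while the formatting lives in one place.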
Um, hm, I'm not actually sure what's going on here.
Why that's a problem.
"A nonstatic member reference must be relative to a specific object."
Uh, to be honest, I'm not actually sure.
But I guess I can ask Claude for a second.
Uh, yeah.
Yeah, I'll ask for a second. So the error is like "nonstatic member reference must be relative to a specific object."
Yeah, it must be relative.
Uh, let's see. What's the problem?
So, what's going on? Uh, so you can't access it. Okay.
Oh, you're right. Oh, I totally forgot.
That's such a that's such a bad thing to forget.
So I guess outside of the functions we have to actually initialize the struct. So I totally forgot. I was wondering why we couldn't, because I've been able to do that for my other projects. I totally just forgot there.
Yeah. So we want to actually initialize the struct properly, and any other things that we need to do there, we have to get that. And I guess it's in the same file, so it should be able to see it, right? Or is it a problem? There's still a problem here.
Oh, okay.
Okay. So, we need to actually add that as a we need to actually add that as a parameter in every single thing. now to actually be able to use it.
So I guess then in any places that we called it, we need to actually like get that in properly. Uh and then also in here, right?
And then make that I think he did that every single place, right? Okay. Every single place.
So then uh I guess we need to make changes.
So this will take in error handling. Well, wait. No, it wouldn't take in error handling.
Um, it's still a bit confusing here.
It's still a bit confusing on what exactly What exactly went wrong here? So, okay.
So, they're Okay. So, what are we doing?
So, we're running. Okay. Let me go to this part. So, the if part, right? So, the if part Okay. Yeah, they're accessing. I totally Yeah. Okay, that makes sense. Yeah.
then accessing that um member.
Yo, what's up, Muhammad? Uh, I'm making a compiler right now.
I have my CPU emulator here already made. Now, I'm making the compiler for that. It's going to take a subset of C and compile that through the virtual CPU that I made.
It's the first day of the compiler, so I'm just trying to figure things out as I go.
Yeah.
Okay. And I mean to Okay. So, every every single part I need to change now.
So, like that.
And we also need to access it like how we usually do.
Let me make a change. It's just me.
The error handling accessing that accessing the error for that.
And then both of these should also be changed. So I guess just error handling just to pass it through.
Yes, that's a compiler.
Okay, I think we did it. I think this is good now.
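The roadblock above can be sketched like this. `ErrorState` and its `hadError` field are hypothetical names, since the real struct is not shown on stream; the point is only that a non-static member has to be reached through an actual object, so that object gets passed into every function that touches it.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical struct; the real name and fields are not shown in the stream.
struct ErrorState {
    bool hadError = false;  // non-static: one copy per ErrorState object
};

// Writing `ErrorState::hadError = true;` here would be rejected, because a
// non-static member only exists inside a specific object.
void report(ErrorState& state, int line, const std::string& where,
            const std::string& message) {
    assert(line >= 1);  // line numbers are counted from 1 in this sketch
    std::cerr << "[line " << line << "] Error" << where << ": " << message << '\n';
    state.hadError = true;  // fine: `state` is a specific object
}

// Every caller now has to pass the same object along.
void error(ErrorState& state, int line, const std::string& message) {
    report(state, line, "", message);
}
```

An alternative would be a `static` member or a global flag, which avoids the extra parameter at the cost of shared global state.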
Small little roadblock we had there, but we figured it out. Well, Claude helped me figure that part out. But yeah, everything else we did properly by ourselves.
Yeah, I just totally forgot about that one part for the struct.
I'll definitely uh review.
Uh, but yeah, that part should be done now. So, let's review. We have our enumeration, token type.
So whatever the type of the token will be. Then we actually need the struct for the token to actually make the token. So our type, the actual word, which is the lexeme, and whatever type the literal could be, and also the line number where that is declared, right? And then we have error handling here. I just revamped that right now. This is to actually run any source that we're taking in, either from the command line or the REPL. And this is the file handling. So if a user passes a file through in the command line, it'll be taking the file, opening it, and then running it. And this is for the REPL. So basically a user can pass in whatever they want. So if they want to print three or make a while loop or anything. So it's kind of just like the Python REPL, kind of like that. And this is the main function. So I actually get everything running. So that works for that. Okay.
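As a rough picture of the enumeration and token struct being reviewed, here is a sketch assuming C++17. The enumerator list is shortened, and using `std::variant` for "whatever type the literal could be" is an assumption; the stream does not show how the literal is stored.

```cpp
#include <cassert>
#include <string>
#include <variant>

// A deliberately short list; the enumeration on stream has more entries.
enum class TokenType { Plus, Minus, Star, Slash, Identifier, Number, Int, If, While, EndOfFile };

// The literal can be "nothing", an integer, a float, or text (assumed set).
using Literal = std::variant<std::monostate, long, double, std::string>;

struct Token {
    TokenType type;      // what kind of token this is
    std::string lexeme;  // the exact text taken from the source
    Literal literal;     // parsed value, if the token has one
    int line;            // source line, used in error messages
};

// Usage sketch: the token for the `3` in `int x = 3;` on line 1.
inline void tokenUsage() {
    Token three{TokenType::Number, "3", Literal{3L}, 1};
    assert(std::get<long>(three.literal) == 3);
    assert(std::holds_alternative<std::monostate>(Literal{}));
}
```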
What's next? Um I guess we have to make the scanner class.
We should actually be scanning now.
So we can actually set up our scanner class. Let me try doing that right now.
So, uh, I guess we can set that up back in the lexer. Uh, I haven't gone through the scanning part properly. So, we're probably going to take a look here as we go, kind of like convert that from Java on over. Uh, where is it actually? Wait, did I go too far? Yeah, here. So, they actually set up the scanner like this in Java.
So, okay. Okay. So, they're saying a a string for the source.
So, let's set our let's set our private private members in here.
So, string source, I believe. Is it, uh, is it before? I'm pretty sure it's like this, like private members with the underscore. I think it's after. I think it's after.
Uh I know in Python it's before, but I think in C++ it's after if I remember correctly. Okay. So the string source And I guess they Okay, so they passed in a list of tokens.
Well, let me try and think why they would do that. So, I mean, I guess yeah, they need to pass in a list of tokens. So like if they have int x = 3 and they pass in the whole list of these individual tokens to actually scan through to actually make it valid. So that makes sense. So I guess here we can include vector and write that out.
kind of the same like how they did. So setting a token.
So list of tokens and we can also call that token. So okay that's pretty good.
What else did they end up doing? So, let's see.
Let's see the actual logic.
Yeah, I'm using a vector. Sorry, I'm using a vector.
Let's create a simple string. We have a list ready to fill with the tokens we're going to generate.
That does that. Looks like the Okay.
So, it looks like they're actually going through it. They're setting kind of like a a starting point and also an ending point to actually go through the tokens, the individual tokens.
and start. Okay, so they have a they'll have an end they'll have an end like a current value like a current place in the token and also a start value. So okay I guess yeah we can set these variables here.
Um, okay.
So, I guess kind of like a line. Okay.
The line number as well that they have.
Yeah, that's important, we need that. That was in our struct that we had earlier, for the line. So, uh, you can't really have, yeah, so that's why they put one. You can't really have line zero here. So, one, just initializing that. So, uh, is there an equivalent of that in C++? That at-end function, is that something they had to implement, or is that in C++?
No, that's something they implemented.
So, we can implement that.
Okay. All right. These should be all our private member variables. Uh yeah, let's go. Let's go for it. So guess we need to set our constructor now. So let's do that in here.
So it's taking the variables. So we'll take that variable. So the source that we get, and we'll tie that in. Uh, we can call, like, uh, std::move on it. So just like how we did earlier, std::move, so we can just move the string instead of making a copy of it. So the source, uh, yeah, and then the constructor will have the vector in it. So that the link went and yeah called tokens.
So yeah, that should be good for that.
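Putting the private members and the constructor together, a sketch might look like the following. The trailing-underscore names, the trimmed `Token`, and the read-only accessors are assumptions made for illustration, not the code from the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Token { std::string lexeme; int line; };  // trimmed stand-in for the real struct

class Scanner {
public:
    // Taking the string by value and moving it avoids a second copy.
    explicit Scanner(std::string source) : source_(std::move(source)) {}

    // Read-only accessors, added only so the sketch can be inspected.
    const std::string& source() const { return source_; }
    const std::vector<Token>& tokens() const { return tokens_; }
    std::size_t start() const { return start_; }
    std::size_t current() const { return current_; }
    int line() const { return line_; }

private:
    std::string source_;         // raw source text
    std::vector<Token> tokens_;  // filled in as scanning proceeds
    std::size_t start_ = 0;      // first character of the lexeme being scanned
    std::size_t current_ = 0;    // character currently being considered
    int line_ = 1;               // lines are counted from 1
};

// Usage sketch: a fresh scanner owns the source and has produced no tokens yet.
inline void scannerUsage() {
    Scanner scanner("int x = 3;");
    assert(scanner.source() == "int x = 3;");
    assert(scanner.tokens().empty() && scanner.line() == 1);
}
```

The token vector starts empty because the scanner is the thing that produces the tokens; nothing has to be passed in for it.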
Good for the constructor. Um, and now we actually need to implement.
So, let me just quickly try and understand. So, they're starting at the start of the token and they're going through the token and whenever they hit the end of it, that's when they stop. So, okay, that's interesting.
Okay. Um, so we need to know if that'd be right. So it's going to be a bool.
And we want to be returning whether or not it's true. So, I guess when the current value is greater, or, uh, would it be greater than or equal to?
I guess um yeah greater than or equal to the size of that vector that we're having. So or not necessarily the vector because I guess it would be yeah it would be the source right because it's not necessarily the vector because that's the vector of all the tokens we need the source and we need the actual source so whenever hits the end of one source the token one token then we can move on to the next one.
So yeah. So basically just returning the size of this I guess. But uh we would probably need to cast that.
All right. Is that even possible?
Wait.
Um I guess that's fine, right?
I guess that would be that would work.
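The at-end check can be sketched as below, assuming the position is stored as `std::size_t`. With that choice the comparison against `size()` needs no cast; if the position were an `int`, a `static_cast<std::size_t>` would be needed to avoid a signed/unsigned comparison. The function is public here only to keep the sketch easy to try out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class Scanner {
public:
    explicit Scanner(std::string source) : source_(std::move(source)) {}

    // True once every character of the source has been consumed.
    // The check is against the source text, not the token vector.
    bool isAtEnd() const { return current_ >= source_.size(); }

private:
    std::string source_;
    std::size_t current_ = 0;  // same unsigned type that size() returns
};

// Usage sketch: an empty source is at the end immediately.
inline void isAtEndUsage() {
    assert(Scanner("").isAtEnd());
    assert(!Scanner("a").isAtEnd());
}
```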
And then we need to keep going for it.
So let's just see how they're doing.
So they have different functions. So they have the advance function, add a token.
Okay, that's interesting. Okay, let's try and implement that in here.
Okay. So for advancing I want to return like the specific char of that source and keep moving forward.
So, I guess it would be current that current value incremented by one essentially. So now this will pass that value back to keep moving forward.
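A sketch of `advance` along those lines: return the character at the current position, then step forward by one. The `assert` guard is an addition of this sketch, not something described on stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class Scanner {
public:
    explicit Scanner(std::string source) : source_(std::move(source)) {}

    bool isAtEnd() const { return current_ >= source_.size(); }

    // Returns the current character and then steps past it.
    char advance() {
        assert(!isAtEnd());          // added guard: never read past the source
        return source_[current_++];  // post-increment: read first, then move
    }

private:
    std::string source_;
    std::size_t current_ = 0;
};
```

Because `advance` changes `current_`, it is the one helper here that cannot be marked `const`.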
That's fine. We we'll check it all with Claude anyways at the end to make sure I'm actually doing this right.
And I believe that add token scan token. Add token.
What other implementations?
How long have I been streaming? Let me see.
30. All right. Maybe like five or 10 more minutes and I might end the stream.
Okay. Uh, the start field points to the first character in the lexeme being scanned.
Current points at the character currently being considered. The line field tracks what source line current is on.
So we can produce tokens that know their location. Okay.
Okay.
Interesting.
These are for scanning the specific tokens. Okay.
Okay. Okay.
Char advance. We just implement the char advance add token.
Okay, that's how they're reporting back the error. Okay, that's interesting. Okay. Um, the operator for division needs special handling because comments begin with a slash too. Okay, got another helper function. Char peek.
So, it only looks at the current unconsumed character. We have one character of lookahead.
Only peek one or two characters ahead.
Okay.
And I guess that's that's pretty easy to find. So start peeking at the value.
It's also probably a good idea to make these const.
It's probably a good idea to make them all const.
Oh, we can't make this one const.
Uh, so we're just peeking at the current value, right? We're peeking at the current value. So, as long as we're not at the end. If that's the case, we'll just return it.
I mean, I guess return like the null terminator, right? I assume that's what they probably do in C and C++ as well. But yeah, so at the end it's just going to be returning the null terminator, and otherwise we just peek at the value.
instead of just returning that value. Uh yeah. Okay, that's good.
But I believe we also should have another one as well. But that's to peek the next value, 'cause I remember it did say in the book that they can peek two. Uh, we need to also check if it's at the end. So if our current value, right, if we increment that by one, right? So if that's greater than or equal to the actual, uh, source, kind of just similar to how we did there, our source size.
If that's the case, we're going to need to just return, which will be a null terminator, just how we did before.
Uh, and, hm.
Yeah. So, if that's not the case, then we can peek ahead. So, let's see. Um I guess it would just be this value but just plus one instead.
So we do have the peekNext function now. Peek and peekNext functions.
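The two lookahead helpers can be sketched as follows. Neither one moves the position, so both can be `const`, and both return the null terminator when there is nothing left to look at. `advance` is included only so the sketch can move through the source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

class Scanner {
public:
    explicit Scanner(std::string source) : source_(std::move(source)) {}

    bool isAtEnd() const { return current_ >= source_.size(); }
    char advance() { return source_[current_++]; }

    // Looks at the current unconsumed character without consuming it.
    char peek() const { return isAtEnd() ? '\0' : source_[current_]; }

    // Looks one character past the current one, e.g. the second '/' of "//".
    char peekNext() const {
        if (current_ + 1 >= source_.size()) return '\0';
        return source_[current_ + 1];
    }

private:
    std::string source_;
    std::size_t current_ = 0;
};

// Usage sketch: spotting the start of a line comment.
inline void peekUsage() {
    Scanner scanner("// hi");
    assert(scanner.peek() == '/' && scanner.peekNext() == '/');
}
```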
Okay.
Now they're handling longer lexemes. So, okay, that's interesting.
uh these will be implemented out of the class anyways to actually implement all of these but uh I'm just looking to make all the helper functions but yeah okay so we would need to make this helper function here too uh let me see exactly let me see exactly why we need to do this though Um look at the second character, right?
Okay. Yeah, since these tokens are taking multiple characters, just like we had, uh, those would fall under some of these ones, like exclamation equal, equal equal. So they look at both characters to actually implement it. So that makes sense.
Okay. So, okay. So, for their specific case and then they're adding another token for that.
Okay. So, this one's exclamation mark.
If they're adding the token and they're matching this value, it's going to be not equal to and then this will just be not okay.
And then let me look at the helper function how it's how they did it. All right. So I assume it should be pretty similar to do C++ expected.
Yeah. So they're taking that value that they're expecting.
Yeah. So if they're moving forward, if they're moving to the next one, if it's at the end, it's going to be... Yeah. So, okay. So, if they're at the end, they'll return false.
That would mean that it would end up being one of these ones.
Yeah, I think I understand.
Let me try and implement. I understand it. Uh, okay.
I believe it was a bool, right? So, we need to see if it's true, right? Yeah, it was a bool.
So, I need to see if there's a match or not. So, expected, and if we are at the end, it'll just be, uh, I guess just false; otherwise it's true.
But also I guess we need to also check one more thing. We need to make sure the token matches right. So Uh so kind of just like how we did there.
So the token that we're moving forward.
Uh, no. Um, what am I exactly trying to do? Uh, not source size, we're already checking that. It's source at, um, the current value, that's what I'm trying to see. So I guess it's going to be current, if I can type right now. All right, so if the character at current is not equal to the expected value that we gave, then we'll return false.
And we also want to increment this value after that's done.
So we can allow it to actually add it.
So yeah, that should be that should be good.
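Here is a sketch of the `match` helper as described, with the end-of-source check and the character comparison folded into one condition. `scanBang` is a hypothetical helper added to show how `match` picks between `!` and `!=`; the two-entry enumeration is trimmed down for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

enum class TokenType { Bang, BangEqual };  // just the two types this sketch needs

class Scanner {
public:
    explicit Scanner(std::string source) : source_(std::move(source)) {}

    bool isAtEnd() const { return current_ >= source_.size(); }

    // Consumes the current character only if it equals `expected`.
    bool match(char expected) {
        if (isAtEnd() || source_[current_] != expected) return false;
        ++current_;  // a successful match consumes the character
        return true;
    }

    // Scans `!` or `!=`. Assumes the current character is '!'.
    TokenType scanBang() {
        assert(!isAtEnd() && source_[current_] == '!');
        ++current_;  // consume the '!'
        return match('=') ? TokenType::BangEqual : TokenType::Bang;
    }

private:
    std::string source_;
    std::size_t current_ = 0;
};
```

The same pattern would cover the other two-character operators, since each one only needs to know which single character may follow.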
Okay, they did it in two checks. They did two checks. I think this is fine though. It's not a big problem. Uh I accidentally opened up Opera. Sorry about that.
Going to close that. All right. Uh yeah, I think all the helper function should be I think I looked over helper functions good enough.
I think that should be good.
Yeah, we've been streaming for a long time. I might end the stream here. We got quite a bit done. It started slowing down at the end a bit. I do need to read more about this stuff. I do understand it, but, uh, I need to read more about it.
But yeah, at the beginning we were kind of fast going through the whole main function with error handling, file handling, REPL handling. We did that pretty fast and pretty well. Same with the enumeration and the struct. It got a bit slow once we got to the class, but that's fine. It's the first time, so that's not a problem. Yeah, I guess I'll continue it tomorrow after I do some more research about it.
Yeah, we'll continue up tomorrow, guys.