NVIDIA's dominance in AI infrastructure stems from CUDA (Compute Unified Device Architecture), a parallel computing platform that enables general-purpose GPU computing (GPGPU) and became the de facto standard for machine learning frameworks like PyTorch and TensorFlow. GPUs differ fundamentally from CPUs by having thousands of parallel cores (vs. 4-16 in CPUs), making them ideal for parallel tasks like deep learning. The NVIDIA ecosystem includes RTX graphics cards with tensor cores for AI acceleration, the NVIDIA Container Toolkit for GPU-accelerated containers, and Triton Inference Server for deploying AI models. Understanding CUDA's hierarchical architecture (threads, blocks, warps, streaming multiprocessors) and memory hierarchy (registers, shared memory, global memory) is essential for effective GPU programming.
Deep Dive
NVIDIA AI Infrastructure for Beginners – GPUs, CUDA, Containers & Triton
Hey folks, this is Andrew, and we're going to talk about what Nvidia is. I'm pretty certain you already know, but in case you don't, it doesn't hurt to do a quick refresher, right? So, Nvidia is an American technology company founded in 1993 by Jensen Huang. I do not know how to pronounce his last name. He had two other co-founders, but he's the CEO, the front man. Everybody knows who this fella is, especially for wearing these expensive leather jackets. Which is interesting, because before Nvidia's stock blew up, I don't think anyone was ever talking about his fashion. But, you know, if you want to get a lot of attention on stage, you have to create this public persona and have this kind of fashion statement.
And so that is the approach he took. Nvidia is primarily known for their graphics cards, also known as GPUs (graphics processing units), for video games. The first thing they produced was the NV1 graphics accelerator in 1995. Then they started the GeForce series with the GeForce 256 in 1999. And their most famous graphics card of all time is the GeForce GTX 1080 Ti from 2017. I know this because I wanted one. I think I had, like, the GTX 900 or whatever, and this thing had come out, but I didn't have the money for it. It was so popular that all the way up to 2023 people were still buying it; it was very capable for a good stretch of time. They have a lot of product lines, and they do things other than just GPUs, but this is generally what they're known for, and in the last five years their stock has just blown up. You can see there how much it has gone up, and if you look down below, it says the market cap is 4.6 trillion dollars. Okay. And the reason they exploded was the AI boom, because the best thing to run LLMs on was Nvidia graphics cards. We'll explain why in just a moment. But anyway, Nvidia has made a lot of acquisitions, and so you can see they've been very, very busy adding a lot of different types of technology across the board to their organization.
But why is Nvidia the leader in AI? There's a very good reason, and most of it comes down to CUDA. Nvidia developed CUDA early, like in 2007, and it allowed you to directly interface with your GPU via an API with lots of convenient libraries. So they had laid the foundation to make their GPUs useful for general computing, not just for video games, and data scientists and other folks could use them for other things. So when the AI frameworks wanted to leverage GPUs, CUDA became the default. That was the case when PyTorch was created and when TensorFlow was created, these big machine learning frameworks that everything was built on top of. And that just kind of cemented Nvidia graphics cards, or CUDA, as the de facto standard for developing this stuff. The other thing is that Nvidia just heavily leaned into AI quickly. They had custom tensor cores. They supported FP16 and BF16 early. Lots of things like that. And so that's kind of the reason why Nvidia has held the top place there: it just works out of the box. It's not that there aren't competitors that are better. It's just that the ecosystem is so strong that it's hard for anyone like Intel or any of the other chip manufacturers to get close in this space. Okay, but there you go. All right. Let's make sure that we understand what a GPU is.
So GPU stands for graphics processing unit, and it is a processor that is specialized to quickly render high-resolution images and videos concurrently. That makes it great for video games. GPUs can perform parallel operations on multiple sets of data, and so they are commonly used for non-graphical tasks such as machine learning and scientific computing. On the right-hand side we have an example of a CPU versus a GPU, and I think it's worth comparing the two. CPUs have on average 4 to 16 processor cores, but GPUs can have thousands of processor cores. Okay. So when you see something like four to eight GPUs, this can provide as many as 40,000 cores. GPUs are best suited for repetitive and highly parallel computing tasks like rendering graphics, cryptocurrency mining, deep learning, and machine learning. So you can see why GPUs play a large role outside of video games in the AI space. There are other things that are specialized for stuff in the AI area, but the GPU has been a very useful thing overall.
Okay.
All right. So, we did cover what a GPU is, and we did do a comparison against CPUs, but I just want to reiterate it with another graphic and slightly different information, just so it sticks.
And so, here on the left-hand side, we have a representation of a CPU and then a GPU. You can see the green parts are the processing power, and it clearly looks like the GPU can do a lot more. CPUs, besides just processing, need to reserve die area for big caches, control units, and things like that. We're seeing that: the blue stuff, the purple stuff, the yellow stuff, right? There's a lot more going on than just processing. GPUs dedicate most of their transistors to data processing. As you can see, this big area is processing. Okay. Down below, what's also important to understand is the way the threading works, or how each one processes tasks. CPU processors work on minimizing latency within each thread.
Okay. So down below, here it says computational thread, and so here is an example of a thread. Notice everything is stacked back to back to back. It's running sequentially, whereas GPU threads are running parallel to each other, right? And so the idea here is that GPUs hide instruction and memory latencies with computation. Yes, some of these things are slower or faster than others, but the idea is you're not waiting in line for stuff. Okay? That's the clear difference there, and hopefully that's good. Okay, so this course is focused on data center grade Nvidia hardware. But I think it's very useful to start with what you have, because then you can map that knowledge over. So I'm going to be talking about Nvidia RTX, which is the type of card I have in my workstation. If you have a different one, you might want to try to map it to yours and just get an idea of the capabilities of your card. And again, there are newer cards out there than what I have, but it's just going to be something to help us map our information so we best understand it. So, let's talk about Nvidia RTX, also known as Nvidia GeForce RTX. It's a professional visual computing platform created by Nvidia.
So Nvidia has had the GeForce series, maybe like GTX and things like that. This RTX one is focused on professionals, not just gamers. These cards are intended for professional use like video production, product design, I don't know, some things some math folks would do. You can still play video games on them, but they're really intended for professionals. Okay. And when you have a brand like Nvidia RTX, you're going to have a series of cards, and you can kind of see that they're releasing a series almost every other year.
And then within it, there are a couple down there that are workstation cards and server cards, obviously for smaller data centers. But since I'm just a professional consumer, I have the RTX 40 series. Even though I got my new workstation in 2025, I have a card that is from 2022, and I'll explain in a moment why I might have a card that's three or four years older. But anyway, within that series there are a bunch of cards, and the card that I have is the Nvidia GeForce RTX 4060.
And then, when you have a graphics card, you have a specific board partner. That could be MSI, ASUS, Gigabyte, or in my case it's an OEM system integrator, specifically Dell.
Okay. The thing you have to understand when you look at graphics cards coming from Nvidia is that they design the chip. They don't generally design the cards. Okay? They're going to partner with another manufacturer who's going to take the chip, put it on a board, add fans, and design the board in a very specific way. So if you try to buy an RTX graphics card like the RTX 4060, you'll see MSI's 4060, ASUS's 4060, Gigabyte's 4060. And that can be confusing, because you're like, isn't there a definitive one? And technically there is. Nvidia releases these things called Founders Edition cards, and that's where Nvidia makes the entire graphics card itself. But they're produced in limited batches, because they really want you to go with the board partners; they release these to get the ball rolling. I'm not sure how much more they cost, but the point is that they do exist. Generally, though, Nvidia wants you to work with their board partners, who buy the chips and package them in different ways, and obviously there are different capabilities between them, which makes it really confusing to know what to buy. So when I buy one, I just buy something, shove it in my machine, and hope for the best. [laughter] But anyway, for me, I don't even have a board partner. I have the one that is part of my Dell machine. Dell is a system builder, so they have complete machines that you can buy. And the reason you might want to buy a complete machine, where you don't have any control over the parts but it's fully integrated, is that you get really predictable support and a reliable build. So, like, Beo, my co-founder, really likes Dell, and when I needed a new computer, he said go get a machine from Dell.
I think mine's an OptiPlex, and Dell Precision is another line, but he said, "Go get this machine, and when it breaks, they will send a person to your house within 24 hours." I live in a remote town, so that's crazy support, but they'll drive two hours, and whatever needs to be replaced, they'll just replace it on site, and I'll be up and running the next day. The thing is that when you work with a system builder, they're going to build around something very specific. You'll notice that this RTX there, the 4080, only has a single fan. They've clearly built it specifically for their machine, to be very compact and fit in there. And so you might get older cards, but they're optimized to be quickly replaced. So you're getting a reliable build, something they know works reliably, rather than having the newest thing.
So even though it's 2026, I got a card from 2022, but it still has ML capabilities. Okay? Because when I got the workstation, I said I need to do some level of ML, and so that's that.
But let's talk about my card and what's in it. It uses the Lovelace architecture, or Ada Lovelace architecture, which is basically the design of the chip; we'll talk about that in a moment. It features third-generation RT cores, which are for ray tracing. That's great for video games, obviously. And then it has fourth-generation tensor cores for deep learning, and tensor cores are what you need when you are working with ML. Here's a photo of the RTX 4090 Founders Edition. I couldn't find a Founders Edition for the 4080. I don't know if they make one for every single card, but I remember this one, and I thought when I got my graphics card it would look like that. It doesn't. [laughter] But that's okay, because the ones made by Nvidia look really, really cool. Let's talk about the Lovelace architecture. Here on the right-hand side: whenever Nvidia releases a new chip, they make this cool graphic, and if you look in the middle of it, you can see the chip's die. That is a thin slice of silicon etched with billions of transistors, cut from a larger silicon wafer. So just imagine them etching billions of transistors into a piece of metal, is the way I'd like to describe it. But anyway, let's talk about the capabilities here. This is all the functionality of the Lovelace architecture, and we don't need to know everything it does, but I've highlighted the stuff that is going to matter to us. The CUDA compute capability listed here is going to matter, because it determines which features are available to us, and a lot of machine learning stuff is built around specific CUDA capabilities. So you might be limited in what you can run based on which CUDA capabilities your card has access to. Notice the tensor cores: it's talking about the precision.
So the precision is going to matter if you are trying to run models that have been built for specific precisions. Okay. I highlighted NVLink because we learn about NVLink in this course, and it's saying there is no support for it. I don't know much about NVLink outside of the data center use in this course, but maybe NVLink is available on consumer products for professionals as well. We will learn in this course how it is much faster than PCIe, and that it's great. Okay, so hopefully that gives you a frame of reference. Again, look up your card, look up its architecture, and see if you can highlight the same things. I think it's also worth going to the internet and taking a look at some leaked die shots, because with these die graphics we don't get to see the full picture and all the capabilities. It's fun to see that people will go in and take a close-up look and show you exactly what's being etched in there, so you understand how these things work. Okay, but there you go. Let's take a look here at nvidia-smi, which stands for System Management Interface. It's a command-line utility bundled with Nvidia GPU drivers, used to monitor and manage GPU devices. If you have an Nvidia graphics card, even on your own machine, and you are running ML models or other intensive things, this is a great way to get real-time information about how your graphics card is performing, or just information about your graphics card in general. I did show you how to do that via the control panel in the Nvidia app, but if you're in a data center with no graphical interface, then you'd be using nvidia-smi.
It provides real-time data on GPU utilization, memory usage, temperature, and power consumption, which is essential for tracking performance. It supports Linux and 64-bit Windows. These commands that I'm going to show you: you should know what they do and remember them, because there's a high chance they might show up on your exam. Okay?
And so I'm being very deliberate here to show you very specific ones. Not all the commands, but the ones that you absolutely should know. The first is dynamic monitoring, so you type in nvidia-smi dmon. Notice you get the GPU, the power, the temperature, the memory, and additional information. Then we have process monitoring with pmon, which shows what processes are running. Maybe you have Python 3 running something, and notice there's a PID number, so you could kill the PID or whatever. Then we have querying specific details. If you need very specific information, like memory used, memory total, or temperature, you can do that, and note that you can format the output as CSV so you can export it. Then there's this one for MIG. We'll learn about MIG later, but MIG is the capability of taking a GPU and isolating it into segments for different uses.
Again, we'll cover MIG later on, and this command is one you're going to want to know, so when you get to the MIG section, come back and look at it. The idea is that with MIG we create GPU instances of a specific size and profile on MIG-capable GPUs. Things like the A100 and the H100 are MIG capable. Not my RTX; that can't do it. But with really expensive graphics cards you can logically isolate them into separate instances, and when you do, you use a very specific profile.
Okay. And see how it says 0 1 914? Those are the numbers you're putting in here, and they determine the parameters for that segmentation. So -cgi creates a GPU instance from the given profile IDs, and the -C flag creates the default compute instance along with it. But anyway, when we cover MIG later on, just remember this and come back and learn these flags. Okay? There's another thing in nvidia-smi when we first use it.
It'll give us a bunch of information. I don't have a screenshot for that right now, so I'm going to go over and show you nvidia-smi with my RTX. Okay. All right, folks. We're going to take a look at nvidia-smi. When we installed the Nvidia drivers, it should have installed nvidia-smi. If we need to get it working on Linux, we can absolutely do that through WSL, the Windows Subsystem for Linux, and if you're on a Linux machine, obviously that's a different story. I would say most people working with Nvidia graphics cards as consumers are going to be more familiar with it on the Windows side, because when you're dealing with containers or an environment like the Windows Subsystem for Linux, there are some extra things you've got to do to access the GPU, and it's a bit of a headache. I do believe we cover some Docker stuff in this course, specifically for getting access to the graphics card through a container. I'll try to include that, though not in this video, because making sure I was actually using my GPU in a containerized environment was a big pain point for me. Let's go ahead and type in nvidia-smi. I believe that's all it is.
So, we type in nvidia-smi, and right off the bat we're getting some basic information. We have the nvidia-smi version, the driver version, and the CUDA version. Notice that earlier, when we were looking at Wikipedia, it listed 8.9. My first guess was that this was a lower threshold, so that I couldn't use anything older than 8.9, but that number is the card's compute capability, which is a separate thing from the CUDA version shown here. Here you can see I'm running CUDA version 13. What other information is here? You can see if anything is in use currently.
So these are our processes, and we clearly have four right here. Some additional information: I guess this is its usage over here. And let's see here, we've got GPU, fan, name, temperature, performance.
Well anyway we got usage. It'll make more sense once we start working with it. But let's go ahead and try some of those other commands. Okay.
So I think we had nvidia-smi dmon.
Okay.
And so that's reporting that information there.
And we can see its temperature right here: 28. I'm going to assume that's Celsius. I don't know. Memory percentage being utilized is only around 12 to 16%, so nice and low. I'm assuming these other columns are for encoding, decoding, JPEG. I'm not sure what the other ones are for, but what we normally care about is temperature and memory usage. Okay, so we'll stop that and type in clear. Let's go ahead and try nvidia-smi pmon.
It's not supported in this configuration. Interesting. Um, okay.
So, I guess maybe I can't use it with my Nvidia graphics card. It's kind of interesting that there are cards where you're not allowed to use it, but we do have a screenshot of it, right? And you'll also notice that when we scroll up to the default output, we have the processes right here. The only thing is that it's not actively monitoring in real time. Let's take a look at how the queries work. We'll say nvidia-smi --query-gpu=memory.used and then memory.total.
Okay. We go ahead and hit enter, and we're able to get that exact information. We can try a few other things here. I'm just going from the PowerPoint slides that I was able to grab; these are the default examples in a cheat sheet that I found.
temperature.gpu. I don't want to pretend that I have these memorized.
Okay. power.draw.
We'll go ahead and do that. So you can see that information there.
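If you'd rather read these query results from a script than eyeball them in the terminal, here's a rough Python sketch. It assumes nvidia-smi is on your PATH and that you ask for CSV output with `--format=csv,noheader,nounits`. The function names and the sample line are made up for illustration (the numbers are not from my machine), so treat this as a starting point rather than the official way to do it.

```python
import csv
import io
import subprocess

# Query fields used in this walkthrough, plus the card name.
QUERY_FIELDS = ["name", "memory.used", "memory.total", "temperature.gpu", "power.draw"]


def build_query_command(fields):
    """Return the nvidia-smi argv for a CSV query of the given fields."""
    return [
        "nvidia-smi",
        f"--query-gpu={','.join(fields)}",
        "--format=csv,noheader,nounits",
    ]


def parse_query_output(text, fields):
    """Turn CSV text (one line per GPU) into a list of dicts keyed by field name."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(fields, (value.strip() for value in row))) for row in rows if row]


def query_gpus(fields=QUERY_FIELDS):
    """Run nvidia-smi and parse its output. Needs an Nvidia driver installed."""
    result = subprocess.run(
        build_query_command(fields), capture_output=True, text=True, check=True
    )
    return parse_query_output(result.stdout, fields)


# Hypothetical sample line, shaped like the CSV output for a single GPU.
SAMPLE_OUTPUT = "NVIDIA GeForce RTX 4060, 1250, 8188, 28, 17.50\n"

for gpu in parse_query_output(SAMPLE_OUTPUT, QUERY_FIELDS):
    print(gpu["name"], gpu["memory.used"], "/", gpu["memory.total"], "MiB")
```

On a machine with a working driver you would call `query_gpus()` instead of parsing the sample text. The values come back as strings, so convert them with `int()` or `float()` if you want to do math on them.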
We'll say nvidia-smi --query-gpu=name. Okay, we get the exact graphics card. I think I said 4080. I guess I have the 4060. I'm not paying attention, am I? I'm going to open up the control panel. I thought I had a 4080.
Oh, I have a 4060. And in the slides, what did I say that I had? Did I say I was richer than I am? Let me go take a look.
I mean, that is a 4080 here, but I guess I thought I had a 4080. I said 4060, then I said 4080. I guess I'm getting mixed up; they sound really similar. So I guess I have a 4060. But anyway, that's nvidia-smi, and you should know those commands. Even though we can't run pmon here, we should know it. We could technically try to spin up a GPU in the cloud, though that might be a little bit hard to do. But the point is that if we're in the cloud, we should be able to use the same interface on Linux and get different kinds of information. We could also try running something to get a sense of usage here, but I think you understand that these numbers will go up when we are running some kind of GPU workload. One hard thing for me was that when I was running stuff here, I wanted to make sure I was using the GPU versus the CPU, and I'm not sure I remember exactly how to show them side by side. There's probably some kind of user interface we can use, but obviously we can get at it through nvidia-smi, or we have the NVIDIA app as well, which gives us some stats.
But there you go. Okay. All right. Let's take a look here at CUDA. CUDA stands for Compute Unified Device Architecture, and it is a parallel computing platform and API by Nvidia that allows developers to use CUDA-enabled GPUs for general-purpose computing on GPUs, also known as GPGPU.
Okay. You saw earlier with my RTX graphics card that it is CUDA enabled. I don't know of any Nvidia graphics cards that aren't CUDA enabled. Maybe there are ones that aren't, but I'd be surprised. Here are some examples: if you know AWS, these are Nvidia graphics cards that you could access in the cloud, like the Tesla V100, M60, T4, A100, things like that. All major deep learning frameworks are integrated with the NVIDIA Deep Learning SDK, which is a collection of NVIDIA libraries for deep learning. One of those libraries is the CUDA Deep Neural Network library, cuDNN. cuDNN provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers. What I'm trying to say with that last part, and I said this earlier in the course, is that the reason Nvidia is doing so well in the AI space is that they made CUDA, and CUDA has these libraries that made it really easy to start working with their hardware for parallel computing, especially general compute. So hopefully that is clear. If you want to know a little more about the vision behind CUDA, there is a white paper from 2008 by Lindholm, and it is very interesting and has some nice graphics in it. I just wanted to point you to it. It goes a little deeper into graphics, but it is still very relevant to the AI portions. It's up to you if you want to read it; I just want you to be aware that it exists. The way we're going to install CUDA is via the CUDA Toolkit. You just go ahead and download it, and it's going to give you, I don't know, the drivers or whatever, and the compiler you need for it. I think the compiler is called nvcc.
CUDA is typically coded in C++, but there are wrappers for higher-level languages like Python. It doesn't show there on the right-hand side, but I know there's a .NET wrapper as well. We will look at some comparisons in this video of what it looks like to use Python as a wrapper and make it a lot easier to work with. There's a library called CuPy, a Python library that basically gives you the functionality of NumPy and SciPy but leverages CUDA underneath, and we'll definitely look at that in just a moment. I want to talk about the architecture behind CUDA. I found it very hard to make sense of, and I was really blanking out there on the internet trying to understand it, so I had to break it down a little differently. Hopefully you like my explanation a bit better. On the right-hand side you have a CUDA program. I'm just going to get my pen tool out here. Okay. So here at the top is your CUDA program, and down below you see GPU, GPU. These are your GPUs, and each GPU distributes the workload across its compute. The idea is that the compiled CUDA program has eight blocks up here, and the CUDA runtime can choose how to allocate these blocks to multiprocessors, as shown with the streaming multiprocessors.
Notice that, based on how many SMs there are in the GPU, the blocks are being evenly distributed. The slide says eventually, but I mean evenly; I spelled that wrong. Some other terms that confused me, but I figured out eventually: when they talk about the host in CUDA, they're talking about the CPU on the system, and host memory is the CPU's memory. If they say device, they're talking about the GPU, and device memory is the GPU's memory. Okay, so those are some broad terms we should know. But what I want to do is break down exactly what a CUDA block is, so let's take a look at that. The idea is that first you have a CUDA thread. This is your lowest-level unit of parallel execution for a kernel function. And generally you're going to have things organized as a block.
And that's a collection of threads. They might call it a CUDA block or a CUDA thread block. Before we get into the numbers: the idea is that when you have all these threads, you want to execute them against some piece of code, and that's your CUDA kernel. I spelled kernel wrong here; it should have an e in it. Your CUDA kernel is basically just a function, usually written in C++, that performs some kind of mathematical operation, and it's going to run across all the threads within your block. Generally the threads in a block are executed in groups of 32, and we call a group of 32 threads a warp. I didn't look up what warp stands for, but it's called a warp for whatever reason, and a warp runs within a streaming multiprocessor, which we'll see in just a moment. Then we have the CUDA core. The idea behind a CUDA core is that it's a simple arithmetic execution unit inside the streaming multiprocessor that performs integer and floating-point math for CUDA threads. A CUDA core is not like a CPU core. It's much smaller and simpler. It's operating on things like floating point, integer math, basic arithmetic stuff. Okay, so hopefully that is clear. If we go up a bit, we were talking about these streaming multiprocessors. The idea here is that the SMs are the core execution units inside the Nvidia GPU that run warps of CUDA threads. And an SM contains the CUDA cores, tensor cores, load/store units, and registers.
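To make threads, blocks, warps, and SMs a bit more concrete, here's a small pure-Python sketch. Nothing in it touches a GPU. The function names are made up for illustration, and the round-robin assignment is an assumption that only mirrors the eight-block diagram; the real hardware scheduler is dynamic and hands a block to whichever SM has room.

```python
import math

WARP_SIZE = 32  # threads per warp on Nvidia GPUs


def warps_per_block(threads_per_block):
    """Number of warps needed to cover one block (the last warp may be partly idle)."""
    return math.ceil(threads_per_block / WARP_SIZE)


def distribute_blocks(num_blocks, num_sms):
    """Toy round-robin assignment of block IDs to SMs.

    Assumption: real GPUs schedule blocks dynamically; round-robin is only
    used here so the result lines up with the diagram.
    """
    assignment = {sm: [] for sm in range(num_sms)}
    for block_id in range(num_blocks):
        assignment[block_id % num_sms].append(block_id)
    return assignment


print(warps_per_block(256))     # 8
print(distribute_blocks(8, 2))  # {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
print(distribute_blocks(8, 4))  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

The same eight blocks land on two SMs or on four without the program changing, which is the point of the diagram: the program says how many blocks it wants, and the GPU decides where they run.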
We're just focused on this one right here. Okay, CUDA cores. This SM has a couple of CUDA cores, and this one has a couple of CUDA cores. But when you think about all of the thread blocks launched for a kernel, spread across all the SMs, that's what we call a CUDA grid. So a CUDA grid runs across multiple SMs, and a CUDA grid maps to a single GPU. Individual thread blocks are scheduled onto individual SMs, as you can see with the vertical lines here. So hopefully that is clear when you look at those diagrams. If it doesn't make sense when you see those architectural diagrams, just come back to this explanation and it will break things down a lot faster. There are some debugging tools worth knowing when you're working with CUDA, like Nsight, cuda-gdb, and cuda-memcheck. They're not going to show up on your exam, but I just wanted to point out those three so that we can add them to our toolkit if we need to figure some stuff out. I wanted to show you some comparisons of code. So here's an implementation of a CUDA program, and it's written in C++, and generally these things are written in C++.
C++ isn't that hard to write, but if you want to use Python or .NET, you absolutely can. So we need to define our kernel function. That is the function that's going to run across threads. As you can see, it's not complicated; this one is just adding two arrays together, right? So, whatever small repeatable thing you want to do across a bunch of stuff. Notice that in CUDA we're managing the memory ourselves. See the cudaMalloc, and then we have cudaMemcpy with cudaMemcpyDeviceToHost and things like that. That's why I said we wanted to learn the word host: that way, we know we're talking about the CPU, right?
So, we're learning those little terms.
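The C++ from the slide isn't reproduced here, so as a rough stand-in, this is a pure-Python simulation of what an add-two-arrays kernel does. It runs on the CPU with ordinary loops. The names `add_kernel`, `launch`, and `vector_add` are made up for illustration, and the "device" arrays are just Python lists standing in for what cudaMalloc and cudaMemcpy would set up. The part worth noticing is how each simulated thread works out which element it owns.

```python
def add_kernel(block_idx, block_dim, thread_idx, a, b, out, n):
    """Body that one simulated thread runs: add a single pair of elements."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < n:  # guard: the last block may have more threads than elements left
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Stand-in for a kernel launch: loops where a GPU would run threads in parallel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add(host_a, host_b, block_dim=4):
    n = len(host_a)
    # "Device" copies are plain lists here; real CUDA would allocate device
    # memory and copy host-to-device at this step.
    dev_a, dev_b, dev_out = list(host_a), list(host_b), [0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements
    launch(add_kernel, grid_dim, block_dim, dev_a, dev_b, dev_out, n)
    return list(dev_out)  # stands in for the device-to-host copy


print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # [11, 22, 33, 44, 55]
```

With five elements and four threads per block, two blocks get launched, so eight threads run and the guard makes the last three do nothing. Real kernels use the same pattern with much bigger block sizes.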
But let's take a look at what it would be like if we used Python instead. This is PyCUDA up here. Notice that we're still using C code here. Even if we're using PyCUDA, we still have to define the function that runs on our threads in C++. But at the very least we are now using Python to manage it, and so the syntax is a little bit nicer. Again, you can tell it's not that hard to do in C. And we're still manually managing our memory. Now, if we were to use something like Numba, Numba greatly reduces the amount of code. Now our kernel function is written in Python. We have this cuda.jit, just-in-time compilation, which turns it into basically not C++ but whatever the machine code ends up being. You can see this is a lot shorter, right? We're also not really managing the memory; it's kind of doing that indirectly for us, so we get some automatic memory management. And then we have CuPy, right? Notice that we don't even have to define a kernel here, because the kernel gets generated somehow based on the operations that you're doing. We're doing no memory management.
All of these pieces of code do the same thing. I just wanted to make clear that CuPy makes it dead simple. So if you're in the data science space and you're going, I don't want to write C++, that's too much for me, and I know pandas, this is how easy it will make it for you.

But we should understand the interactions of memory for CUDA. We saw the CPU memory earlier. Let's take a look at the GPU side of memory.

You have registers. These are private to each thread, managed by the compiler, and not visible across threads. You have your L1/shared memory. This is fast on-chip memory, just like the CPU memory. It's shared within a CUDA block, and the physical resource is shared across blocks on the same SM. We have read-only memory. This is the per-SM instruction cache, constant memory, texture memory, and RO cache, all read-only in kernels. We have our L2, which is shared across multiple SMs. And we have global memory. This is the largest but slowest memory tier.

What I'm saying is that you're basically always working with these memory tiers, and if you're working in C++, you generally want to know how you're interacting with them. This is what it looks like when you're interacting with global memory: we're doing cudaMalloc. Global memory is really, really slow, but it has the largest capacity. You have your shared memory, and that's how we're accessing it, with the __shared__ keyword, if you'll notice here. Shared memory is very fast, but it has a small capacity. You have your constant memory.
Basically it's read-only, and it's super, super fast, but it's also super, super small. You have your register memory, and notice that as soon as we do int, that's putting it into a register. So that's how you're utilizing the register. It's super fast and per thread. And then you have local memory. This is where it looks like you're using a register, but you use a very large value, so it ends up being placed into local memory instead.

For a CUDA processing flow, this is generally good to know. You copy data from main memory to the GPU. Okay, I'm trying to find my pen. There it is. So here, main memory to the GPU. The CPU initiates the GPU compute kernel. The GPU's CUDA cores execute the kernel in parallel.
Then you copy the resulting data from GPU memory back to main memory. So there are things you've got to know about memory. But again, this certification is not about knowing how to write CUDA in great detail. You come across it so much, though, that you really should know it when you're in this Nvidia ecosystem, and that's why we're covering it.

The challenge with CUDA is that there are many versions, and CUDA has different features per version. It's not uncommon to have to work with multiple CUDA versions, but managing multiple CUDA versions is a little bit tricky, and we'll talk about that in a moment. You can see over here, trying to find my pen tool, that we have the CUDA versions going all the way down, and you can see the coverage. So we have Tesla, Fermi, etc. I have an Ada Lovelace, and we know its capabilities start from 8 and go all the way to the latest. Remember when we looked at the feature set for Ada Lovelace, it said it was 8 compatible. Here it's indicating 8 or 8.5, which is unusual because over here it's 8 and down below here it's 11. I'm not really sure why it does it that way, as we know it can go a little bit farther back, but generally this is the supported range that we have for that thing. So yeah, you can definitely get locked out of features, and sometimes there are features that only exist in a very specific version. You can run into that kind of issue.

The CUDA SDK provides a low-level API, the CUDA driver API, and a high-level API, the CUDA runtime. Here you can see, in CUDA 8, all the packages that are part of it. We mentioned cuDNN earlier. I don't see it in this one, but we'll continue on. Maybe it'll show up in this one here. This is number 9. Number 10, 11. Okay, so maybe it's not directly in there. But the point is you can read each of these and see, okay, this is the CUDA Fast Fourier Transform library. So there's a bunch of data science math stuff that's very useful, that's built in and works right away. Notice that sometimes things are deprecated and then renamed in a later version. So you can run into issues where things only work with very specific versions. So, how are we going to be able to change our CUDA versions?
How do we upgrade or downgrade when we have different programs that were designed for very specific versions? Well, CUDA doesn't have a version manager. Instead, it relies on environment isolation using Conda, or Mamba, which is a faster reimplementation of Conda, or using Docker. Here we're using Conda: we're creating environments and saying, I'm using this version of Python, but I want CUDA 11.8. This here is just the name of the environment. This is actually the way you install it, because the way you install CUDA is through the toolkit, and here we're saying, I want version 11.8. So you basically make a bunch of environments and switch between them, and that's how you would do that. At some point in this course I think we'll show you Docker, because there's more to it than just saying install it. Conda is just an isolated Python environment, but with Docker there's more work to map the GPU over. Anyway, hopefully that gives you an idea of CUDA. Again, it's not something we need to know 100%, at least not at the time I'm making this course, but we should know what it is.

Hey folks, this is Andrew Brown. In this video we're going to try to build our own little CUDA program. I figured it's a good exercise to make sure we're familiar with what we've got here. So I'm going to go over to the CUDA Toolkit, and we'll go ahead and download the latest version. Here it says CUDA 13, etc., etc. Technically we should be creating environments to manage this, so I'm just trying to decide what I want to do here.
One thing I can do before I start anything is check Command Prompt, as I might just do this on the Windows side to make it super easy. We can type nvcc, and it says it doesn't recognize the program. So I'm going to go to Windows, download it for x86, and I'm on Windows 11. I'm going to download the local installer. The toolkit is 2.3 GB, and then it talks about the driver, if you've yet to install the driver.

While that's downloading, there are a few ways we can check what we have. Let me go over to the Nvidia Control Panel, because I have to remember what CUDA version or CUDA capabilities I have here. If I go in here, one second, you'll notice we have the driver type, DCH, and the driver version. What I'm hoping to find here is CUDA information. It lists CUDA cores. Sure.

Another thing we can do is just run nvidia-smi. CUDA version 13. So what's up here? 13.1. So we are in good shape.
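If you want to read that CUDA version from a script instead of by eye, a small Python sketch like the following could pull it out of the `nvidia-smi` text. It assumes the output contains a `CUDA Version: X.Y` field; the `sample` string is a trimmed stand-in for that field, not captured output, and `parse_cuda_version` is a hypothetical helper name.

```python
import re
import shutil
import subprocess


def parse_cuda_version(smi_text):
    """Return (major, minor) from a 'CUDA Version: X.Y' field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Trimmed stand-in for the header field, not real captured output.
sample = "CUDA Version: 13.1"
print(parse_cuda_version(sample))

# Only call the real tool when it is actually on PATH.
if shutil.which("nvidia-smi"):
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    print(parse_cuda_version(result.stdout))
```

Keep in mind that this number is the highest CUDA version the installed driver supports, not necessarily the version of the toolkit you have installed.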
We'll go ahead and use the CUDA installer, so we'll wait for that to download.

All right, that's downloaded. We're going to go ahead and install it, if I can make my way over to wherever it is. Just a moment. I don't know why it's giving me a hard time today, but we'll go over to my downloads. I'm doing this off screen, obviously. I'm double-clicking to launch the CUDA Toolkit installer, and I'm waiting for it to open up. Oh yeah, I definitely clicked it. It's going. There we go. We'll go ahead and hit yes, and you can see that it's now extracting.

We're doing everything on the Windows side. I personally prefer working in WSL Linux, but I don't want to overcomplicate things just yet, so until I need to, I'm going to do it on the Windows side. If you're on a Linux machine, without Windows, it'll be a lot easier for you. But if you're trying to do Windows Subsystem for Linux through Windows, you might have some trouble here. So, we have express or custom. Ooh, we can do custom. What are our options?
Here we can choose our runtime, our libraries, development, compiler. Okay, there's the nvcc compiler that I was talking about earlier that I didn't know the name of. What other tools do we have? Okay, Nsight, which is for debugging. So we're already going to get that. We'll go here, look at the runtime, and these are all the things we want to install. Go ahead and hit next, and it's going to place it into NVIDIA GPU Computing. That's fine by me. We'll hit next, and I'll say I understand.

That's fine. I don't care about Visual Studio, but I left it checked anyway. I guess that makes sense, because Visual Studio is a program for writing C++. I was just going to use Visual Studio Code, but we'll give that a moment to install.

All right. So, Visual Studio, that's fine. Yep, we're fine with that. I think this should be a non-issue, as I believe we can use nvcc from the terminal. So now I'm going to go back over to Command Prompt, and I'm going to assume we can access it from here: nvcc --version. I'm going to close it out and reopen Command Prompt. You can't see this, but I'm opening it in administrator mode just in case. We'll go ahead and type nvcc --version. Okay, and now it's showing up and we see CUDA. So that is good.

I guess the next thing is we need to write a CUDA program. CUDA programs, I believe, have the extension .cu. So I'm just going to quickly use something like ChatGPT and ask it to generate some code for me.
Just give me a moment. Okay, now I have a little bit of code that I'll bring onto screen right here. So we definitely have one. I'm first going to open Visual Studio Code so we can start working with this. I need to make a new file here, and I'm going to grab this code.

And we have hello from kernel: launch the kernel with one block of threads, then wait for the GPU to finish. That's not very exciting. I would have thought we'd do something that actually looked like something. Can we actually perform a kernel function? Because that's what we should be doing, right? I'm not sure why it's doing that, but we'll give it a moment.

All right, here it's doing add vector. We're just trying to find anything. As long as you have some kind of kernel function, and we're going to run it, and it's doing the malloc and so on, then you know you're on the right path. This looks to me like CUDA code. So we'll go ahead and grab this.
I'm going to bring this over here and paste it in. Oops. And I'm going to save this with Save As, into a new folder. We'll call this CUDA examples, and I'm going to save it here.

I'm also going to cd back out here for a second and cd into CUDA examples. Oh, it actually saved the file as CUDA examples. So, sorry, we'll move CUDA examples to test.cu, make a new directory called CUDA examples, and then move test.cu into that directory. I'm going to cd into CUDA examples, and we're going to do code . so that we can focus on this code. There it is. Good. Say don't save for now. Okay, so we have our code here. Let me just bump this up a bit, and we'll open up a terminal.
The only challenge here is that this terminal is not going to have nvcc, is it? Well, let's take a look. Does it? Oh, it does. Okay. I'm in WSL right now, I believe, so for whatever reason it seems to be detecting it from the host system. If it works, that is fine, but I was assuming we'd run into some issues. And here it's suggesting how to compile it. I'm going to make a new file here called readme.md.

By the way, folks, if you're watching this, do your best to follow along. Sometimes I can provide code, but this is so small, and you should try to do this on your own, that I don't think that's important here. So nvcc should compile that code. We'll go ahead and, oops, we'll grab this. We could make this a CMake file or whatever, but we'll just go ahead and see if this actually works. Oh, it's not called add vector, so I'm going to rename the file to add vector.

Okay, now we actually have our program, vector add, and we'll run it. We get some output, and it's adding up vectors.
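For reference, the compile step boils down to handing nvcc a source file and an output name. Here is a hedged Python sketch that builds that command and checks whether nvcc is reachable before doing anything. The file names `vector_add.cu` and `vector_add` are assumptions for illustration, since the exact names used on screen aren't clear, and `nvcc_command` is a hypothetical helper.

```python
import shutil
import subprocess


def nvcc_command(source, output):
    """Build the argument list for: nvcc <source> -o <output>."""
    return ["nvcc", source, "-o", output]


# Hypothetical file names; substitute whatever you saved your program as.
cmd = nvcc_command("vector_add.cu", "vector_add")
print(" ".join(cmd))

nvcc_path = shutil.which("nvcc")
if nvcc_path is None:
    print("nvcc not found on PATH; is the CUDA Toolkit installed?")
else:
    # Printing the version is a cheap way to confirm the compiler runs.
    subprocess.run([nvcc_path, "--version"], check=False)
```

The sketch deliberately stops short of compiling, because it can't know that the source file exists on your machine. Once it does, passing `cmd` to `subprocess.run` would perform the build.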
If we look here, it's printing the results. So yeah, the program works. Could we observe this? I don't know. We'd have to do something a lot larger computationally and run it against that. But for the purpose of what we're doing here, I think this is a good hello world, so to speak. If there's a reason for us to do more CUDA, we can. I just wanted to show you that it's not hard to get running with it. We didn't really talk about memory management or all that other stuff, but that's your hello world for CUDA. Okay, ciao ciao.

All right, let's take a look at ResNet-50. ResNet-50 is a basic backbone vision model that can be used for various purposes. The reason we're covering this specific model is that I saw it a ton on the exam, and it's also such a generic, all-purpose model that you see it in a lot of examples. You should know what it is.
On the right-hand side you can see the visualization. I can't remember the name of the visualization tool, but there are tools to visualize models. You can see it's really, really long, and you can see a close-up view of how that works. I'm not going to explain how the model works in this video or the math behind it. I'm going to tell you what it does.

ResNet-50 is a convolutional neural network that's great at image classification. It belongs to the ResNet, or residual network, model family, so there are other variants like ResNet 18, 32, etc. ResNet-50 was released in 2015, and the model size is around 100 megabytes. ResNet is small and can run on hardware from 2016. If you have a fourth-generation Intel Haswell CPU, you could run it on that. If you have a 2017 iPhone, so an iPhone 8, or a GTX 700 series card from 2013, it'd be slow, but it would work pretty well. That's why it's so popular: it does a really good job for that task. It doesn't matter that it's an old model, because an old model isn't necessarily a bad model. It just hit a certain point where it was very, very useful.

ResNet-50 could be trained to run on your phone and detect a dog in a photo. For some reason, every ResNet-50 example loves to talk about a photo of a dog. There's a website here, TensorSpace, which gives you a visual way to explore that model, and they give that example there.

A ResNet-50 ONNX model is included with the ONNX Model Zoo. What I'm trying to say is that ONNX has a bunch of models, and they have an ONNX model specifically for ResNet-50. That's what we're visualizing on the right-hand side. I ended up downloading that model and using it. ONNX is like the standardized format for representing models, and we are going to learn about ONNX in this course. But that's ResNet-50. Okay.
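The roughly 100 megabyte figure can be sanity-checked with simple arithmetic: parameter count times bytes per parameter. The sketch below assumes about 25.6 million parameters for ResNet-50, which is the commonly cited approximate count and not a number from this video, and it ignores file-format overhead. `model_size_mb` is a hypothetical helper.

```python
def model_size_mb(num_params, bytes_per_param=4):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


# Assumption: commonly cited approximate parameter count for ResNet-50.
RESNET50_PARAMS = 25_600_000

fp32_mb = model_size_mb(RESNET50_PARAMS)                     # 32-bit floats
fp16_mb = model_size_mb(RESNET50_PARAMS, bytes_per_param=2)  # 16-bit floats
print(f"fp32: {fp32_mb:.1f} MB, fp16: {fp16_mb:.1f} MB")
```

Stored as 32-bit floats, that lands right around 100 MB, and halving the precision halves the size. This covers the weights only. Running inference needs some extra memory on top for intermediate activations.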
Hey, this is Andrew Brown. I just wanted to quickly show you TensorSpace, because you can go and see the visualization of how ResNet-50 works. Here you can see it's predicting that this is a dog. If you go over here to this one, you can see it's predicting ice cream sandwich. You can see all 50 weighted layers, and if you click into any of them, you can kind of see how it's starting to detect things in different ways as it narrows down to the final conclusion of what it is. I'm not the best at explaining it, but I can show you that there is a visualization, there's something happening here, and it's cool.

Would it be useful for us to work with ResNet-50 on our Nvidia graphics card? Sure. I'll have to decide if I want to include that. Right now, I'm not sure. If you do see a video that follows this, then you know I decided it was useful for getting some Nvidia practice. For now, I just want you to know what ResNet-50 is, because you're going to see it again and again. And notice that the model is only 100 megabytes. Remember that a model has to live in memory. So if you have 100 megabytes of memory, this thing can live in memory no problem, and it will have no issues with inference. But there you go.

All right, let's take a look at the Nvidia Container Toolkit. This is a collection of libraries and utilities enabling users to build and run GPU-accelerated containers. The toolkit contains the container runtime, the toolkit CLI, the CDI hooks, the container runtime hook, the container CLI, and the container library. So, a bunch of things collected in there. You don't need to remember them all, but know that there are multiple things in there when you're using it practically. When you have this toolkit installed, it makes it really, really easy to leverage the GPUs on your host system. I wish I'd understood this a lot sooner, as I used to see these flags quite a bit when I was working with containers and models, but I never connected them with the Nvidia Container Toolkit.
So it's very useful that we can see that there. Up there you're seeing sudo docker run, where we're setting the runtime to nvidia and specifying to use all the GPUs. All this container is doing is, through Ubuntu, running the nvidia-smi command, which we already learned about. It's just showing that, through the container, we're able to access the information about the GPUs. Again, I really wish I'd known this a lot sooner.

There are some options that you'd want to know from a practical standpoint, and you might also get questions on your exam that distinguish between these parameters, so it's very important that you know them. I've specifically handpicked these ones to help you on your exam.

One thing you can do is select your GPUs with the --gpus flag. That's the old-fashioned way, and it's the way I always knew how to do it: you type --gpus all to say I want to access all of them. Another way is to specify the exact device. Numbers are assigned to your devices. A device, as we said earlier, is another term for the GPU. Here we're saying device 0 and 1, so we're selecting two devices, two GPUs in this case. And then another way is to enable the Container Device Interface, CDI, and specify your device using the --device flag. This is the recommended way of doing it. Just getting my pen tool out for a moment: you can see here we're saying --device, then nvidia.com/gpu, and then equals all. So it's a different syntax, but for whatever reason it's recommended.

The Container Device Interface is a specification designed to standardize how devices like GPUs, FPGAs, and other hardware accelerators are exposed to and used by containers. I believe that this is specific to Docker, and that the Nvidia Container Toolkit is utilizing that Docker feature. That's how I understand it. The thing I want you to remember is this --gpus with device, specifying it this way, and the fact that you can do --device and then say what you want to specify. It probably would have been better if I'd used an example with a very specific GPU here, but that's what I've got, so that's all right.

Anyway, we should do this in practice. Of course, if you don't have an Nvidia graphics card, this might be hard. You could probably go to the cloud, try to get something that has a GPU attached, and run it there. But I'm just going to run this on my local computer.
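To keep the three selection styles straight, here is a small Python sketch that assembles the corresponding `docker run` argument lists: `--gpus all`, `--gpus` with specific device indices, and the CDI-style `--device nvidia.com/gpu=...`. The helper names `docker_gpu_args` and `docker_run` are hypothetical, and the sketch only prints the commands rather than running them.

```python
import shlex


def docker_gpu_args(style, devices="all"):
    """Return the GPU-selection arguments for `docker run`.

    style: "gpus" for the --gpus flag, "cdi" for --device nvidia.com/gpu=...
    devices: "all" or a list of device indices such as [0, 1].
    """
    if style == "gpus":
        if devices == "all":
            return ["--gpus", "all"]
        # Docker expects the device list wrapped in literal double quotes.
        ids = ",".join(str(d) for d in devices)
        return ["--gpus", f'"device={ids}"']
    if style == "cdi":
        names = ["all"] if devices == "all" else [str(d) for d in devices]
        args = []
        for name in names:
            args += ["--device", f"nvidia.com/gpu={name}"]
        return args
    raise ValueError(f"unknown style: {style}")


def docker_run(gpu_args, image="ubuntu", command="nvidia-smi"):
    """Assemble a full `docker run` invocation as an argument list."""
    return ["docker", "run", "--rm", *gpu_args, image, command]


for gpu_args in (
    docker_gpu_args("gpus"),
    docker_gpu_args("gpus", [0, 1]),
    docker_gpu_args("cdi"),
):
    print(shlex.join(docker_run(gpu_args)))
```

With CDI, a specific GPU is named the same way as all of them, for example `--device nvidia.com/gpu=0`, which is what `docker_gpu_args("cdi", [0])` produces.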
If anyone wants to try it in the cloud, please tell me in the comments, or wherever, on this video if you're able to do it. I'd love to know. Anyway, let's go see if we can access our GPUs through a container. Okay, ciao ciao.

All right, this is a retroactive video. The next video after this is a lab, and this stuff was so painful for me to figure out that I'm front-loading this lecture information, because you don't want to watch the whole lab just to learn how to get this thing working. This is something I'd been trying to get to work for over a year. I had one environment that works on my other computer, but I'd never been able to figure out how to do it until just now. So let's take a look at exactly what it takes to get this thing working.

The first thing to consider is that when we're talking about the Nvidia Container Toolkit, I'm specifically talking about using it within WSL 2 within a Docker container, because that's how you're going to want to use it. If you have to use it other ways, that's fine, but you're not going to be able to know for certain if everything's working as expected. What I'd recommend is to create a new WSL 2 environment. Here I'm doing Ubuntu 22.04.
Then we export it, unregister it, make a new directory, and re-import it. The reason we do all of this is just so I can rename it to include Nvidia, because once this is working, I want an environment that I absolutely know works and can rely on. And then this last line just runs the environment.
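The export, unregister, and re-import cycle maps onto a handful of `wsl` commands. This Python sketch lists them in order. The distro names and Windows paths in it are hypothetical placeholders, and `wsl_rename_steps` is an illustrative helper, so treat it as an outline of the sequence rather than the exact commands from the slide.

```python
def wsl_rename_steps(old_name, new_name, tar_path, install_dir):
    """Return the wsl invocations for the export / unregister / import cycle.

    The install_dir must already exist (the "make a new directory" step).
    """
    return [
        ["wsl", "--export", old_name, tar_path],               # snapshot the distro to a tar file
        ["wsl", "--unregister", old_name],                     # remove the original (deletes its data)
        ["wsl", "--import", new_name, install_dir, tar_path],  # re-create it under the new name
        ["wsl", "-d", new_name],                               # start the renamed distro
    ]


# Hypothetical names and paths, for illustration only.
steps = wsl_rename_steps(
    old_name="Ubuntu-22.04",
    new_name="Ubuntu-Nvidia",
    tar_path=r"C:\wsl\ubuntu.tar",
    install_dir=r"C:\wsl\Ubuntu-Nvidia",
)
for step in steps:
    print(" ".join(step))
```

The order matters: unregistering deletes the distro's file system, so the export has to succeed first.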
You also have to install Docker. I show that in the video, but I'm not showing it here in the steps. The next part is that nvidia-smi should already work.
We didn't install the Nvidia driver into this environment. It should automatically pass through from the host, so when you run nvidia-smi, this should work. That might make you think this is ready to go, but that's not necessarily true.

One thing I found really confusing in the instructions is that the CUDA Toolkit and the Nvidia Container Toolkit have very similar names. I think the reason I was spinning my wheels is that I thought I was installing the Nvidia Container Toolkit, but I was actually installing the CUDA Toolkit. I do believe I ended up installing the CUDA Toolkit, but the guide says you don't need it, and I don't think you need it. It's not hard to install. It's just a command like this one here for the CUDA Toolkit. Anyway, we install the Nvidia Container Toolkit, and then we want this line. This line is super important. It's going to set the runtime, because the error I kept getting was like, "Oh, there's no Nvidia runtime. We don't know what you're talking about." I was endlessly getting that issue. Then we need to restart Docker, and then we can do this.

Now, I just remembered that I actually didn't test the other ones, with the device and the CDI mode, and the CDI mode is obviously preferred, but I was so excited about getting this one to work that I didn't think about it. I don't think I'm going to go shoot an additional one. I don't think it's necessary.
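The line that sets the runtime isn't spelled out here. In Nvidia's install guide the matching step is `sudo nvidia-ctk runtime configure --runtime=docker`, which registers an `nvidia` runtime in Docker's `daemon.json`, so this sketch assumes that is the line being described. The Python below checks a daemon config for that entry. `has_nvidia_runtime` is a hypothetical helper and `sample_config` is an illustrative stand-in for the file's contents.

```python
import json


def has_nvidia_runtime(daemon_json_text):
    """True if the Docker daemon config registers a runtime named 'nvidia'."""
    config = json.loads(daemon_json_text)
    return "nvidia" in config.get("runtimes", {})


# Illustrative stand-in for /etc/docker/daemon.json after configuration.
sample_config = """
{
  "runtimes": {
    "nvidia": {"path": "nvidia-container-runtime", "args": []}
  }
}
"""

print(has_nvidia_runtime(sample_config))  # config with the runtime registered
print(has_nvidia_runtime("{}"))           # empty config, runtime missing
```

If the entry is missing, `docker run --runtime=nvidia` has nothing to resolve the name against, which matches the "no Nvidia runtime" error described above. Docker only rereads the file when it restarts.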
But anyway, I wanted to put this forward because this was such a pain for me. In the end, it was ChatGPT that picked up the right information. I did go through the docs initially, because that's how I like to do stuff, but they only had some of the commands. I can't exactly explain why we can't find that online easily, but the point is that we solved it. I hope it solves it for you, and that is going to be a large enablement for you. Okay.

Hey folks, this is Andrew Brown. In this video, I want to see if we can utilize the Nvidia Container Toolkit. There are a few things we have to consider, like where we're installing it. There might be a way to install it on the Windows side, but primarily the way I'd want to work with this on my local machine is through the Linux experience, which is WSL, the Windows Subsystem for Linux, or WSL 2 in particular. So we have that additional layer that we need to figure out. Supposedly there's this guide here which will help us through this.

If you don't have this setup, it's going to be a little bit difficult for you. The other solution is to try to get access to an Nvidia graphics card in the cloud. I find that kind of a pain, because any time I want a graphics card I need to increase my service limits, and then I'm always waiting for a thousand years. So to make it really easy, I'll show it here on this machine, and we just want to prove that it works. If you have an Nvidia RTX graphics card, or something that's able to do it, that'd be great for you to test out here. If not, just watch along and we'll see what we need to do. Here in the guide, it says that we need to have WSL installed, which we do.
Here it says to install the Nvidia driver for GPU support. It's specifically talking about the GeForce Game Ready or the RTX/Quadro driver on Windows 11. Earlier, I think we were looking at the Nvidia app, and I think I was saying, oh, I might have the one I didn't want installed, which was the game one. So if we go over to, where is it? Settings here. Trying to remember where it was. Graphics. No, not here. System. We're looking for that driver information we saw earlier. Oh, Drivers, maybe. And there's the Studio Driver right up here, but it says it wants the Game Ready Driver. Can I just switch between them? Oh, I don't know now.

Okay. Anyway, here it says GeForce Game Ready Driver, and that's what it's asking for, so this is the one I'm supposed to have. What I think I'm going to do is update my driver. The last time I updated was April 16th, 2025. So just give me a second and we'll be back. I'm going to do a really risky thing and update my driver.

Okay, it's still updating here. It's not taking very long, but we're going to see what happens. I would think it needs a restart. Now it says install, so it's downloaded. Let's go ahead and install it. We say yes here. You can't see it because it's off screen, but it says: upgrades current drivers and retains existing ones. That sounds good to me. I like that. Normally when you update drivers, you're supposed to have a backup recovery point in Windows or whatever. I'm probably supposed to do that. I never do. I've never had any problems. Maybe this will be the first time I have a problem and I'll regret not doing that, because I'm jinxing it. But we'll see what happens once it updates. Okay.
All right, it's done. I did have to restart my system. What's really interesting, though, is that OBS, my recording software, crashed, but I still have the original first part of the recording. So we've updated to the Game Ready Driver. I don't know why the other one might not work, and you can't always trust Nvidia's documentation. I mean, they have good documentation for a hardware provider, but it's not the best. So we have that all the way up to date, and that requirement is now out of the way. I already have WSL 2 installed, I already know that I'm running it, and I already know how to get into my WSL environment. The question is how we get CUDA support for it, and that's the next section. Let me just give it a read.

Okay, I know this text is really small, so let's bump it up. The key thing it's saying here is that if we're using WSL 2, and I am, it's already supported. I think it's saying that if you're using a prior version of WSL, or a preview, you have to do a build specifically with the CUDA Toolkit. But I'm pretty certain I have it. So: once the Nvidia GPU driver is installed on the system, CUDA becomes available within WSL 2. Apparently it just happens instantly. The CUDA driver installed on the Windows host will be stubbed inside WSL 2, and therefore users must not install any GPU driver within WSL. So we're not putting anything on the inside. See, I would have thought we'd have to. But if it just instantly works, that's great.

One has to be very careful here, as the default CUDA Toolkit comes packaged with a driver, and it's easy to overwrite the WSL Nvidia driver. So that's why we don't muck with it. It recommends that developers use a separate CUDA Toolkit for WSL, available over here, to avoid overwriting the driver. So I don't think I have to do anything. First remove the old GPG key: we don't have to do that. Installation of this: whatever. So what are the constraints? WSL 2 GPU acceleration will be available, etc. Ensure you are on the latest WSL kernel, or at least 4.19.121.
I'm not sure how we would check that. Let's open Command Prompt on this Windows machine. And folks on Mac, well, I don't know Mac, but you wouldn't have an Nvidia graphics card, would you? Anyway, we want to know about WSL. If we type in wsl, we might get some information about it. No, it actually just put us into WSL. So, how do we find out the WSL 2 kernel version?

Oh, it is --version. Okay, that's what I was going to do. It's frustrating when you want to do that. You can run it via PowerShell or here. It doesn't matter.
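Once `wsl --version` reports a kernel version, checking it against the guide's minimum (read here as 4.19.121) is a numeric comparison, not a string comparison. Here is a minimal Python sketch. The sample version strings are hypothetical inputs, and `version_tuple` and `kernel_ok` are illustrative helper names.

```python
import re


def version_tuple(text):
    """Turn '5.15.0' or '4.19.121+' into a tuple of ints for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


# Minimum kernel as read from the guide.
MINIMUM_KERNEL = version_tuple("4.19.121")


def kernel_ok(reported):
    """True if the reported kernel version meets the minimum."""
    return version_tuple(reported) >= MINIMUM_KERNEL


# Hypothetical reported versions, for illustration only.
for reported in ("5.15", "4.19.121", "4.19.84"):
    print(reported, kernel_ok(reported))
```

Comparing tuples of integers avoids the trap where "4.19.84" sorts after "4.19.121" as plain text even though 84 is the older patch level.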
Here we have kernel 5.15. What was it specifying? It wants kernel version 4.19 or 5.10, so we're well above it. Windows 11: I'm on Windows 11. If you continue with Windows 10, etc. Known limitations: Maxwell GPUs are not supported. I don't think I'm using Maxwell, but let's go take a look.

Remember, there are different architectures. If we go to the CUDA article on Wikipedia, I think it tells us the range of these things. So we go here and, okay, Maxwell. I'm pretty sure I'm using Ada Lovelace, which is all the way over here, and Maxwell is all the way down here. That's a 2014 graphics card. So if you're using something really old, like 2014, well over 10 years ago, you're not going to be able to use this. That's fine. Unified memory: full managed memory support is not available. I don't think that matters. Pinned system memory: I don't think that matters. Root user on bare metal, not containers: doesn't matter. With the Nvidia Container Toolkit for Docker 19.03, only --gpus all is supported. So it's going to matter what version of Docker we're on.
So now what I'm going to do is go over to um Visual Studio Code. Usually often with WSL when you have it installed and you should have it installed. I'm not showing how to do that here. They have the instructions there um that we can take a look here. So if I drop down here, like I'm in I'm in bash, so I know I'm using WSL 2 right here. So I want to see what Docker version I have installed here. I'm not showing how to install Docker either. And we have version 27.51.
So we're totally doing fine here. And I don't have to use this GPUs all mode. So that's really good. Features not supported yet. So the following table lists of the set of features that are currently not supported. So MBSMI does not support all queries yet. So it is supported but just not all queries.
OpenGL CUDA interlope is not supported yet for our exam. This stuff doesn't matter. We are doing this for hands-on because I know this is super useful when you start trying to work with workloads locally. So that's why we're spend the time here. Okay. So we already have a Docker working here. And in my instructions in my PowerPoint presentation, we have a command this one here. And we're going to run that and see if it works. So, we'll type in. Now, normally you if you install Docker, you might need pseudo. I've uh changed my Docker, so I do not need pseudo. I'm not not going to show that, but that's fine.
So, go ahead and type in hyphen RM. And I'm going to say runtime equals Nvidia.
And we're going to say hyphen GPUs all.
And I'm going to say Ubuntu and we'll say Nvidia SMI. And we'll hit enter. So, here we have an issue. says Nvidia Smi unknown or invalid runtime Nvidia.
Okay.
I mean that should work.
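Written out in full, the command being dictated here looks like the sketch below. It is wrapped in a function (the name gpu_smoke_test is made up for this example) so nothing runs until you call it, and it assumes Docker is installed and that your user can reach the Docker daemon; prefix the call with sudo otherwise.

```shell
#!/usr/bin/env bash
# Sample workload from the walkthrough: start a throwaway Ubuntu container
# through the NVIDIA runtime and run nvidia-smi inside it.
gpu_smoke_test() {
  docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
}

# Usage: call the function once Docker is running.
#   gpu_smoke_test
```

If Docker has no runtime registered under the name nvidia, this is the command that produces the "unknown or invalid runtime" error seen above.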
So what we'll do is go up here and take a look. They have a command here. So I guess what we really need is the full path. I mean, that's an example of one.
In some cases when running the container you may encounter this. No. So I want to search nvidia-smi container, because it's saying it's not found and there should be a command here to run it. nvidia-smi is now supported, but in order to use it you have to copy it manually to the /usr/bin directory. Really? That can't be right. Well, first of all, let's just see if we can run nvidia-smi in here.
Okay, so we have nvidia-smi. I didn't install anything on this thing, and we are able to access nvidia-smi in WSL, but I want to run it within a container in WSL. That is my end goal. Okay, that's what I want to do. So we'll go back over to here, and I'm trying to figure out how we can get that. Over here we have the Nvidia container runtime.
That's the thing we're trying to run right now. And we scroll on down. Docker is the most widely used one. So we have sudo docker run, whatever.
Let's go ahead and just try this.
Okay, I feel like this should just work.
What am I missing?
Docker run requires at least one argument. Well, according to them, it doesn't. Run a CUDA container from Docker Hub. Yeah, because it should pull the Nvidia container from there. But over here, it's saying docker run requires at least one argument. Let's go ahead, and we really shouldn't have this.
We'll do sudo. But yeah, it looks like it's not even specifying an image, right? Normally you'd have the image name in here, and it doesn't look like we have it there. That's why we had ubuntu earlier. So I wish they would just give us an example.
We'll follow through into the toolkit here and maybe we'll find some information on it. User guide.
Please don't bring me to this. Yeah, it brought me to the... Nope, different site. Okay. What we're looking for is literally the command to run it.
That's all I want is the command to run it. It's got to be somewhere here.
Running a sample workload. We'll go over to here. Okay, so this is the one. It looks like it's exactly the same. That's probably where I got it from. Sudo. And you can see that I obviously got it working at one point; that's how I took that screenshot. So we have sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi. So I'm not crazy. We'll go ahead.
We'll try this again.
Unknown or invalid runtime name: nvidia.
I must be crazy.
Okay. Give me just a moment. Okay.
All right. So over on the Windows documentation, or Microsoft's, they're saying, and look down here, it says enable Nvidia CUDA on WSL 2. Remember that we read that it said we didn't need to install the GPU driver on the WSL 2 side. So I'm hoping that it's going to tell us we don't have to do that. So, Windows 11 and updates, includes PyTorch, etc. This includes Docker and Nvidia Container Toolkit support, as available in native Linux environments. Right. And this computer is only a year old, so it should have a version that is up to date enough to have this. But: download and install the Nvidia CUDA-enabled driver for WSL.
But we said we didn't need it, and now we do. So we'll go over to here, I guess. Conflicting information, right?
We're back over here.
Okay, so clearly we're missing something here. Install the driver. We did this, right? This is the only driver you need to install. Do not install any Linux display driver in WSL. As far as I'm aware, I have not done so.
Launch your preferred Windows terminal command prompt. Ensure you have the latest WSL kernel.
We could do that, but it says that we have a version that is the latest that we need. Set up a Linux developer environment.
The latest Nvidia Windows GPU driver will fully support WSL 2. With CUDA support in the driver, existing apps can run unmodified in the WSL environment. To compile new CUDA applications, a CUDA toolkit for Linux x86 is needed. CUDA toolkit support for WSL is still in preview stage, as developer tools such as profilers are not available yet. However, CUDA application development is fully supported in WSL 2 environments. So maybe it's saying it's supported, but you still have to do something. As a result, users should be able to compile new CUDA Linux applications with the latest CUDA toolkit. Okay, so maybe we do have to do it. So once Nvidia GPU drivers are installed, and we have done so, CUDA becomes available within WSL 2. It makes it sound like it's already there. The CUDA driver installed on the Windows host will be stubbed into WSL 2.
Therefore, users must not install any Nvidia GPU driver within WSL 2.
One has to be very careful here, as the default CUDA toolkit comes packaged with a driver, and it's easy to overwrite the WSL 2 Nvidia driver with the default installation. We recommend developers use a separate CUDA toolkit for WSL 2 (Ubuntu), available over here on this page, to avoid this overwriting.
Okay, so let's go over to here.
Wait a second. This is so confusing. We recommend developers use a separate CUDA toolkit for WSL 2.
So that makes it sound like we do have to install it into WSL 2.
But then they say users must not install the GPU drivers. Oh, so I think what it's saying here, maybe, is: don't install the drivers separately. Install the CUDA toolkit, which will install drivers on WSL 2. But I was thinking that it just gets inferred through it. And so that's probably where the confusion is. Option one: install the Linux x86 CUDA toolkit using the WSL package. Okay, so the CUDA WSL local installer does not contain the GPU drivers. So by following the CUDA download page, you'll be able to get the CUDA toolkit installed in WSL.
Okay, so let's do that. We're over here and we have a bunch of links, and we have Linux, x86, WSL-Ubuntu, 2.0. So it knows exactly what we want.
And then we have our installer type. I don't care. Let's just get it working.
Network, I suppose.
I think we want... no, no, just local. Local, right? Because it will download and install. This one's easier, though.
Yeah, we'll do it this way. So we'll go ahead. I'm just going to grab all three of them, or all four. I'm going to be real dirty. I usually don't do that, but we're going to go for it anyway. We're going to paste that in and let it rip. Okay, and we'll give it a second. All right. So the toolkit should be installed here. Let's go take a look and see what happens if we go back and run this command again, with sudo as well.
And so we're still getting this issue.
Hm, okay. So the toolkit is definitely installed, but the other thing is that we were able to do nvidia-smi before.
Hm.
Hm, good question. Because if the toolkit was already installed and we were already able to do... sorry, if we just installed the Nvidia toolkit and we ran nvidia-smi, does that come with the driver? Just give me a second. Okay. So what I'm going to do is just see if we can restart our WSL.
So I'm stopping mine here, and then we'll see if we can get this back up again.
WSL.
I think that um if we go back to Visual Studio Code, it might just launch it for us.
Okay. So, I'm just hoping that it automatically triggers WSL.
Go here. Make a new one. No, it's hanging.
Okay, we'll just write wsl. I never remember the WSL commands. Like, this will get me into WSL.
I mean, we could also just do it from here. We don't actually have to use Visual Studio Code for that, but we can go ahead and I will try that command again.
I promise you, getting this to work is worth it. It seems like a random video in this course, but getting this to work might be the most useful video in this entire course.
Seriously. So, I'm going down here. I'm looking for that individual command. It is not that one.
It is not that one. Where is it? This one here. Okay, so we'll grab this. I don't need sudo, but we'll go ahead and try it anyway.
Let me just take that out.
And it just will not let me do that. So the only thing I can think of is that maybe I had the driver installed here prior, and that's where it's running into an issue. The other thing is that maybe we could create a new WSL environment and then install it there, and we'll have less of an issue, right? So that'll be something we need to figure out. So I'll just search: how to create a new WSL environment.
Okay.
And I'll just go over to here.
And so we'll just try this really quickly. I wasn't going to do this, but obviously you can now see what we're doing. I want to leave my other one intact. And so maybe this will just be a CUDA one. So we'll do this.
It says it can be installed with sudo apt-get. Not found. Oh no. Oh, it's because we're in WSL right now.
Okay. So, we'll reopen Command Prompt.
I'm opening it in administrator mode.
It's like a button. I can't show you.
It's on my other screen. And we'll go ahead and we'll try this again.
And so, I know it's already installed, but I want to install another variant of it.
Okay. So what I'm going to do is go ask Claude or ChatGPT. I suppose it doesn't really matter which one.
We'll go here. So: I need a separate Ubuntu version to rule out Nvidia drivers. What's the command to install Ubuntu as a separate environment and name it with Nvidia? And let's see if we can get that command there.
Yeah, it depends on the version, and then you rename it after the export and import.
Okay. Well, that's what I want to know.
I want to do the list. So we'll go over to here. We'll take a look: --list --online.
And I'll just type exit here.
wsl --list --online.
And it says Ubuntu. We have 24.04.
Looks like the latest that I can install. If we go up to here, that's what I want.
Then rename it after the import export.
They're not explaining that part of it.
Where how do we rename it?
Let's solve it. Normally, export it. Oh, that looks complicated.
But this is the only thing I can think of, because I've been having this problem for a long time. It works on my other computer, but not this one. I just want to rule it out. So now I'm literally installing another one here. We'll just give it a moment. Hopefully it doesn't override my existing one. If it does, that's a bummer, but we'll see what happens. Okay. And by the way, it's going really slow, but it is progressing. There we go. I think initially it was downloading; now it's installing. I was going to say it was going to take forever, but maybe not.
Here it wants a name for the new Linux.
Um, we'll just say Andrew.
And then we'll have a password. Um, I want to have no password. Can I have no password?
Oh, it wants me to have one here. So, I'll put in a password here.
Obviously, I'm being secretive about it.
And so, in theory, we should have one installed.
So let's go ahead and we'll say wsl ubuntu --list.
Sorry, --list here.
Oh, we're inside of it right now. I think I'll exit it. Okay. So we'll go ahead and try this again.
I think list here.
And so, this is the default one, WSL.
And then there's the 22.04. So I believe that's the new one that we just installed. And so I'm going to go ahead here and say wsl --export, 22.04.
Um, like this.
which is silly that we have to do it this complicated way.
Okay, that's now exported. We're going to go ahead and unregister it.
And before we do that, I'm just going to scroll up and see what we had before.
I can't. Well, let's hope I don't blow anything away. And so we have here where it says unregistering, the operation completed.
And then we'll reimport it back into here. Assuming that these are the same paths. I believe they are.
And we'll go ahead and try that. Yeah, because it can't find it. Um, and that's what I was wondering about.
So, yeah, like where's the pathing?
Like, it obviously isn't there.
You didn't tell me to do that. You must create a destination folder before running the import command.
What? What do you mean? I have to unregister export WSL import from here, etc., etc. Ubuntu.
Well, hold on a second. What if I do ls or pwd? dir?
Oh my goodness. Sure. Why not? It's confusing that this says PowerShell, because this one's clearly PowerShell.
We'll go ahead and we'll do that, I guess.
Make sure the tar file exists. DIR.
It does, I believe, based on what I'm reading here.
I mean, I don't really want to put it in System32. I guess that's just where I was, right?
And what we'll do is we'll go back to this other command here where we're importing it.
Great. Now we're importing it just so we have a rename. And it's now imported.
We'll do wsl --list. And so now we have one. This is Nvidia 22.04. So what I'm going to do is go back over... well, I guess we can just launch it now. Let's say we want to use that.
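The export, unregister, import sequence used for the rename can be sketched as below. This helper only prints the Windows-side commands rather than running them, since unregister deletes the distro. The function name wsl_rename_plan, the distro names, and the C:\wsl folder are all hypothetical; substitute whatever wsl --list shows on your machine.

```shell
#!/usr/bin/env bash
# Print (do not run) the Windows-side commands that rename a WSL distro by
# exporting it, unregistering it, and importing it under a new name.
# Arguments: old distro name, new distro name, Windows folder for the files.
wsl_rename_plan() {
  local old="$1" new="$2" dir="$3"
  local tar="${dir}\\${old}.tar"
  # wsl --import needs the destination folder to exist first.
  printf 'mkdir "%s\\%s"\n' "$dir" "$new"
  printf 'wsl --export %s "%s"\n' "$old" "$tar"
  printf 'wsl --unregister %s\n' "$old"
  printf 'wsl --import %s "%s\\%s" "%s"\n' "$new" "$dir" "$new" "$tar"
  printf 'wsl --list --verbose\n'
}

# Hypothetical names, for illustration only.
wsl_rename_plan "Ubuntu-22.04" "nvidia-2204" 'C:\wsl'
```

Creating the folder up front avoids the "you must create a destination folder" error, and keeping the tar file in a dedicated folder avoids writing into System32.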
So that's what I'm looking for now: how to use it. So we'll go here and just say wsl --help, because I don't know what the command is.
Like how do I specify which version I want to use?
Okay. How do I launch it now?
It's probably wsl followed by the name.
That's what I'm thinking. Well, very close. It looks like it has a -d. Unless -d means run it in the background. -d, set it as the default? No, I don't necessarily want it as the default, but it looks like maybe -d sets the default. I'm not sure if that's the same command. What's the -d option for?
It's probably distribution.
Distribution. That was my guess. We go down here. It could have been short for default. That's why I have to double check, right? So, go ahead. We'll run this now.
And so, we're waiting for this to start up. Do I hit enter here?
There we go. Oh, wait. Hold on. We'll do clear. I don't know what's going on here. So, I'm just going to close that out. We'll try that one more time. So, say command prompt. Run this as administrator.
Yes. We'll go back over to here.
Oh, it's not listening to me here. We'll do -d Ubuntu.
22.04. Sorry for the small text. It's very hard for me to bump it up, unfortunately. And so, we have that.
We'll say Nvidia and hit enter.
And I think we're in there. clear.
There we go. Definitely looks different.
Which is fine. The default one has nice coloring and stuff. So let's type in nvidia-smi. And off the bat, we already have that information here. It says CUDA version 13.1, driver version, nvidia-smi, Nvidia GeForce RTX 4060. So it knows I have it. Do we have Docker? There's no Docker. Okay, good. So now I'm going to type exit, because I'm just curious: did I blow away my old one? If I just go ahead and say wsl and hit enter, do I get nice coloring and all that kind of stuff?
Yes. So my original one's still intact, but I'm going to go ahead and type exit again, and we're going to go back up to here. This one doesn't have Docker installed, so I guess I'm doing everything from scratch here. So we'll go ahead: docker install ubuntu, and we'll get this installed quickly.
So I guess you folks are getting a little preview. I can't tell you how many times I've installed Docker, so many darn times, but we'll do it again.
So, install from the apt repository.
This is the official way to do it, and this is the way that I would normally do it. Add the GPG key. So we'll go ahead and do that. Can I bump this font up? Nope.
Nope, it will not cooperate with me.
What I'll do is we'll go back to Visual Studio Code because we're now running that there. And so we should be able to access it here now.
So that is starting that, which is fine, but I don't know which one it's running.
This is probably the default one. This is the default one because we have our beautiful coloring. But if I go here and we say new terminal, I should be able to, if it stops closing out here, um, I should be able to choose that other one in here somehow.
Um, it's not letting me switch it.
Usually you would see the options here and then you could switch them but it's not doing that here today.
bash, tmux, bash, origin, split terminal.
So it's not being very clear. We also have this option here WSL connect to WSL. So this is like WSL remote. So we can switch to it. I'm going to click on this.
I mean it's already connecting but I want to connect to a very specific version.
Right. So, I'm going to go back down to here and we'll say connect using distribution. So, we'll click this one here. And so, now I'm specifying the Nvidia one. So, now we can do that. Now, I can bump up that font and we'll have a much easier time. So, that one's now running here. We'll go ahead. We'll go to terminal. Okay, I'll hit enter and let's go get Docker installed. Shouldn't take too long.
Okay, so this one is going off the screen here. And then that's done. So we'll do the next section here. Add the repository to the apt sources.
Okay. So we're just pasting that in.
That's running. That's already done.
We'll go down below and we'll grab the latest version. We're getting Docker Compose plugin, Docker Build X plugin, containerd. So a bunch of stuff with it.
That looks good to me. I'm I'm happy with all that.
It's a little bit hard to select, but we'll grab that. Um I feel like the last time I installed it the instructions were a little bit different but that's what I find. Hy capital Y. So it just forces everything along the board to install.
Okay. And so that is going there.
And so now that is installed. So we'll do sudo docker hello. Was it like hello-world?
hello-world. Is Docker installed? It is. docker hello-world. I thought I'd need to make it sudoless.
Yeah, sudo.
I always forget how to do docker hello-world. Let's just type that in.
hello-world. And if we go here... Oh, run. I forgot the keyword run. Yeah, you'd think with all the courses I've made, I would already know how to do it.
But no, I want to see if it's sudoless.
I think it worked sudoless, didn't it?
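For reference, the apt-repository install being pasted in looks roughly like the sketch below. It follows one version of Docker's published Ubuntu steps, which change from time to time, so treat it as an assumption and check it against the current install page. It is wrapped in a function (the name install_docker_apt is made up here) so nothing runs until you call it.

```shell
#!/usr/bin/env bash
# Sketch of Docker's apt-repository install for Ubuntu. Nothing executes
# until install_docker_apt is called. Verify against the current Docker docs.
install_docker_apt() {
  # Prerequisites and Docker's GPG key.
  sudo apt-get update
  sudo apt-get install -y ca-certificates curl
  sudo install -m 0755 -d /etc/apt/keyrings
  sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
    -o /etc/apt/keyrings/docker.asc
  sudo chmod a+r /etc/apt/keyrings/docker.asc

  # Add the repository to the apt sources.
  echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
    | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
  sudo apt-get update

  # Engine, CLI, containerd, and the Buildx and Compose plugins.
  sudo apt-get install -y docker-ce docker-ce-cli containerd.io \
    docker-buildx-plugin docker-compose-plugin

  # Smoke test: note the "run" keyword before the image name.
  sudo docker run --rm hello-world
}
```

One possible reason Docker worked without sudo here, and this is an assumption rather than something shown in the video, is that a distro brought back with wsl --import logs in as root by default; id -u prints 0 in that case.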
That's really nice. Normally I have to run sudo in front of it. So I'm not sure how the instructions changed that, but whatever these new instructions are for installing, that was really nice, because I didn't have to specify sudo. Not sure why, but not going to complain. So we don't have the Nvidia toolkit installed. The nvidia-smi thing works in a new WSL environment, which is really interesting. So what we want to do is go back over to that command, which was, not here, over here.
Where is it? Here. This one. And then we'll give this one a try. So fingers crossed that it works.
Unknown runtime.
Unknown runtime. All right.
Oh, well, hold on. We didn't install the Nvidia toolkit, right? It's a fresh new environment, and it's saying that we're supposed to install it, right? So we'll go ahead and grab it. 13.1, I'm assuming, is the latest one. I can't imagine it would take us to the oldest one. So we'll go back over to here.
We'll try this.
Okay, so now this is installing.
And we'll give it a moment to finish.
All right. So we have the Nvidia toolkit installed, and so in theory this should work. And it just keeps telling us that there is no runtime. Like, this is driving me insane.
Okay, so we've ruled it out: we have a fresh WSL 2 environment in there. We can see that it's already accessing the driver from the other side. The only thing I can think of is restarting my computer. I'm not sure why I would have to do that, but we did install a new driver, and clearly we can see it is accessed through there. We installed the Nvidia toolkit as per the instructions, even though they're a bit conflicting. So there's something missing here, but give me a moment. Okay, I'm just going to go off screen and try to solve it. So everyone's talking about sudo as the issue. But this one's saying in particular that there is a configuration for rootless mode, and over here it says: I had the same issue, the nvidia runtime not being recognized in rootless mode. Turns out there's a specific Docker configuration for rootless mode over here. I mean, technically I believe we're set up as rootless, because I'm not having the sudo in there. But normally when I install Docker from scratch, I have to say I don't want to use sudo, which I hadn't had to do this time. But here it says: to configure the container runtime for Docker running in rootless mode, follow these steps. Configure the container runtime using nvidia-ctk.
nvidia-ctk runtime... I don't even know what that is. We haven't touched nvidia-ctk, which is not even a command, apparently, even though I'm pretty sure we installed... Let's go back to the top here.
So I'll be back in just a moment. Maybe the toolkit's not even installed. Well, hold on here. What happens if we run nvidia-ctk? It says command not found. What if we put sudo in front of it? Because we definitely installed the toolkit, right? It installed, right? Updated, running.
Did we not install it? I could have sworn we installed it.
Okay. How do we confirm the Nvidia toolkit is installed?
Oh my goodness. We didn't need that photo there. nvcc could be it. I mean, that's the compiler though, isn't it?
nvcc --version. I don't think that's correct.
Could not be found, but can be installed with the Nvidia thing. So maybe we actually just never installed it.
So here it says yeah we did this. If I go back to here and try this the latest version is installed.
Okay, we'll go over to here.
Uhhuh.
Mhm.
Okay. So, how do we do our sanity check?
How do we know? Oh, hold on here.
Run the installer. Run the installer. Sudo... hold on here.
So, perform the installation. Reboot the text mode. Cons number three.
Reboot the system. Sure. We're not rebooting our actual system. We're just rebooting this one here. So, it's now going to reconnect. Maybe that was the way we could have restarted it earlier.
I just didn't think of doing that. So maybe it'll reconnect here.
I've never done a reboot on WSL, so I'm not sure if you can do that just like a regular machine.
Seems upset that I did it.
Um, we'll just go ahead here and try try this again.
And so I believe it's been rebooted.
We'll go here to terminal.
Terminal. Oh no, I buggered it up. We'll go ahead to Command Prompt here and we'll say wsl --list.
We'll say wsl... I think we learned earlier that it was like shutdown.
And then we could specify the distribution this way, -d, like this.
And so now what we'll do is we'll go ahead and restart it.
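The exact commands typed here are not audible, so the following is only a sketch of one way to stop and restart a single distro from the Windows side, using wsl --terminate and wsl -d. It prints the commands instead of running them; the function name wsl_restart_plan and the distro name are hypothetical.

```shell
#!/usr/bin/env bash
# Print the Windows-side commands that stop one WSL distro and start it again.
wsl_restart_plan() {
  local distro="$1"
  # Stops just this distro; wsl --shutdown would stop every running distro.
  printf 'wsl --terminate %s\n' "$distro"
  # -d is short for --distribution and starts the named distro.
  printf 'wsl -d %s\n' "$distro"
}

# Hypothetical distro name: use whatever wsl --list shows.
wsl_restart_plan "nvidia-2204"
```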
And so in theory, yeah, it should be running now. We'll go back over to here.
Try this again.
It should establish a connection here.
Sometimes that messes things up. So, we'll go back over to um I'm telling you, it's really worth it.
You think I'm crazy, but if we get this working, it's a big deal. So, we did that. Um, we did the reboot here. What was the command? It was like this that can't be found.
Yeah. So, I mean, we could also just install it with dnf like they had there, or apt-get.
I just don't understand because we already installed it.
Let's go for it. sudo apt install.
We really shouldn't have to do that, because we used the correct way of doing it. Toolkit here, and it's installing it. It looks almost the same. It's like 13.11. nvcc --version, or hyphen V.
Uhhuh.
So here: to avoid automatic upgrade and lock down the toolkit to the X version. That's probably why we're installing that specific version. Driver installation, basic instructions. Installation: perform the pre-installation actions.
Um this can be accomplished with number three. Consult your system. Reboot uninstaller. No.
Oh my goodness. This is very hard to get working. So what I'll do is ask: what can we do to check if the Nvidia toolkit is installed on Ubuntu WSL? E.g., nvcc.
Yeah. But we know that nvidia-smi works, right? So we go here, right?
Right, we get that. No problem. Then we have nvcc --version. I mean, we didn't do hyphen version, but it says it's not found.
If we get this, you should expect this.
This is an issue in your Windows driver, not your Ubuntu. You should see the GPU.
If this fails, it doesn't fail. Check if CUDA toolkit is installed.
I guess the tool... Oh, that's the CUDA toolkit. That's separate, right? If installed, you'll see something like this. If you get command not found, the CUDA toolkit is not installed in the distro. Check if CUDA libraries exist. Okay. But do we need the CUDA toolkit for the Nvidia Container Toolkit?
So there are three layers to the WSL GPU system. We have Windows WSL GPU support. We already have that.
CUDA toolkit: optional. Are you compiling CUDA code? We're not. Do you need the nvcc command? We don't. What about the container toolkit for Docker GPU on WSL? We need it.
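The three layers can be probed with one command each: nvidia-smi for the driver exposed by the Windows host, nvcc for the optional CUDA toolkit, and nvidia-ctk for the Nvidia Container Toolkit. A minimal sketch, where layer_status is a made-up helper name:

```shell
#!/usr/bin/env bash
# Print "found" or "missing" depending on whether a command is on the PATH.
layer_status() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

# One probe command per layer described above.
echo "driver via Windows host (nvidia-smi): $(layer_status nvidia-smi)"
echo "CUDA toolkit, optional (nvcc):        $(layer_status nvcc)"
echo "container toolkit (nvidia-ctk):       $(layer_status nvidia-ctk)"
```

A "missing" for nvcc does not always mean the CUDA toolkit is absent, since its bin directory may simply not be on the PATH. For running GPU containers, the line that matters is the last one.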
Um, the container runs its own thing.
We can try this command which is giving a very specific version. I don't think that's going to work. This one's not saying runtime at all in here though.
Still pulling the image though. Could not select drivers.
Yeah. And so that still doesn't work.
Are you running Docker inside WSL? Yeah.
Yes.
Right.
Installed in Ubuntu, which is confusing, because it says you only need the Windows driver plus the Nvidia to... Oh, yeah. Well, we know that.
We know this, right? So, we go down here. Confirm WSL sees the GPU. We know that it does. Confirm the Docker version is installed in WSL. We know that it's installed. It's right there.
Install the WS toolkit. We've already installed it like three different ways.
Let me go hit up and just see what command we ran. No, that's the tool.
That's the CUDA toolkit. And then we go back over to here. Have I just been installing the wrong one this entire time?
Oh my goodness. Have I been installing the CUDA toolkit?
Please tell me that's not what it is.
That I'm just like super super dumb dumb.
Okay, so what I'm just going to do, because maybe that's our problem. Maybe we just haven't installed it this entire time.
Please tell me I'm just dumb.
Okay. And then um we'll go back over to here.
So here we have I'm not sure why we'd have to run this.
What's this for?
It tells Docker to use the Nvidia GPUs by registering the runtime.
Okay, well that sounds good to me.
Nowhere did I see that in other places. Config file does not exist.
Wrote... it is recommended to restart the Docker daemon. And so that was the second instruction we had.
We'll go ahead and try this.
There we go. Um, and so now what we'll do is we'll go ahead and I'm looking for not that one. The real nice one we got. This one.
Oh, is it working? Is it working?
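To recap the fix: what was missing was the Nvidia Container Toolkit (not the CUDA toolkit), plus the step that registers its runtime with Docker. The sketch below follows the apt steps from Nvidia's install guide as I understand them, so treat the repository URLs as assumptions to verify against the current page. The function names are made up, and nothing runs until you call install_nvidia_container_toolkit.

```shell
#!/usr/bin/env bash
# Succeeds when the JSON on stdin (from docker info) lists an "nvidia" runtime.
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

# Install the NVIDIA Container Toolkit and register its runtime with Docker.
install_nvidia_container_toolkit() {
  curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
    | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
  curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list >/dev/null
  sudo apt-get update
  sudo apt-get install -y nvidia-container-toolkit

  # Writes the nvidia runtime entry into /etc/docker/daemon.json.
  sudo nvidia-ctk runtime configure --runtime=docker
  # Restart the daemon; without systemd, try: sudo service docker restart
  sudo systemctl restart docker

  # Confirm the runtime is registered, then retry the sample workload.
  docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
    && docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
}
```

The docker info check is a quick way to tell whether the "unknown or invalid runtime" error will come back: if nvidia is not in the runtime list, the configure and restart steps have not taken effect.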
Yes. Okay, so there you go. I'm going to go make a slide. You'll already have watched this video, but I'm going to make a slide from this video and get this stuff up front, because that was such a headache, and I cannot tell you how much I've been struggling with that. But anyway, the reason why this is really useful is that if you're trying to work with Nvidia and WSL and containers, a lot of times you'll find containers online that have workloads.
This is the stuff we run into. Okay, but anyway, that's all I wanted to accomplish here. So technically we're able to run this. Maybe I will go and now make the ResNet-50 video, as I really wanted to make that, but I did not want to run it the other way. I wanted to run it through a container like this, so I might come back and try that in a separate video. But yeah, we finally accomplished what I wanted there. So there you go.
All right, we're taking a look at Triton Inference Server, and this is an open-source inference server. I don't know why I don't have the R on the end there, but it serves the model so you can run inference. I know it sounds like I just restated the name, but the easiest way to think about it is a web server. A web server serves up web pages. An inference server serves up access to a model, so you can send a request, make a prediction or inference, and come back with something. I believe they market this as an enterprise solution that's open source and good for any cloud provider, etc. We have that nice big diagram on the right-hand side where it says Nvidia Triton Inference Server, and it has parts in it, but let's take a look here. Triton supports inference across cloud, data center, edge, and embedded devices that are running Nvidia GPUs. It works for x86 and ARM CPUs, so technically my computer should be able to do it. And AWS Inferentia, which I believe is AWS's own silicon for acceleration. You can deploy models using TensorRT, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. And the Triton, I know it's hard to say, Inference Server is part of Nvidia AI Enterprise. So there's that AI thing. But yeah, these things are not normally that complicated to use. A lot of times you'll download a repo, or you'll just run it through a Docker container. There are a lot of inference servers out there, but the only thing you really have to remember is that Triton is an inference server: it serves up your models, LLMs included, so you can call them. Okay.
All right. Let's see if we can use this Triton Inference Server. I've used inference servers before; I haven't used this one. It's saying here that it should be able to do it, but I'm not using enterprise-grade equipment here.
I mean, I could spin up some compute here. But let's just see if I can actually utilize it locally, because they do have a simple example here. It looks like it's using densenet_onnx as the model. So my question would be, what does DenseNet require? Let's go ahead and look up DenseNet, because I just don't know what that model is.
And what might help here, I'm just going to do this off screen for two seconds: we go over to ChatGPT and I'll just ask, what model is this?
Is this something I could run on my RTX 4060?
Because I imagine this is a very small sample, but I'm not sure. I don't remember all the names of the models.
It's almost certainly a DenseNet. Yeah.
Image classification, exported to ONNX format, and yes, you can run it on a 4060 easily. Good. That's all I want to know. So what we'll do is go over to here.
I assumed it was something we could run because they want you to be able to run this on some kind of consumer hardware.
So you know that it's easy. So here's create the example model repo. Launch launch it here. And see we're running through Docker. Um, and the good thing is is that because I solved the thing with WSL earlier, if you didn't watch that lab and get that set up, you might want to do that. I should be able to run this um containerized within WSL 2.
Okay, setting sending a request and we should be able to do that. So, let's see if we can get this working. So, go over to Visual Studio Code just because it's a bit easier to work with. It's connecting to my Nvidia 1, which has the Nvidia container toolkit installed on it. So, that should make my life easy.
I'm just going to look at where we are right now with pwd. Do I have a home directory or something? We'll go to andrew here. This is a fresh machine.
So, make a directory. I'll just call it sites, and we'll go in here, and I'm just going to clone that. Assuming I have git in here. Is git there? Yes, it is. Can I clone it? Let me clone it. If it's public, there should be no issue here.
There we go. We're cloning it.
Hopefully, it's not too big. We'll cd into the server.
There we go. We will fetch the models. I'm not exactly sure what that's doing, but I'm assuming it's downloading the actual model.
Inception V3. Yeah, I'm not exactly sure.
Once that's done, I can just run code with a period (code .) and then we can see the structure of what it's downloading. So, we'll just give that a moment to download. Okay, it actually looks like it's doing more than downloading the model. It looks like it's getting all the dependencies and everything that we need. I'm not sure if it's creating a virtual environment. I don't see that here, but it could be doing that as well. All right, that only took, I don't know, a few minutes, so it wasn't that bad. And so it clearly downloaded the model. The model's small. It's only 44 megabytes. You've got to consider that the size of the model is what gets held in memory, right? So clearly I have enough memory to hold 44 megabytes. The other factor I have to consider is that I'm running OBS here, which takes up memory and is using graphics acceleration or something. But I should be fine. So now that we have it downloaded, there are the lines to go ahead and run it. Looking here, we see --gpus=1, so use one GPU.
We could say "all" because we only have one. Then --net=host, so the container shares the host's networking instead of getting its own isolated network. Then PWD, the current working directory, which mounts the model repository into the container as /models. Then here it is running the Triton server as a container, pulling the image from a container registry (going by the image name, that's NVIDIA's NGC registry rather than Docker Hub). And here the model repository flag is specifying where to load models from, not a single model. So possibly there is more than one model that we downloaded, not just a single one in here. That's what I'm thinking. But then down below here we have server, docs, examples. Is that where we're supposed to be? That's where it put us.
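To make the flags in that launch command easier to read, here is a minimal Python sketch that assembles the same kind of docker run invocation as an argument list. The function name triton_run_command is hypothetical, the repository path is only a guess based on the directories navigated above, and the image tag is left as the same placeholder the quickstart uses, so substitute a real release tag before running anything.

```python
import shlex
from pathlib import PurePosixPath


def triton_run_command(model_repo: PurePosixPath, image_tag: str, gpus: str = "1") -> list[str]:
    """Build a docker run argument list for launching Triton (illustrative only)."""
    return [
        "docker", "run",
        f"--gpus={gpus}",                 # GPUs to expose: a count such as "1", or "all"
        "--rm",                           # remove the container when it exits
        "--net=host",                     # share the host's network stack
        "-v", f"{model_repo}:/models",    # mount the model repository at /models
        f"nvcr.io/nvidia/tritonserver:{image_tag}",
        "tritonserver",
        "--model-repository=/models",     # directory the server scans for models
    ]


# Assumed path and placeholder tag: both need replacing with real values.
repo_path = PurePosixPath("/home/andrew/sites/server/docs/examples/model_repository")
cmd = triton_run_command(repo_path, "<xx.yy>-py3")
print(shlex.join(cmd))
```

Building the command as a list rather than one long string keeps each flag next to its comment, and shlex.join handles the quoting when you print it for copy and paste.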
And we do have a model repository here.
And notice here it says model repository. So it's possible that there's a bunch of models. There is. So we have DenseNet, Inception. I don't know what all these are, but we're going to run DenseNet. It obviously does something with images. And we'll go ahead and hit enter. It's saying it can't find the Triton image locally because, of course, we have never used that image. So it's pulling that down from the registry. So that's getting pulled down. So that's going well.
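If it helps to see what "a bunch of models" in one repository looks like on disk, here is a small Python sketch that scans a directory laid out the way Triton model repositories generally are: one folder per model, a config file, and numbered version subfolders. The helper name list_models is hypothetical, and the two model folder names and file names are assumptions modeled on the example repository, created here as empty stand-ins in a temporary directory.

```python
import tempfile
from pathlib import Path


def list_models(repo: Path) -> dict[str, list[str]]:
    """Map each model directory to its numeric version subdirectories."""
    models = {}
    for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
        versions = sorted(
            (v.name for v in model_dir.iterdir() if v.is_dir() and v.name.isdigit()),
            key=int,
        )
        models[model_dir.name] = versions
    return models


# Build a throwaway repository that mimics the layout; the files are empty.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    examples = [("densenet_onnx", "model.onnx"), ("inception_graphdef", "model.graphdef")]
    for name, model_file in examples:
        version_dir = repo / name / "1"
        version_dir.mkdir(parents=True)
        (version_dir / model_file).touch()      # the model weights would live here
        (repo / name / "config.pbtxt").touch()  # per-model configuration
    found = list_models(repo)

print(found)
```

Pointing list_models at the real model_repository folder after the fetch script finishes should show every model the server will try to load.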
We'll give it a second. All right. So right away it says "unable to get power usage for GPU". That doesn't necessarily mean a bad thing. It just means it can't get those metrics, but it was still successful in pulling the image. That is good. And so we ran that line, and I'm assuming that it's running. The way that we would normally check that, I guess...
Oh, did we actually run the container? I thought we just did fetch models, right?
Oh, okay. Fair enough. So somewhere here on the left-hand side, we should have an extension for containers. Maybe I don't have it. Docker. I could have sworn I had the Docker one.
Uh-huh.
Maybe it's this one. Oh, it's not installed. I'm not sure where it went.
It definitely was installed before, but the reason I want that is I want to see if it's running. So, it's clearly running here. It's not stopped. That's really, really good. We're going to go ahead and make a new tab, and we'll bring this back open here.
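Besides looking at the container list, another way to check that the server is up is to ask it directly. The sketch below assumes the server is listening on its default HTTP port 8000 and exposes the readiness endpoint from the KServe v2 protocol that Triton implements; the helper names ready_url and is_ready are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def ready_url(host: str = "localhost", port: int = 8000) -> str:
    """URL of the readiness endpoint (assumes the default HTTP port)."""
    return f"http://{host}:{port}/v2/health/ready"


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed reply, or an error status.
        return False


if __name__ == "__main__":
    url = ready_url()
    print(url, "->", "ready" if is_ready(url) else "not ready or not reachable")
```

Because the container was started with host networking, localhost on the machine running Docker should reach the server without any port mapping.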
And so what I want to do here is the "sending an inference request" step.
So, in a separate console, launch the image client example from here. I'm guessing it's there, but it says /workspace/install, etc., etc. Is that really even there? Let's go back to here. It didn't open up the same directory, which is home.
home, andrew, cd sites, server... was it examples? Where did it go?
Here we had docs, examples.
It seems like a really strange way to navigate, but examples.
I'm going to go into the model repository here, into the DenseNet one. I'm just guessing.
No, that's not it. Because here I'm looking at the next step, and it's a little bit hard to read, so we'll just take this command out here for a second, go up to here, and make a new file. What I want to do is just get the lay of the land. I just want to be able to see what I'm doing.
Okay.
And I'm just trying to make sense of what I'm looking at.
This looks like... oh, it's another... is that another container?
Okay, so it looks like what we're doing is just launching another container. Or is that a different example?
Launch the Triton container, then send inference requests: launch the image client example from the NGC Triton SDK container. So I think the second container here, I guess what they've done is, whatever it's doing, it has the image and it's going to send it over the network. Without opening up the container and looking at it, I'm not really sure, but that's what it looks like, right? Because they're saying one is running the server and one is sending the image. So we'll just do that. We'll send the image. It's going to download that. Again, it's probably just literally a container with an image in it. That's my guess. And so that will download it and then send it over to Triton to make an inference. What the API looks like, I don't know. When we're dealing with LLMs, a lot of servers standardize on the OpenAI API format. This isn't an LLM. This is just a smaller traditional model, I believe, like a vision model. And so we'll see those capabilities here in just a moment.
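For a rough idea of what a request to the server could look like without the SDK container, here is a hedged Python sketch that builds a JSON body in the style of the KServe v2 inference protocol, which Triton's HTTP endpoint follows. The tensor names data_0 and fc6_1 are my assumption about the DenseNet example's configuration, so check the model's config.pbtxt before relying on them. The helper names are hypothetical, and the tiny all-zero tensor stands in for a real image, which would need to be resized and normalized first.

```python
import json


def build_infer_request(input_name, shape, data, output_name, top_k):
    """Build a v2-protocol-style request body for one FP32 input tensor."""
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(f"expected {expected} values for shape {shape}, got {len(data)}")
    return {
        "inputs": [
            {"name": input_name, "shape": list(shape), "datatype": "FP32", "data": list(data)}
        ],
        "outputs": [
            # Ask the server to return the top_k classes instead of raw scores.
            {"name": output_name, "parameters": {"classification": top_k}}
        ],
    }


def infer_url(model, host="localhost", port=8000):
    """Inference endpoint for a named model (assumes the default HTTP port)."""
    return f"http://{host}:{port}/v2/models/{model}/infer"


# Stand-in tensor: 3 channels of 2x2 pixels, nowhere near a real image size.
body = build_infer_request("data_0", (3, 2, 2), [0.0] * 12, "fc6_1", top_k=3)
print(infer_url("densenet_onnx"))
print(json.dumps(body)[:80])
```

The image client in the SDK container does the image loading and preprocessing for you, which is why the quickstart uses it rather than hand-built requests like this one.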
Okay.
And so that's pulling, and it looks like it sent a request and we have an image, right? So it's gone through here. If it's working, there's the image, and we have coffee mug, cup, and coffee pot. So it looks like what it did was identify it. So I'm curious about DenseNet. Let's go back over to here. DenseNet. What is DenseNet?
Well, I'm just curious: DenseNet versus ResNet.
ResNet adds features while DenseNet concatenates features.
Designed to train very deep models by addressing vanishing gradients. Okay.
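The "adds versus concatenates" distinction is easier to see with a toy example. This is only a sketch in plain Python: the layer function is a made-up stand-in for a convolutional layer, and real networks operate on multi-dimensional feature maps rather than short lists.

```python
def layer(features: list[float]) -> list[float]:
    """Stand-in for a learned layer: produces new features from its input."""
    return [0.5 * x for x in features]


def residual_block(x: list[float]) -> list[float]:
    # ResNet-style: add the layer output to the input, so the width is unchanged.
    return [a + b for a, b in zip(x, layer(x))]


def dense_block(x: list[float]) -> list[float]:
    # DenseNet-style: concatenate input and layer output, so the width grows.
    return x + layer(x)


x = [1.0, 2.0]
print(residual_block(x))  # [1.5, 3.0]
print(dense_block(x))     # [1.0, 2.0, 0.5, 1.0]
```

The growing width in the dense version is one reason DenseNet tends to need more memory as layers stack up, while both designs give later layers a direct path back to earlier features.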
Oh, okay. So, it just depends on your requirements here. So here, like, a limited GPU, training is the priority, or you have limited training data. So I guess it's trade-offs in its approach. That's kind of interesting.
Maybe I will go back retroactively and add DenseNet, just because we've covered it in this course, and we might want to have a comparison. I'll have to do a little bit deeper research on that. That's not enough information for me. Why isn't DenseNet adopted as extensively as ResNet? Yeah, I don't know.
But anyway, the point is that we were able to perform our operation there, and that was really, really good. Was that the last one? That was the output.
Yeah. And so, not hard, folks. I'm going to go over to here and I'll just stop my container. So right click, stop, and I'll see you in the next one. Okay. Ciao ciao.

All right. I just wanted to take a moment to compare DenseNet versus ResNet. They're both convolutional neural networks, and I found out that they're both tied to this ImageNet competition, which happened over multiple years. They had this one very specific use case where they wanted to do image classification, which had to do with detecting a dog and a plane and so on. So whenever you look at a ResNet or DenseNet example, they're always using those same five images, because that was the original use case. Obviously these things can be retrained, not just for image classification but for other things. What these architectures were trying to solve was things like vanishing gradients, degradation as layers increase, and slow convergence. What it told me was that the goal was basically to make networks deeper and make training easier, specifically with convolutional neural networks. Okay. And so if we just compare these two, the key difference is speed and training. ResNet is a lot faster to train and requires less memory, whereas DenseNet is slower. DenseNet is going to be better for medical imaging or where you want higher accuracy. ResNet is going to be for general vision and faster inference. These things can run on pretty low-end hardware, but there's a lot to unpack here with these two, or just in this family. But again, I just wanted to make it very clear what this stuff was. I did notice that on the exam they did talk about vanishing gradients. I think in my GenAI Essentials course I cover vanishing gradients.
But basically the idea is that every time you train, those weights are being updated using gradients, and as the gradients get passed back through many layers they can shrink toward zero, so the early layers stall because there's barely any update left to apply. That's the best way I can explain it. But if you do see vanishing gradients, you don't really have to worry about that on your exam.
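A quick way to see why gradients vanish is to multiply the per-layer slopes together, which is what the chain rule does during backpropagation. The sketch below is a simplification under stated assumptions: every layer is given the same local slope of 0.25 (the largest slope a sigmoid activation can have), and the function name gradient_after is hypothetical.

```python
def gradient_after(depth: int, local_slope: float, skip: bool = False) -> float:
    """Multiply per-layer derivatives together, as the chain rule does."""
    grad = 1.0
    for _ in range(depth):
        # A skip connection y = x + f(x) has derivative 1 + f'(x),
        # so the factor no longer shrinks toward zero.
        grad *= (1.0 + local_slope) if skip else local_slope
    return grad


plain = gradient_after(depth=20, local_slope=0.25)
with_skip = gradient_after(depth=20, local_slope=0.25, skip=True)
print(f"plain stack:     {plain:.3e}")
print(f"with skip paths: {with_skip:.3e}")
```

After twenty layers the plain product is vanishingly small, while the version with skip connections is not, which is the intuition behind the shortcut paths in ResNet and the concatenations in DenseNet. Real networks have varying slopes per layer, so treat the numbers as illustrative only.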
Just always try to remember the technical requirements. These conceptual things aren't as important on the exam. But anyway, I just wanted to compare DenseNet versus ResNet. I still want to see if we could train ResNet for something for fun. But yeah, we'll see you in the next one.
Okay, ciao ciao.

Let us take a look here at ONNX, which is the Open Neural Network Exchange. It's an open-source, framework-agnostic model format that can be exported from most major frameworks. ONNX provides a common language any machine learning framework can use to describe its models. So imagine you need a linear regression.
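To picture that linear regression as a graph of standard operators, here is a toy sketch in plain Python. It borrows the operator names MatMul and Add from ONNX, but the graph structure, the run function, and the tiny interpreter are all made up for illustration; this is not the onnx library's API.

```python
def matmul(a, b):
    """Matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Add a single bias row b[0] to every row of a."""
    return [[x + y for x, y in zip(row, b[0])] for row in a]


OPS = {"MatMul": matmul, "Add": add}

# Linear regression Y = X @ W + B, written as two operator nodes.
graph = [
    {"op": "MatMul", "inputs": ["X", "W"], "output": "XW"},
    {"op": "Add", "inputs": ["XW", "B"], "output": "Y"},
]


def run(graph, tensors):
    """Evaluate nodes in order, storing each result under its output name."""
    values = dict(tensors)
    for node in graph:
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = OPS[node["op"]](*args)
    return values


result = run(graph, {"X": [[1.0, 2.0]], "W": [[3.0], [4.0]], "B": [[0.5]]})
print(result["Y"])  # [[11.5]]
```

Because the model is just data (a list of named operator nodes), any runtime that implements the same operators could execute it, which is the portability idea behind a shared format.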
ONNX contains all the math functions necessary for ML models to implement their inference function. So by sticking with ONNX, we're normalizing all these function calls so that we can move from one framework to another. Okay. ML models implemented in the ONNX format are often referred to as ONNX graphs, and that's kind of a visualization of one. ONNX does have an entire ecosystem of tools to help build frameworks and converters, and to deploy, optimize, and visualize ONNX models. There are more than just these, but the ones that I thought were really useful are the optimizers and the visualizers. One in particular that's kind of fun is Netron, where you can go here and load in an ONNX model and it'll give you the complete graph, and it supports quite a few things. It's not the only visualizer, but it's one of the cooler ones you can use in your browser. You can also run it locally. There's no cost to it, so it's pretty darn cool. It does support more model formats than ONNX, as you can see on the left-hand side, but ONNX is what we want to use it for.

Okay. So, TensorRT for RTX is an inference library that allows optimized and easy deployment of AI models on RTX GPUs, which I have, which is great. TensorRT for RTX ensures that your AI workloads never steal performance from graphics, which is a challenge that we have.
You know, I'm running OBS, so that might be something that I'm fighting with all the time. It's a drop-in replacement for NVIDIA TensorRT in applications targeting NVIDIA RTX GPUs, from Turing all the way to the Blackwell generation. TensorRT for RTX introduces a just-in-time optimizer, and it's compact, under 200 megabytes. TensorRT for RTX optimizes CNN, diffusion, and speech models expressed in ONNX or through native C++ APIs for running on NVIDIA RTX GPUs. Unlike the NVIDIA TensorRT inference library, this library does not support other NVIDIA GPU platforms like data center, edge, and embedded. The reason I'm covering this is that when you're looking at TensorRT, it can get really confusing because the wording is very similar. Of course, there's a benefit to me in knowing this because I have an RTX graphics card, but it's mostly so you don't get confused by the terms, and so you know that TensorRT for RTX is for RTX GPUs in PCs and workstations, okay? Not for data centers.