NVIDIA Inference Microservices (NIM) encapsulates all microservices needed to run AI models at scale into a single Docker image, enabling deployment on serverless GPU platforms like Azure Container Apps where applications can scale to zero and users only pay for actual usage, simplifying the complex process of deploying enterprise-ready, scalable AI models.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
EP 1 Highlights | Run Open Models on Serverless GPUs
Added:We are going to be looking at how we can run NIM models on Azure serverless GPUs using the container apps platform. And we're going to be covering quite a lot.
>> Deploying an enterprise-ready, scalable, high-performance AI model is really not an easy task. There is [music] a lot that goes into it. NVIDIA solves this complexity by creating NVIDIA inference microservices, or NIM for short. A NIM takes all of the microservices that you need to run AI at scale, and it encapsulates them into a single Docker image. This makes it very easy to run and very easy to deploy.
>> I'll talk about Azure container apps.
It's actually my favorite platform deploying [music] anything on Azure because it now has GPUs, it's a great fit for this as well. You can actually say like, "Hey, you know, like if there's absolutely no traffic going [music] to my app, you know, to scale down to zero." And you're only going to pay for what you actually use. [music] >> Well, if we wanted, let's say, a gigantic workflow to write code, we would first pass our prompts to an orchestration agent, which could plan and interact across multiple different agents, which all have their own different tools.
>> There's a lot of code here involved in calling these tools. That is why we typically use agent frameworks [music] when we want to have agents that call multiple tools. What we have here is a bunch of examples using the agent framework from Microsoft. This is a framework that was just released. And it's basically the successor to both semantic kernel and autogen. So, that's just doing it with a single tool. You can see our code here is really nice and clean. So, we're just going to add in another function here. Wait, it's going to figure decide which of those to call.
First, it called get current date, which is great cuz that's basically it should figure that out [music] before it calls the other ones. And then it called the other two, and then, you know, came back with its response. So, you know, lots of options for how you can do logging and observability.
>> Want to dive deeper? Check out the full episode at the link in the description.
Related Videos
AI Agent Mastery Certification Course: Lab 4 – Tools & MCP
arizeai
350 views•2026-06-16
Real-time Voice cloning, Kimi K2.7 CODE, GLM 5.2 and 3D reconstruction | AI News
kaiexplainsYT
111 views•2026-06-16
He Believes AI Could Replace Humanity Faster Than Anyone Expects
LondonRealTV
815 views•2026-06-15
General Session by Rami Rahim-The next generation of networking: From vision to self-driving reality
HPE
108 views•2026-06-17
[PLDI 2026] Flatirons 3 - LCTES (Jun 16th)
acmsigplan
191 views•2026-06-16
Google DeepMind’s AI Halves UK Housing Planning Time
60secondsignals
467 views•2026-06-17
The Creators of Claude Code and OpenClaw don't Prompt Their Agents Anymore?!
ColeMedin
569 views•2026-06-18
Why prompt injection is AI's biggest fail
usemultiplier
1K views•2026-06-17











