Install our extension to search inside any video instantly.

EP 1 Highlights | Run Open Models on Serverless GPUs

Added: 2026-06-20

217 views132:07MicrosoftReactorOriginal Release: 2026-06-15

NVIDIA Inference Microservices (NIM) encapsulates all microservices needed to run AI models at scale into a single Docker image, enabling deployment on serverless GPU platforms like Azure Container Apps where applications can scale to zero and users only pay for actual usage, simplifying the complex process of deploying enterprise-ready, scalable AI models.

[00:00:04]We are going to be looking at how we can run NIM models on Azure serverless GPUs using the container apps platform. And we're going to be covering quite a lot.

[00:00:14]>> Deploying an enterprise-ready, scalable, high-performance AI model is really not an easy task. There is [music] a lot that goes into it. NVIDIA solves this complexity by creating NVIDIA inference microservices, or NIM for short. A NIM takes all of the microservices that you need to run AI at scale, and it encapsulates them into a single Docker image. This makes it very easy to run and very easy to deploy.

[00:00:42]>> I'll talk about Azure container apps.

[00:00:43]It's actually my favorite platform deploying [music] anything on Azure because it now has GPUs, it's a great fit for this as well. You can actually say like, "Hey, you know, like if there's absolutely no traffic going [music] to my app, you know, to scale down to zero." And you're only going to pay for what you actually use. [music] >> Well, if we wanted, let's say, a gigantic workflow to write code, we would first pass our prompts to an orchestration agent, which could plan and interact across multiple different agents, which all have their own different tools.

[00:01:11]>> There's a lot of code here involved in calling these tools. That is why we typically use agent frameworks [music] when we want to have agents that call multiple tools. What we have here is a bunch of examples using the agent framework from Microsoft. This is a framework that was just released. And it's basically the successor to both semantic kernel and autogen. So, that's just doing it with a single tool. You can see our code here is really nice and clean. So, we're just going to add in another function here. Wait, it's going to figure decide which of those to call.

[00:01:44]First, it called get current date, which is great cuz that's basically it should figure that out [music] before it calls the other ones. And then it called the other two, and then, you know, came back with its response. So, you know, lots of options for how you can do logging and observability.

[00:01:57]>> Want to dive deeper? Check out the full episode at the link in the description.

Related Videos

Artificial Intelligence

AI Agent Mastery Certification Course: Lab 4 – Tools & MCP

arizeai

350 views•2026-06-16

Artificial Intelligence

Real-time Voice cloning, Kimi K2.7 CODE, GLM 5.2 and 3D reconstruction | AI News

kaiexplainsYT

111 views•2026-06-16

Artificial Intelligence

He Believes AI Could Replace Humanity Faster Than Anyone Expects

LondonRealTV

815 views•2026-06-15

Artificial Intelligence

General Session by Rami Rahim-The next generation of networking: From vision to self-driving reality

HPE

108 views•2026-06-17

Artificial Intelligence

[PLDI 2026] Flatirons 3 - LCTES (Jun 16th)

acmsigplan

191 views•2026-06-16

Artificial Intelligence

Google DeepMind’s AI Halves UK Housing Planning Time

60secondsignals

467 views•2026-06-17

Artificial Intelligence

The Creators of Claude Code and OpenClaw don't Prompt Their Agents Anymore?!

ColeMedin

569 views•2026-06-18

Artificial Intelligence

Why prompt injection is AI's biggest fail

usemultiplier

1K views•2026-06-17

Trending

Nobel Scientist Creates Device to Harvest Water From Desert Air

DrBenMiles

2200K views•2026-06-16

GROW A GARDEN 2 UPDATE

KreekCraft

668K views•2026-06-20

Something's off about my cat...

griffingraue

4534K views•2026-06-16

উটের কুঁজের মধ্যে কি থাকে?

MrBonGrow

1861K views•2026-06-18