This video provides a comprehensive overview of Microsoft Azure AI fundamentals for the AI-901 certification exam, covering key concepts including artificial intelligence as software that imitates human capabilities like prediction, evaluation, and creativity; the distinction between assistants (user-controlled, productivity-focused) and agents (autonomous, goal-oriented); responsible AI principles including fairness, reliability, privacy, and transparency; Microsoft Foundry's model deployment options (global, data zone, regional) with guard rails for safety; authentication methods using Entra ID or API keys; and various AI services including natural language processing, speech recognition, computer vision, and generative AI models with their specific use cases and deployment considerations.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
AI-901 Microsoft Azure AI Fundamentals Study CramAdded:
Hi everyone. In this video, I want to go over some of the key information to help you pass the AI 901, the updated Microsoft Azure AI fundamentals exam that in addition to understanding core AI concepts, you also have to understand how applications can use AI, including understanding some Python code. Now, with that in mind, I've created a couple of separate videos I recommend you watch if you don't know programming. Now, I kind of set them up on an easy to access site. This is just like a savile.
And really, the only two that I think you should definitely watch is down the bottom there's a development section.
And if you've never programmed, there's this Python first hour and then there's one about AI development for non-developer. But I'll cover in this most of the dev and AI specific things.
But just spend one hour. I tell you how to set up Python, how to get an environment going just so you basically can understand the code. It's just going to be like a useful grounding to have when you think about actually understanding the the code and the things we're going to go through.
So those will give you the grounding for Python, but again, we're going to cover a lot of it there. Now, I also recommend you go through the free online training, which also has links to some hands-on labs that will put you in a really good position for the exam. Now, you're going to need access to Azure. You're gonna need you do a trial Azure subscription just you can go and try the things out and some really good resource here is actually the AI 901 page and you can scroll down and it has a link to the study guide which is over here that tells you the core skills that you need to know about.
So make sure you think yes I can understand each of those. talks about scheduling the exam and then it's got this self-paced set of learning that goes through all of the key concepts. So definitely make sure you go through all of that. Um definitely make sure you've tried the hands-on parts, but it is fundamentals.
You're not going to write complex code.
You're not going to architect complex solutions. It's really about understanding where do you use which types of capability? what are the core types of AI concepts and again understand really simple code and understand what the bits of the code are actually doing so that all gets highlighted in those skills measured I just talked about okay so with all of that let's actually get started on this kind of exam cra so firstly what is this AI this artificial intelligence thing and it's really as the name implies It's software. So things in computers that imitates some aspect of what is the human capability. So as humans we have our brains uh we have kind of vision we can hear things uh we can speak. So it's things that mimic aspects of that. So if I break that down, I could think about well our brains we can do prediction, I can based on historical data evaluate and make decisions based on those things. So I can also do evaluation. I've learned trends.
I can also think about understanding visual inputs. So from my vision I can understand well what is this text?
I can tell you what is this an image of?
Where is the cat in this picture? Draw a box around that cat. I can understand language.
I can engage in conversation.
I might be able to translate between languages. I can think about extracting knowledge.
I could summarize.
And then I also now think about this is one of the big ones.
I could be creative.
I can create text. I can create images.
I can create songs. I can't uh I can create videos and this is kind of one of those big areas now and we have these language models that are at the center of all the excitement around AI and these new opportunities. But I can also think about it as being a to summarize a body of text. It can understand those things.
And then there's also something called machine learning which is all about the idea that instead of having to program a computer how to do something. The system learns from data and it finds patterns.
For example, I could give it a set of label data. It could make predictions and classifications based on what it has learned.
Now one of the key ways we see this creativity this generative types of AI is we have these concept of both assistants and also we'll hear about agents and I think a lot of what happened at the beginning was assistance. So the whole point of an assistant is it kind of just gets a request from a user and then it gives an answer. Now it could be multi-turn. You ask it something, it gives a response. You ask it something else, it gives a response.
The human is controlling the sequence.
So what this is mainly aimed towards is boosting your productivity.
productive all over the place on this one. So, it's about boosting productivity.
Whereas agents on the other hand, the goal here is I actually let them be proactive. They can be triggered off of an event such as an email arrives, a schedule, some combination of things.
They can do multi-step planning. They are autonomous. So, they have a high autonomy.
I can think of them, they're great at automating tasks. They're great for automation.
And the big difference here is they focus on a user goal. So you give it a goal and it just really goes off and does that complete endto-end automation. And obviously I see a huge amount of value in that agent world where we can use that.
Now, as you think about these different ways of using AI and those capabilities, where I have software potentially making decisions, evaluating things, um, being creative, we need to make sure it's responsible. And so these core ideas around responsible AI and these have to be baked into any AI solution we create. So some of the big things we think about here is fairness.
So the idea here is there isn't any kind of unintended bias. Everyone should be able to use this and be treated the same way. There's the idea that they should be reliable and safe.
System should behave as expected. If I'm saying like a self-driving car, that's a very important thing.
We want to think about privacy and security.
It should protect data. It should protect identities.
And in a similar way to some of the fairness, think about inclusiveness. It should be designed for diverse users. We should all be able to leverage these technologies.
and then transparency and accountability.
So the idea here is you you should understand why it made the decision it made why it classified it this way and then someone has to be responsible and accountable for those outcomes of what AI is doing.
So those are super important things. So whenever we think about AI, we have to make sure we're keeping top of mind those ideas of, hey, is this responsible AI? Are we using it the right way? Are we doing the right um considerations around that?
Okay, so we want to do AI, we talk about these fantastic capabilities.
How do we actually get access to it? So this is AI 901. This is Microsoft's AI solution. And so what we're going to be using here is Azure.
So Azure is Microsoft's cloud platform.
So I can think about this idea. Okay, we have a cloud.
But actually the first thing we have to think about is actually we have to have an identity. So as a human to use a computer system, I always want to be able to track who is doing the thing both for auditing but also they'll have different levels of permissions to things. And when we think about AI agents, if they're not working on behalf of a user, they're going to want their own identity as well. I want to be able to audit those and look for signs of risk and look for problems, give them their specific access to tools and resources.
So in the Microsoft world, the identity provider is something called Entra.
So entra ID and every organization so your company you have your own specific tenant so you have some kind of name you'll probably see at the end of your email address so mine is savvlettech.net for example and within that tenant your objects live. So your users have identities, your groups, your computers have something and when you get these kind of things running your agents will have identities as well. Your security policies live there.
And the reason this is really important to understand is that in Azure, one of the key constructs is your organization will have one or more subscriptions.
Maybe this is subscription one. It will have a better name than that. But a key point is a subscription which is a boundary of cost primarily but also some kind of access. And there's certain permissions I can apply at the subscription. and they get inherited through everything inside it. That subscription trusts a specific tenant for identity purposes. So what that means is only identities in this entra tenant can be given permission to use those resources. So it's really important. Then within the subscription I have resource groups. I have one or more resource groups. I'm just going to draw one.
You cannot embed nest resource groups.
So I cannot put another resource group in a resource group. I can have other resource groups and I can still use and link to resources between them, but I cannot nest them inside each other. And then within a resource group, well that's when I actually then go and create the resources. So, for example, maybe I have a storage account, maybe I have a virtual machine, I have a a network, all of those different types of things. And then also, for example, I could have an instance of Microsoft Foundry. I'm going to do a terrible job of drawing its logo because I'm finding it very difficult to do, but it it's basically this idea of hey, it links. So, there's an instance of Microsoft Boundaries. that is the prodeveloper AI app and agent resource.
So just to kind of walk through that if we jump over really quickly.
So if I go and sit in Azure so I'm looking at a certain subscription I have lots of different resources.
If I search for Microsoft Foundry type resources and I look at all of my foundaries, I've got a particular instance here called Foundry Agent Project that's in a certain region. So Azure has sets of data centers all throughout the world. So this is something called East US2. So if I go and look at that, I can see this particular resource where it lives in a resource group called RGH2875 lives in a certain subscription and hey now I can go and do various things inside Foundry. Now Foundry itself, notice here it's telling me so I created the Foundry instance. Now I can go to the Foundry portal.
So Foundry actually has its own portal which I would just go to ai.asure.com and I'm actually using the new Foundry experience. So it's all built around discovering building um and then actually sort of operating and leveraging those sorts of resources. So these are the the core capabilities. And one of the things that's interesting here is if I go to the discover. So I'm looking around this top list of options.
I can go and look at models.
And what's really cool here is over 10,000 models. Like one of the fantastic things is you're not just kind of stuck in using only the models Microsoft creates or o only the open AI. It's model diverse. So hey there's models from anthropic and gro and all the different ones from hugging face.
There's um just massive numbers of different models you can use. So when I think about the idea of foundry so we're going to dive into some detail on that.
So with Microsoft Foundry that is where firstly we have the idea of models and these are think of the model as kind of the brain of our AI applications and I have to deploy a model before I can use it from an application. Now when I deploy a model I pick a number of different things. For example, there's a deployment type and very common ones for example there's global and what that means is the model could be running on a cluster anywhere throughout the world.
There's also the concept of data zone.
This is available for US and Europe, which means it would only run on, if I picked US, a data center in the US. If I pick Europe, it'll only run on data center in Europe. That would be maybe important if I had certain regulatory requirements that hey, my data had to stay within a certain geography. So, I would really care about that. There's also regional, which means it will only run in the region where you created your foundry. So, like East US2 in my case, not every model is available in every region. So sometimes you would not be able to deploy based on that.
But also there's things like the versions. Some models will have multiple versions. There might be different types of limits. So I could set the number of kind of tokens I want to be able to use on this. Then there's also guard rails.
So safety elements around hey can it be attacked? Um what sort of content it can create. So let's actually have a look at that. So if we jump back over again.
So if I just picked um any kind of model. So if I search for GPT5 uh 4.
So if I selected 54, I would have to deploy it. So it's telling me the information about it.
It gives me the pricing information, etc., etc. But if I hit deploy, there's default settings that would just be for example here global standard and default quotas. But if I do custom, well, this is where I'll see some of those different options. So for example, I could pick to do data zone. I might do regional if I'm doing provision throughput. This is where I actually get provisioned throughput units, PTUs, and it's where I must have a certain amount of capacity available for me with a certain latency for maybe really important types of scenarios. There's also an option to do priority processing, which gives me that um higher priority compared to maybe neighbors using it, but I would pay more money for that.
Notice maybe I can pick a certain model version here. There's only one model available.
I can set limits for example a number of tokens I want and it's got a default set of guard rails that are available. So those are its kind of safety mechanisms.
So I have those different options available to me. If I actually come out of this for a second and we jump over to build.
So firstly build is showing me the models I've actually deployed and can therefore use. But if I look at guard rails for a second, if I just select my default two, you can see well it's got jailbreak protection, different types of content safety and different types of protected materials. So if I had maybe some medical case where actually violence, I didn't want to block it. I needed a a higher tolerance because that could happen. Well, I could go and change those levels of guard rails. So, we have different options we can apply to the safety on what we're doing.
But absolutely, once I've deployed models, they now become available and I can use them.
And then one of the next kind of key functionalities we have here is the idea of agents.
Now when I think about agents, this is where how I'm going to create it to perform some action to maybe orchestrate multiple agents together.
And for Microsoft Foundry, there actually two key types of agents we can have in here. So let's just I'm going to change the colors a little bit. So from an agent's perspective, I can have a promptbased agent. So a promptbased is no code.
I'm literally just describing what I want it to do. I give it its instructions. It's super low weight. But there's also the option to do hosted.
So with hosted, this would be I'm doing like a a pro code. I'm writing code to describe my agent. I have maximum flexibility and then I would create an image with that code in that then the Microsoft Foundry environment would host and execute for me.
There's also the concept in here for example. So for my agent if it's that prompt based it's going to have those instructions as part of it. And even if I do a codebased I'm still going to embed instructions but it would be within my code. So one of the core things is you're telling it what you want it to do and how. We also have concept of tools and knowledge.
Now if we jump over to the portal again.
So here I've actually got an agent just you can see it. So this is a promptbased agent. So really the only important thing here is hey the instructions on how I want it to behave and then you can give it different types of tooling.
So I've got it connected to like an Azure AI search for its um some capabilities and knowledge. So this is my sort of content library. But you can go and define other tools. So these for example could be MCP servers. This is model context protocol. It's a standard way for AI apps not only talk to tools and knowledge, but those MCP servers can reflect back its capabilities. It makes it easier for the AI app to understand.
So I could go and hey connect to new tool and I can define these. I can do a custom. So open AI model context protocol. Agents can talk to other agents. There's a catalog of different tools available.
And then knowledge. This is where it's going to hook into Azure AI search. I can create knowledge bases that consist of multiple different knowledge sources that again is all enhancing the agent with information that it wasn't trained on. Because if I look at my agent, one of the key things is I tell it which model you're actually using. So I'm using the model router. This automatically selects a certain model based on the complexity of what is being asked to do. But those models, they're trained on a certain set of information. So when I think about the agent I create, my agent, well, it uses a certain model for its brain, for its thinking. And then also it may hook into certain tools and it may hook into additional knowledge. So tools I want to be able to perform actions. I want to talk to other systems. Knowledge information that either didn't exist when it was trained or wasn't part of its training set. My email, my one drive, um my knowledge base, but actually it will always use some kind of model. Optionally it can use tools or knowledge. And again those instructions define its behavior.
Now for my application to be able to use this fantastic stuff in Microsoft Foundry, it has to be able to talk to it. So the way this works is programmatically every single instance of Microsoft Foundry has an end point.
Now, we're used to talking to websites.
It's really just that. So, the website is gonna it's going to be encrypted. So, it's going to be HTTPS and then a particular URL.
And that is how the app programmatically can communicate using REST. This is just a standard way to place a request and get a response. Normally, it's using JSON.
And then additionally to be able to talk to that well I have to be able to identify myself. So I have to be able to identify I need some way of doing that. So ideally what we could do is we could use an entra identity.
So those identities we have up there.
Hey, in an ideal world, that's my kind of yes, let's do that. Let's use an entra identity.
And it would then use that to authenticate, I prove who I am to that endpoint. The other option is, and we don't like this as much. I don't know what color I should use for that. Maybe it's like a a grayish color.
I use an API key.
The challenge, and the reason I'm kind of doing it in this gray, it's like not as great, is you have to then be able to store that somewhere. You never ever ever want to put it in your code.
Maybe you use some kind of like Azure Key Vault will be a way of storing that.
Maybe your app environment has something. But I now have to deal with storing that, protecting it. If someone else got it, they could then use it and go and talk to your service. You have to heavily protect that. Whereas what's really nice about the entry authentication is for example if it's another Azure resource they have something called a managed identity it's completely handled by Azure there's no secret or my code would have to deal with um but yeah if I if I have to use the API key realize don't put it in your code never ever go and submit it to a git repo um use some mechanism ideally some kind of keybolt uh to keep it safe now when you're writing your application So, hey, fantastic. I'm writing my fantastic AI app.
It's going to be awesome.
Yes, there is rest being used to actually send the request and then yes, you get a response back, but you don't have to worry about all of that. What actually happens is most of the time you're going to use something called an SDK, a software development kit. And what that does is whatever language you're using, for example, Python or C, kind of name it, it gives you friendly um commands you can use in that language using the language's native types of variables and constructs and it does that communication for you. So all your app has to do is kind of talk to the SDK. The SDK goes and creates the rest calls to actually go and talk to that.
It will also probably go and do things like hey there'll be a library to go and authenticate with Entra. So it takes a lot of the pain away from you. So if I go and look at my foundry and I actually go to home, it shows me, hey look, there's my endpoint and it's longer. I can click the little copy to clipboard and I can go and use it in my code. when I look at code samples within Foundry, it will kind of populate that for me. And then there's my API key and it's offering to copy it to the clipboard as well. If it's ever compromised, you should regenerate it. So if you ever think someone's got your API key, go and regenerate that key. And those are also available in Foundry in the Azure portal. If I go and look at my resource management, it shows you your keys and endpoint in there as well. And then from there, it's actually really easy to if you need to um regenerate them. So there's even option up here to regenerate key one, regenerate key two.
You have two, so you could be regenerating one and just use the other one so you don't lose access to the application. But it's super simple to regenerate them if you think they've been compromised in any kind of way. But never ever ever share them. Keep them super protected.
Now, with all of that, the key part here is that when I'm using the Foundry portal, I can just go and look at a model and it will actually show me code on how to use it. So if we jump over, so what we're going to do, we've already, let's say we've deployed a model, went to discover, we did models, we deployed. So now we're in build. I can go and look at my models and we'll just go and select one. So if I look at the models I have available.
So I'm going to select GPT54 mini.
I can experiment with different types of maybe I can tweak depending on what parameters are available. I can tweak the tools, the knowledge, the I can do memory as a service. I can change the guardrails. But if I hit code, it'll actually generate code for me on actually using this. So it's in this case because I'm using the completions API.
It's using the OpenAI module and it's importing it. It's setting up the endpoint for my project. It's in the API key mode. So it doesn't put my API key in. I would have to deal with that. Then it's creating a connection to OpenAI with the endpoint and the key. And then it's calling, hey, I want to create a completion. So this is actually making the request, telling it which model and what I'm asking it to do. And then it saves the response to a variable. Then it just spits out the message from the first choice of that response. But notice I could also choose a different language. I could also tell it actually just use the rest API directly instead of the SDK or hey actually make it use entra ID authentication instead of using the key.
So let's actually look at this super quickly. So this is really that same code that it already wrote.
Now I changed it slightly because rather than putting a key in, I saved it as an environment variable called Azure OpenAI key, but the rest of it is exactly the same. So I have Python installed and I already did a pip install OpenAI. So that's how I installed that library. So again, it's kind of just pip install open hit caps pip install open AI. That's how I got the open AI module installed. And then to run this now I'm in PowerShell and I can just hit tab.
Nope.
Basic inference. py. So I'm just going to run that.
And there's the message. The capital of England is London.
So the content is part of that overall message. That's actually the response.
But it also gave me some additional details if it made uh calls to tools and any refusal and other things. There's also safety information. So, for example, if I took out message and just saved it and ran it again, you'd actually see a bunch of details about right what safety it did, if there was any self harm, if there was any violence. So, there's other parts, but I was just choosing not to have that cuz I don't need it.
But then I've also got the version here that's using entra. So the only difference now is I did a pip install azure.identity and all it's doing is getting a token using my current credential. So notice the current credential just means hey on Windows I have for example already authenticated. So I've done an Azure CLI. I've authenticated with the Azure CLI and it just creates me a token for ai.asure.com.
So now there is no secret there's no key here. So it's just using the identity.
Then it does exactly the same thing. Now before what is kind of important here is we just had a prompt the users prompt.
I'm adding now something called the system prompt. So I'm giving it the instruction that hey you're a helpful assistant but always answer like a pirate and with humor. But then the ask is still what is the capital of England?
And this time I'm just actually going to output the content part of it instead of the whole message. So now we'll run it again but we'll run the entra version. So fundamentally I've given it an instruction. I've now told it I want it to behave a slightly different way. And so now the response is the capital of England be London. A mighty city full of history tea and more pigeons than a pirate can count. And then a little uh black flag and a skull and crossbones.
Fantastic.
But the key point in these is you could just take these the links in the video description. You would change this endpoint to be your endpoint. And again, for simplicity sake, if you're struggling, you don't have to use the entry integrated off. You can just use the key version and just save as an environment variable your key value. And again, that's in the video description on how you do that. But the idea here is with the system part, we are giving it some actual further instruction on driving the behavior of exactly what I want it to do. That that's the whole point around this.
Well, a key point really here is that everything else we're going to do is really just built on that. It's some variation of that. Again, there's aspects I can tweak. I might be able to change on some of the older models, things like temperature and randomness and anything else, but for the most part, it's just going to be iterations on versions of this. And again the key part when I think about my agent when I want to give it the instructions that guide its behavior this is the idea of the system prompt and then the user asks is the user prompt and that that's the whole goal around that and after I've kind of created those agents I can deploy it as a prompt-based agent very easily or if I'm writing code then I could go and run it as a hosted agent.
Now, one of the key aspects around what I demonstrated over there was very much the idea of I'm talking to a generative model and that's again driving the big deal around the latest waves in AI invation, the agentic era.
Now, very typically this is known as a large language model or LLM.
Now there are also small language models that are still generative but they've been tuned maybe I've distilled a larger model down to a smaller number of parameters. So it's a faster cheaper inference. So when I think about this, so the models and again the huge focus here right now is around this idea not limited to I want to make that very clear not every model we see in foundry many are not they're not all these large language models but I have the concept of a large language model or a small language model so generative AI so a large language model we're normally talking in terms of billions maybe trillions of parameters whereas a small language model, I'm probably measuring in millions. And think of a parameter just as a number that makes up a strength of connection between the digital neurons in their digital brains.
Normally, the more parameters, the more powerful it is, but also maybe the longer it takes to respond, the more thinking and maybe more expensive it is.
Now, as I consider the idea of working with these large language models, we'll realize there's an input, the request I'm making, and then I'm going to get a response, my output.
And it's thinking is when it's doing an inference.
That's its thinking process. Now these inputs and outputs may be across different types of modality. So it could be text, it could be audio, it could be images, it could be video, it could be code, forms there. There's many different types of things.
Some models expect text in and then text out. Some might be text in and audio out. Some might be text in, image out. Some might be, hey, I can accept a text and an image in and I spit an image out. Or maybe it's text in, but I can spit out an image and an audio. There's different combinations. So a key part here is if it speaks two or more of these as either the input or the output, it is known as multi-odal because it supports more than one modality. That's different from multiodel. Multimodel would be when my AI application instead of only talking to one model, it uses lots of different models. And sometimes that's the best option. Maybe it might use one model uh to convert speech to text and then different models to actually do the reasoning based on complexity then another model to do the text to speech.
So it seems like it's an uh audio to audio all the way through. So multimodel is I'm using more than one model.
Multi-modal is I support more than one modality. So that's the the whole point around all of that.
Now when you talk about these generative models, the word you nearly constantly hear is the word token. And the reason for that is your input is the prompt, but actually it it has no real concept of what your words are. And so although we type in a prompt, what's actually happening is that prompt gets converted to tokens.
So a token represents a word or part of a word. So a number and that actually gets converted to an embedding which represents the meaning of it and that actually goes in as part of it.
So these embeddings, they're highdimensional vectors. They represent the meaning of a thing. And then what the models will do, these transformer models that are focused on attention.
You might have heard of attention is all you need. How they relate and matter to each other. So that's kind of a whole key point around that.
And so anything you ever type in actually goes through these various conversions.
Now one of the big deals around all of this and I could we should stress this point. is used in a lot of different places is these embeddings are not just for the semantic meaning of what we're asking it to do. Many times when it wants to work with like our data, we have to use these embeddings there as well because in the English language or any language, one word could mean many different things. Many words can mean the same thing. So if I was trying to find information on a certain topic and I just search for the exact words, I may not find it. Instead, I have to try and search based on the meaning of what I'm looking for. And so, we create these vector databases that reflect the meaning of data we have in something.
So, for example, what this could look like is imagine I had um a vector space and maybe this vector again it would be maybe a thousand dimensions. This is not representative at all. Remember I had a vector for dog and fairly close to that is a vector for puppy and then there was a vector for cat and then a vector for kitten and there's a vector for king and queen and paper and hot dog. You name it there there's a vector for it.
But you could think about it in a way of you can perform manipulations on these vectors. So if I said um dog actually no change that puppy minus dog plus cat in the semantics space if I took puppy removed dog from the vector added cat I'd probably end up at kitten.
based on the semantic meaning of the things. It's weird. I know you're not going to be able to visualize a thousand dimensions. I can barely handle three.
But this powers massive amounts of how we think about natural language and semantic meaning of of really everything we do.
Now, these generative AI models are super powerful. Again, they're they're at the core of many things we're doing.
But understand there are many other different types as well. So I wanted to dive into those because that's important for you to understand. So as you take the exam, if you're trying to do some kind of creative handling lots of different scenarios and reasoning, hey, I'm probably going to want to use a generative AI model. That's going to be the thing to use. But there are other types as well.
So some of the types of AI capabilities we think about a lot here and these are all available in Microsoft Foundry is another key type would be the idea of natural language processing or NLP.
So the whole point about this is it's focused on understanding and inferring the meaning from our human language.
Now there's different capabilities that it's going to do here.
So there's things like extracting key terms, identifying named entities, classifying text. So this is positive, this is negative, this is neutral, summarizing content. Now traditional NLP pipelines would break text into tokens.
It would normalize them such as uh making it all lower case, removing punctuation, filter out common noise words like the uh tag each token with is it a noun, is it a verb, is it adjectives. So these are can all be thought of as the idea of preocrocessing getting it ready and to be able to understand it and then it will then go and actually then do analytics as part of that. Now modern natural language processing actually uses those same kind of embeddings and transformers that capture the semantic meaning of words. the phrases in the context they're used enables it to better understand the relationships, the intents, the nuances beyond kind of just the old style analytics would look at the frequency of certain things.
Now the question here then is this sounds great.
How do you do it? And as you might guess, one of the ways is I can use generative AI models.
I can just use one of those. So, let's actually take a look and we'll experiment.
So, here I'm actually looking at GPT54 and what other models do I have? Let's see what do I have installed here.
Yeah, actually use GPT chat latest.
Let's try this one out. So, what I could do here, I could ask it.
And one of the things we'll actually ask it to demonstrate some of the things we just talked about. I'm going to ask it, show me a simple NLP breakdown of this sentence. The quick brown fox jumped over the lazy dog. Every, as you learn to type, this is what you use because it's every key in the alphabet. So, tokenize it, lowerase it, remove stop words. I'm asking it to really show that entire process.
So here you can see exactly those things I talked about. So we can see it tokenized it. It lowercased it. It removed the and over and then it worked out hey the determin the determinizer of what's going on.
It broke it down into nouns, adjectives, prepositions, verbs, past tense and then its understanding of actually what happened. So it's doing that complete uh set of capabilities there.
So that's definitely one way that we can solve this.
But there's also specific tools designed just for text analysis. So the other thing we can do here is Azure language.
So they're more specialized. They're more deterministic models. So they're not using that more generative type capabilities.
It's actually models trained to do specific things. And deterministic is the same input will yield the same output whereas generative models will not. The same input in a generative AI model is nondeterministic. I can still get a different output. So a specialized deterministic model will give me a more consistent predictable result and very often it will actually cost less money than using a generative model.
So you can actually go and look at the Azure language in Foundry tools.
So let's go and now we'll try this. So if I go back and if I go to AI services, so I'm in my models and I'm looking at AI services. I could search for this in the kind of discover as well.
But notice in the language category.
So I've got these kind of five at the bottom we have language detection, PII reduction, document PII reduction, text analytics for health, conversational PII reduction.
So different types. If I was do language detection, have a bit of fun on this one. And so what I can do here, I'm in the playground, so I can just test different things out.
So I could say this is some easy text and detect.
So not only does it tell me the language, it gives me a confidence level. So it's a 100% confidence. And as always, it will give me the code. So, if I want to do this in my app, I can click code again, pick different languages, and it will tell me the code to do it.
Now, if I hit edit, and I change it to some different text, it's now 100% confident it is Spanish.
And as that code shows, there is an Azure language SDK to be able to leverage that. So while generative AI models may work, it might be the right option if I'm trying to combine multiple different tasks together, I want a natural language response, but it is nondeterministic.
The results may vary. Whereas if I use Azure language, it's a way more structured, consistent, deterministic response. You can see I get the language and I get a confidence level. So if that was my primary goal of what I needed, hey, I would rather use Azure language than using a generative AI model. And again, it's going to cost me less money as well.
So then if I take it to the next sort of way I may want to interact, well obviously we then have the concept of speech.
When I think about speech, there's really two directions here.
So there's the the concept of speech to text. So I've got some kind of waveform speech and that needs to go to the text that it actually is. And then there's the idea of text to speech. So then I have the words and see if I get the waveform the same. Nope. And I want to create the waveform from it. And so if I combine those things, for example, that's really useful for interacting with humans. think about transcribing meetings, customer service agents, um accessibility solutions, and then those same agents being able to talk back, being able to have that customer service, but maybe notifications, training, creating voices for entertainment.
And so this gives me this ability to have a kind of like live voice.
And there are various different solutions around this. So if I go back to discover for example and I look at my models there there are many different um kind of uh text to speech but Microsoft has the my voice one.
So if I open it in the playground they have these dragon there's different voices. So, let's uh pack pick Iris here.
And maybe it's just gonna say, "Hey, John.
Hope the AI training is going well."
Hit play.
>> Hey, John. Hope the AI training is going well.
>> And once again, hit the code. I want to put it in my app. there's code that would show me exactly how to do this.
And likewise, obviously, I can go the other way.
So, if I think about um here, so speech to text and I'm in the playground again. What I'll do is I prepared I can't do an actual voice because I'm recording for the video, but I did record a uh a little audio that says everyone loves to take an exam. And to kind of prove the point, everyone loves to take an exam.
>> But you can see it uh it took the voice and it converted it.
And as always again, it has the code if you want the code. So always aware that that code buttons there to help you go and do these things. It tries to make it super easy for you to go and actually interact with them. And then there's even these concepts of voice live that I could like push to talk and it would be an interactive. And then for any of these things I could then go and add it to an actual agent that's just then hosted in Foundry and it's super simple to use.
Okay.
So some of the next type of things we have. So we've had natural language processing, we've had speech, the next one is computer vision.
And this is obviously focused around the idea that if I think about for images for example, well there's image classification.
So I give it an image and it does one label for the entire image. So this is a boat. This is a car. It's at that level.
There's an idea of object detection.
So if that is just like one label for the whole image. So that is just a label.
The idea of object detection, it will tell you what and where. So if there was kind of a picture and there's a person in it, what this would do is it would give you the coordinates to say, oh, that is a person.
And it would be able to do it for multiple different objects. There's a car over here. I can then think about the idea of semantic segmentation.
So in this case, if the picture was like this, and it's not going to work with a stick person, but imagine again I had that same stick person. Well, this time what it would do, it would actually color the pixels that represented the person.
So it would actually tell you exactly which pixels and image belong to that person. And then there's the idea of contextual um image analysis.
So it would say, let me just make sure these are linked to the right place. Um really poorly drawn picture of a person um on a thing. And we can use different models um based on what we want to do.
Now there are models that use convolutional neural networks. So they're trained on labeled sets of data.
So then we have to classify new images and there are models based on vision transformers. So this is where again it uses these embedding vectors to represent the meaning of the images.
It's how we have these multimodal types of models.
And then we get to the idea of actual um image and video generation. So and you we've seen these like obviously they're always super cool. We love seeing these things, but it's like image a little bit.
I mean, it's crazy kind of the quality that you see with these. Now, now these are super interesting because they're based on something called diffusion.
And it it's odd to think about, but the way they train these models is you start off with an image and you blast it with noise and you do different layers of noise. So it's a really good picture initially of a cat and then it's less good and less good and less good and less good and less good until it's just noise at the end.
And the model learns well based on that noise added, how would you reverse the noise to get back to the pitch of the cat? And then that layer of noise. So it it it's steadily worse, but it's trained the model to go from noise to a picture of a cat.
Also, the training is to reverse the destruction step by step. So it's learned types of noise to remove to get to it. And so what now happens is the model has learned to start from pure noise. You just give it pure noise and then it pulls an image out of that by repeatedly dnoising in tiny guided steps which each one nuds nudges the pixels closer to the concept you ask for in your prompt. So it's basically controlled chaos. The model learns how to turn static into structure, noise into shape, shapes into the final image, which is sometimes if you actually look at these, it looks like you're watching a picture materialize out of thin air.
So this could be an image, it could be an entire video. So if we jump over.
So I look at my deployments. I deployed GPT image 2.
And let's see how lucky I get with this.
So, let's create a cartoon of a cheeseburger that has arms and eyes and is typing on a computer. Now, notice why it's doing that. So, I have options. For example, I can do things like I can set the resolution.
I can set things like the quality I want. There are other parameters in terms of compression levels and the image format number of variations. I could have given it a source image as well. So that would be multimodal. So I could give it an image and a description of what I wanted to do. Some of these models the power is actually that you can edit existing images with these.
Like they they get better at following instructions and it's just crazy qualities you can do with these. Now, these can take different amounts of times. I did go ahead and create one of these earlier.
So, a little bit of fun. So, this was the image I created earlier with that exact same prompt. I kind of love this.
I think I'm going to use it in the thumbnail for this video. Eat code repeat burgers and focus be awesome. I think that's just a super friendly cheeseburger. And that one's obviously still running. Maybe we'll come back to it in a little bit. But those are image generation models.
uh I can also think about video generations. So Sora for example and again I can have all those different tweaks and multimodal models may generate images for us if one of those output modalities is image.
Now the next type of thing we have is information extraction.
Now, that's about taking content and turning it into useful data. I could be extracting information from a receipt to help me submit my expenses. It might be pulling out details from a contract to populate a database. There's huge numbers of scenarios. Now most of them will start with optical character recognition finding the letters in an image then the letters to words words to sentences. So it's extracting text from images and then if I think about from that I can map that to the extraction of fields from whatever that content is and then I can actually go and map it to maybe some computerized form to populate the data in the right way. Now the way we can do this, it's actually built in is Azure content understanding.
That was terrible. Understanding.
So that is a built-in service. So there are predefined classifiers, but I can also create my own for things I want it to be actually to go and find.
And again, let's jump over and see this.
It's always Oh, it did it. That looks amazingly similar.
Oh, it doesn't have the post-it note.
That's funny. I don't know if I prefer my one. Okay, what does that have? Be kind. Stay focused. eat burgers. I'm amazed a burger would say eat burgers.
That seems like a flaw in logic. Um, okay. But I do like that one. We're going to we're going to save that one just in case. Okay. Um, but outside of that, if we go to AI services, I have content understanding.
And I'm just going to open up the regular content understanding here because the playground is actually pretty nice. It has some demonstrations.
I could drag and drop my own. And again, it has the code on how I could use this.
But if I do, for example, a receipt, it has sample receipts. So, it's just an image. Notice what it's done here. It's identified the text, but then has actually used intelligence to understand what part are we actually looking at here. So, I can see, hey, look, it understood the name and it's got a confidence.
The address got a higher confidence.
phone number, very confident, dates, the items, the cost, tax, total. So, it's broken that down into the component part. Same for uh this kind of invoice. Again, it's broken it down into the various elements around it. So, these are really powerful where I need to take data from images, the real world, and be able to leverage it in my computer system. Now, could a generative AI model do this?
Sure.
But again, the consistency, the accuracy is probably going to vary. So, don't just always think, oh, I should use a generative AI model. It's probably going to be slower. It's probably going to be more errorprone compared to where I have specific models trained to do a particular function that are way more deterministic.
And that's it. I mean, that's honestly all the stuff I wanted to quickly cover in this kind of cram for you going ahead and taking the exam. Just understand the types of services that exist, where they fit in, the problem they meet. Hey, I've got receipts and documents and contracts. Hey, Azure content understanding is going to be great there. Um, hey, I've got various types of images I want to classify in an effective way or find where these objects are. Hey, look. Computer vision speech to text, text to speech understanding and doing semantic understanding of hey, is this positive?
Is it negative? You want that sentiment from things. Yes, we have the generative models. We have the prompt that goes to tokens. That's an embedding. Multimodal, two or more types of modality either for the input or the output. It could be both.
We deploy models. We have different options for is it it could be running anywhere in the world or maybe a particular set of data centers the US or Europe with a data zone or just regional. We might pick a version of the model. There were limits guard rails are safety. So it will stop maybe uh different types of self harm violence etc. help protect it from being jailbroken attacked.
um agents.
We give it instructions, the system prompt, and I can do prompt-based agents in Foundry that are just the prompt instructions and then it can use tools and knowledge hosted. It's that pro code. I've written it and then I put it in image and I can run it in Foundry.
to use it we have to talk to the endpoint and we have to authenticate entra integrated where I can use my identity or if I was in like an Azure VM or container it can use an a builtin identity using managed identity so I don't have to store any secret or the API key never put it in code always be careful around that but that's another way I could go and authenticate we use SDKs to abstract those restbased calls to it and really just go and play around with it. Go through the training. Um, go for that Python code if you don't know how to code. Get up co-pilot can go and help you and and I talk through how to do that. Go through the online learning and make sure you go through the labs.
If you don't pass the first time, look at the results. Look where you are weakest. Uh, you'll get it the next time. So, I hope that's helpful and uh, good luck in your exam.
Related Videos
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 views•2026-05-29
Long-Running Agents — Build an Agent That Never Forgets with Google ADK
suryakunju
142 views•2026-05-30
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K views•2026-05-28
BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 views•2026-06-03
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 views•2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K views•2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 views•2026-05-29
3D Platformer Update - NO CAPES
SolarLune
294 views•2026-05-30











