K-Means is an unsupervised machine learning algorithm that partitions unlabeled data into K clusters by iteratively assigning points to the nearest centroid and updating centroids as the mean of assigned points, minimizing intracluster distance while maximizing intercluster distance. The algorithm requires specifying K beforehand, initializes centroids randomly (or using K-Means++ for better initialization), and continues until convergence. It is widely used for applications like customer segmentation, image compression, document clustering, and anomaly detection, though it struggles with complex-shaped clusters and requires knowing the number of clusters in advance.
Deep Dive
Prerequisite Knowledge
- No data available.
Where to go next
- No data available.
Deep Dive
K-Means Clustering: Unsupervised LearningAdded:
Okay. So today's we are going to discuss the first uh machine learning algorithm of the unsupervised learning. In previous lecture we discussed introduction to machine learning. What are the types of machine learning? What is supervised learning? What is unsupervised learning? How we actually train a model? How we evaluate model?
What is data analysis? importance of data analysis and many other uh things that we discussed in the previous lecture. If you have any questions related to that or you would like to discuss anything you can otherwise we are going to start the first unsup unsupervised learning algorithm.
Okay. So this is quite easy. Let's start.
So first of all we are going to start from the unsupervised learning and the algorithm that we are going to see is uh one of the clustering algorithms.
Unsupervised learning just a quick recap you know uh whenever our data set is unlabeled or our given data set is without labels.
I I mean the data set is not supervised or labeled by a human, by a machine or any any person. You can say the experiments of the previous history are not labeled. Then of course we cannot train a supervised learning model. So we will train an unsupervised learning model.
So model must discover the patterns from the data itself because the data samples or the experiments are not enabled. So of course model should discover the patterns itself relationship exists between the data. Of course if data set has some relationships then this model will be able to learn uh develop those relationships.
It needs more complex processing and applications of Kin clustering is uh in anomaly detection in document categorization relevant news clustering or segmentations grouping and then the customer segmentation. Customers having similar interests having similar kind of uh service requests can be segmented using the clustering techniques.
Clustering is basically a grouping uh technique.
So if I talk about the K mean clustering algorithm the one of the famous unsupervised learning clustering algorithm which is called the C means clustering. It is actually the C means is like a team of people you can consider those are trying to find the best spots to build the K fire stations in a city. Okay. In a city you need to build the K number of fire stations and you have a team containing let's say two people three people those are actually trying to find the exact or the better or the optimal positions for the installation of fire stations. You don't want them all in the same neighborhood.
Of course we don't want different fire stations at the same place in a city. We want them at the best optimal place that they covered the complete city and they are at the best particular positions and you want each station to be as close to as many houses as possible. Of course, we also want this we don't want the Kfire stations like um outside the city or boundary of the city. We want the Kfire stations where uh the population is actually living.
that optimization of location is exactly what the algorithm is doing mathematically. So this is how actually the K mean clustering algorithm works.
You have a k number of k number of people in the team and in kmin clustering you have k number of centrides and those are actually trying to find the best possible location in your data set that actually covers the diversity of the data set better.
We will see the examples and you will get the idea. This was just a bigger picture how testing actually works. This is the visualization. For example, if we have this kind of sample, you can see we have a different things mixed up with each other. But if we apply a cam clustering algorithm. So clustering algorithm on the basis of some similarities maybe on the basis of the color similarity maybe on the basis of size similarity there could be other features as well on the basis of those the clustering algorithm actually groups similar items.
You can see in this particular image, this is the input to the model and there was not any label like we did not label that this sample is this, this sample is this and this sample is this. There is no label of this thing. They are just mixed up with each other and we just passed on this to the algorithm and algorithm automatically identifies the similarities of the items and then group those items together and you can see that the similar items are grouped together. This is what we are expecting the output of this clustering algorithm is you can see it is a very good algorithm and we are getting our output very clearly in the form of the clusters.
Okay.
So another example is for example if you have a lot of images in the database and you want to cluster you have seen many of you have the uh smart mobile phones and if you just open your gallery uh there are different options nowadays in your mobile phone galleries you can see automatically generated stories and uh most of the images belong to same nature maybe you are on a trip you capture different memories and mobile phone automatically identifies okay these are the images of the similar area okay looks like from a similar trip and it uh generated its um it's separate story including or merging all the images of particular that trip uh this is on the basis of grouping or the clustering of the similar items The same way you can see these are the images and as output you can see the similar images this similar group of images uh grouped together and the other similar images are grouped together and the third similar images are grouped together.
So this is what the output of kind of clustering the grouping algorithm.
Okay.
So how it actually works in the in the case of the documents listing we have different pre-processing steps before we group the documents on the basis of their text. For example, we have the tokenization step. We have to tokenize or break down the sentences into words, words into tokens. And the stop words removal. We need to remove the stop words, the question marks, full stops and many other stop words. We need to do the stemming. We need to do the term waiting uh stemming. For example, we need to convert the words into actual words. There are many adverb in the word. For example, uh going, writing and many other words. So this step is actually uh converts the adverb words into plain words or the nouns kind of words and then the term waiting on the basis of the repetition of the words and then after the pre-processing we will make the documents representation and then we will apply the clustering algorithm.
there are multiple clustering algorithms and then after that you can check the the validity of the clusters made of the documents.
Uh this is another illustration for the understanding.
This is the example given as a input.
You can clearly see that every point or every sample in this input don't have any kind of label. We cannot actually differentiate that which sample belongs to which group if we want to group uh together. But once we applied the unsupervised learning came in clustering algorithm we can get the similar groups like this. You can see these samples belong to the similar group and these samples belong to similar group and these samples belong to the third similar group. You can see the output like this.
uh one important point here I would like to mention is how many total number of groups could be in this data set is actually given to the K mean clustering algorithm. For example, if I want to make three clusters, I will give the input K variable is equal to three and then it will create the three similar groups. So generally as a human being you can see that uh after seeing a visualization a human brain can see okay how many different groups could be in this particular data set. But yes there are some complicated scenarios I will show you in upcoming examples in which even a human cannot clearly see that how many different groups could be in this data set where this algorithm performs much better. And there are some other improvements the other variants of K clustering available for the better clustering algorithm. We can see those later on.
Okay. So why the these number of samples belong to this one group because because of the similarities that we at this point we don't know we will see later on but of course on the basis of some similarities these points are grouped together and these points are grouped together on the basis of some similarities and these points are grouped together on the basis of some similarities.
two things that are very important according to this algorithm I need I need to discuss in upcoming slides but I'm just giving you uh I mean an overview of that thing the distances within a cluster you can say the intracluster distance and the distance between one and other clusters we can say the intercluster distance these two distances are very important to evaluate the accuracy or the validity of the clusters that we have made.
Just give me a minute. I need to pick a call.
Okay, it's connected.
So these are different clustering approaches for the unsupervised learning data sets.
Clusting local outlier factor expectation maximization principal component analysis independent component analysis non- negative matrix factorization singular value decomposition and isolation forest we will be focusing more and seeing the clustering techniques in the unsupervised learning within clustering there are multiple algorithms like kmin clustering I am discussing with you hierarchical clustering Okay. And many other algorithms in the within the clustering technique. So we have to see one two different clustering algorithms.
The other we are not going to discuss the other techniques of the unsupervised learning.
Okay. So partition unlabelled examples into disjoint subsets using uh the clustering algorithms such that the samples within a cluster are very similar and the samples in different clusters are very different.
So of course our end goal is the samples within each cluster should be very similar that is why they are in the same group and the samples those are in an other cluster should be very different uh from each other.
So if this is the thing then your clustering accuracy is good. That is why I was mentioning in easy words you can say the intracluster distance intracluster distance within cluster distance it it should be very less. The samples should be cluttered with each other and the intercluster distance should be high so that we can easily differentiate.
Okay this is one cluster this is the second cluster.
So yes, so select a similarity or dissimilarity function and on the basis of of course you can group you can create the groups either on the basis of similarity or on the basis of dissimilarity.
So group samples based on this function similar ones be in one cluster and the different ones put to the other cluster.
This is what I was talking about the intracluster distance and the intercluster distance.
If your intercluster distance is high, it means the clusters are good because they are very apart from each other. They are not mixing up with each other and the intracluster distance should be minimized because we want that whatever the samples those are inside a cluster are actually the samples of that cluster.
For example, if there is a point at this stage.
So this point will impact the strength or you can say the accuracy of this cluster because we can say that uh this point is far from the all other points and the distance of this point from the other clusters. Of course from this one it is very high but from this blue cluster the distance is it seems greater but of course it is confusing actually the red and the blue cluster. Okay. If it is exactly at center then of course it is confusing but even if it is far away from blue or red it is a kind of outlier sample which is affecting the the accuracy of either red cluster or the blue cluster. We will see these kind of samples.
So there are many types of clustering algorithms.
Many algorithms use similarity or dissimilarity criterias. The way they find the grouping is different. Some popular clustering algorithms are the K mean clustering. There is actually a variation of Kin clustering which is called Kins++ Kin++ clustering and then we have the elomerative clustering, hierarchical clustering, main shaped spectral clustering and mixture of gazians. These are the different clustering techniques that we use for the unsupervised learning.
We we will be seeing the C means clusting first and then following onative and hierarchical clusting. CIN clusting is a very famous simple and powerful unsupervised machine learning algorithm. It has been applied to many complex unsupervised machine learning problems. I'll show you some of the use cases of CINS later on at the end of the lecture. A kin clusting algorithm tries to group the similar items in the form of the clusters. The number of groups is represented by K. K is the number of groups. It should be given beforehand.
Yes, we need to tell actually how many groups we want from this data set. This is you can say some kind of limitation of K meaning and K should be the predetermined.
Each cluster is associated with the central point, the center point of the cluster. You can say the gravity point of the cluster. Each point is assigned to the cluster with the closest centrite and number of clusters K must be specified. The objective is to minimize the sum of distances of the points to their respective centr. You can say the objective is to minimize the intercluster distance and maximize the intercluster distance.
This is one example uh illustration for you. And you can see if you just uh for a moment if you just uh if you just consider that there is no red, yellow and blue dot. It is a simple data set of the black dots. You can see first we decided that we need to apply the unsupervised learning because all of these dots were not having any kind of labels. So we decided okay we will apply the unsupervised learning and then we decided which algorithm of unsupervised learning. So we decided we will apply the K means clusting and then for K means clusting one predetermined thing is K. You know about this that we need to tell the K mean clustering what are the number of groups we are expecting.
So what will be the K value? So we decided the K value will be three. And then the second step is okay K value three means there will be three different groups that we want to make in this cluster in this data set. And once we said that okay we will have to make the three clusters then the point is what are the starting points of these three clusters.
Okay this starting point is called the centr the gravity point of each cluster which attracts the other points uh towards that group. So the one approach is the random centride creation. So let's say we randomly selected this black point and colored it as a red. We randomly selected this is our centride one means the the the gravity or the center point of one cluster and this is the gravity or the center point of the second cluster.
This is the gravity or the center point of the third cluster.
So once we decided that there we will create the three groups and then we selected the three different centrids randomly in the data set. Now in the step three we will actually calculate the distance of each point from every centr. For example, now we need to check that this point will be the part of group one I mean C1 or this will be the point of group two I mean C2 or this will be the point of group three. So for this we need to calculate the distance of this point from each cluster. Okay.
So we will calculate the distance from this one. Uh it is very simple. We can use different distance formulas uh you know ukidian distance minute distance alon and many other uh you can use the distance for the distance calculation of one point to from all the cried points or the gravity points of each cluster and this distance and this distance.
Once we calculated this uh distance of this point from each cluster, we will assign this point to the cluster having minimum distance. Okay, what does this mean? This actually means this point has more similarity with this point.
Okay, the gravity or the force of gravity or the attraction of this point is much greater towards this. Okay. As compared to the other uh particular two points, two cent points or the gravity points.
So this point will will pull this uh this point towards itself and it will be assigned the red color to this point. In the same way we will calculate the distance of each other point from every cluster and then on the basis of the minimum distance we will assign it to the minimum distance cluster point and eventually we will get this kind of output.
All these points has been assigned to this cent this group and all these points has been assigned to this one and all these points has been assigned to this one.
Story is not ended here. There is another important step. So you know we selected these three central points randomly in the first iteration. So of course on the basis of this randomness we cannot say that our grouping is good and we cannot stop at this stage. So we need to we need to change the centrides with respect to time and for this there are some of course functions and many other things that we will be using but let's see first of first of all let's complete the one iteration of the algorithm till the end. So for example this was the problem we have given set of x with the n number of points in a dimensional space and an integer k group the points into k number of clusters c1 c2 and c3 such that the cost of this function is minimized. Okay cost of this function is minimized.
What does this mean? The cost of this function is minimized. Remember what we discussed previously that we want the intracluster distance to be minimum and the intercluster distance to be maximum for the better output. So once our first iteration is complete we randomly selected the centrides. We calculated the distance of every point from every centr and then assigned the point to the nearest centr value after this grouping this kind of grouping generation. Now we need to evaluate our groups. How much our groups are better or good and what is their intracluster distances and what is their intercluster distances. And then we need to uh we also need to update the the centrid points. I will show you in next couple of minutes. Okay.
So initial centrides are often chosen randomly. clusters produced very far from one and each other.
Uh the original points these were the original points. For example, this is the data set. As a human, just consider that this was the data set of uh the similar colors. We don't have done any grouping yet. So we need to apply the clusting and we need to do the the grouping on the data set. So after optimal clusting uh on this data set you got this kind of groups. This is one group. Okay. This is second group and this is the third group. This is after the clustering algorithm.
If we see that uh which kind of clustering output is better. For example, these are also three clusters.
You can see very clearly.
Okay.
So what we did here we give this data set as input. We applied K mean clustering algorithm. The K value was three. We make the three groups and it gives us the two outputs. Of course these are the three number of clusters and these are the three number of clusters.
Why you can say that this is the suboptimal and this is the optimal.
Anyone please?
What do you think of?
Why we are calling this suboptimal clustering?
Good point. Good point. Uh we can in other words we can say the intracluster distance of this group is uh very high and we are saying that the intercluster distance should be low minimized and the intercluster distance should be high and you can say the intercluster distance in this grouping is very low. It is actually inverse of that and the intercluster distance is very high.
Good.
Okay. So the these are the algorithm steps of this particular algorithm. We initialize k number of clusters. We randomly initialize the centrids of those groups. Then we calculate the compute the distances from each point and then we assign the minimum distance on the basis of minimum distance the points to the relevant clusters and then we actually update our centrides.
This is but the last point that we need to see and then we will move towards an example how we how and why we update these and drives.
So actually we do the multiple runs and select the clustering with the smallest error. Okay, the smallest error means again the same thing the minimum intracluster distance and the maximum intercluster distance. We actually expecting if there will be uh the intracluster distance high it is a kind of uh error for us. Select the original set of the points by methods other than the random. Yes, this is a good good line. Just think of if you have any other idea how we can do other than the random how we can initialize the initially centrides points that we are initializing randomly at this moment.
What could be the other methods?
Pick the most distance from each other points to the cluster centers. Uh this is the variation of K means plus+. Yes.
So K means plus+ the difference between K min and K minus is actually the initialization of first initialization of the cent points. In simple K means we initialize the cent points initially randomly and in K min plus we initialize the initial number of uh the centrides on the basis of some some calculation that we will see later on on the basis of actually the distribution of the points. We our our attempt is to initialize the clusters uh the centroid points those are very far from each other for better convergence or you can say for faster convergence because randomly you can pick you can pick the cent points those are very close to each other. So it will actually take a lot of computation and time to to come to that uh that optimal convergence point.
Okay. So let me uh before moving towards these uh mathematical formulas for the calculation and the evaluations and many other things let me first tell you the the reason behind the updation of the points. Okay we can consider let's say this this particular example and yeah this one. Okay. So what we can do is we can remove everything and I'm going to create the I'm going to create the the the clusters uh on the basis of my guess. Okay. Just for the for our understanding.
Okay. So as if we guess is it's uh if we guess it correctly this will be one red cluster group and let me change actually the color if I can get it ly this should be the second cluster at the end of the algorithms output I am talking about okay and this should be The third yellow cluster means that this point will be part of the yellow cluster. This point will be the part of yellow cluster. This will be the part of yellow cluster and this will be the part of the yellow cluster. And this is how actually the output will look like.
I'm just going to mark it a little bit.
Okay. And finally can use the red one.
This is what the output of the algorithm will look like after iterations. A lot of iterations. Okay.
One only one step that is remaining is actually the updation of centrides. So two points are very important to understand why we update the centroidids and what's the benefit of updating the centroidids.
All right.
So let's consider that we were actually yeah initially we randomly initialized this point as a cent point of this cluster and this point as a centr of this cluster and this point as a centr of this cluster.
So you can clearly see that this if if you see the the boundary of this cluster and the boundary of this cluster maybe this one this initially initialized random centroid is on the boundary of this cluster. Okay you can see this this is actually on the boundary of this cluster. So of course the distance from this cent of this point will be high.
So one thing that we want our centr to be in the center of the cluster. Okay as you can state from the name we are calling it centr. So actually it should be in the center of the cluster. So how we can make sure that our randomly initialized centrides are actually in the center of the cluster. So for this actually we do the multiple iterations. Okay. If we initialize this randomly there could be this point of the red initialized randomly. There could be this point of blue initializes randomly because randomness could be anything.
and uh might be this as well. For example, if we say that randomly this point was initialized from this cluster.
Randomly this point was initialized from this cluster and this point was initialized from this cluster.
Okay. Now you can see what I'm trying to say. The random initialization of centr is is not a good thing. But if there is a no option and we have to use this approach, then we need to update our centrides after each distance calculation or one iteration of every point.
And how we can update the location of our centr? It is very simple.
Once we done this in our one step.
Once we done this in our one step, this cluster has been made. This cluster has been made and this cluster has been made. Okay. Now you need to update your center. But you can do it is very easy.
Just take this point and the you you can say the mean calculation actually. So all of the points of this clusters mean value and then move this central points towards the result of those values. Of course every point has x and y value. When you will calculate the mean values there will be some mean x and y value that will be your new centr point for this cluster. In the same way for this centr calculate the mean value and you will get a new mean c for the second cluster and in this third group you will calculate the mean value and you will get the new cent for this group. In step two, once our centr is updated, we will again calculate the distance from each point.
Okay, we will again do all the computations to check that on the basis of new centrides our grouping is accurate or not.
Okay, you will do again calculations and again update and after this iteration once you got this you will update the centride and then you will calculate the distance from every point on the basis of new centr again.
So now the question is how many times we will do this? Can anyone of you tell me the answer? It is very simple actually.
How many times we will actually uh update or we will continue the updation of these androids and calculating the distances of the every point.
Yes.
Yes.
No input. Okay.
It is not that difficult.
We will keep updating the centrids until there is no updating values of the centrid.
It's not that difficult. What we are doing here, we are actually trying to calculate the mean distance. Okay, for the centroid calculation and once the the grouping is accurate and the centrids are at the accurate position, so there will be no changes in the cent values even if you do the calculation again and again. So you can say that once we calculate the centrid values and after two three steps the value is consistent. So we can stop because the grouping is uh done at this stage there there there could not be further values update of this and of course if there is no update in this centroid value there is uh there will be no update in the cluster points.
Okay. Now moving towards this one. This is the again distance formulas I discussed with you. You can use uh a distance formula of simple distance, Ukrainian distance, Manhattan distance.
Okay. And uh you can use Mosowski distance.
We can use cosine similarity distance.
This is mostly for the text data. Cosine similarity. This is the function of the cosine similarity. This is uh for the manoski uh distance calculation. If you use p value one, it is uh it will work like simple the distance. If you use the p value 2, it will act like simple ukidian distance.
And this is simple one point distance minus cent value.
It's you can say the sum of square distances.
Okay. And this is the value to calculate the new centr.
Okay. So you can calculate the new centr.
This is the mean of all the values of that particular points divided by uh the total number of points in that cluster.
So you will actually get the mean mean value that's interact.
If we talk about the complexity, it is n into k into i into d where n is the number of total number of points in that data set. K is the number of the clusters in that data set and i is equal to the number of iterations in that data set and d is the dimensionality of the data set.
Okay. Now for the evaluation we have different evaluation matrix the the distance matrices was for the grouping and of course now we have the evaluation matrix the Davis bolden index the clusters intracluster and the intercluster distances all these are used for the evaluation or the validation uh of the the clusters that the claiming clustering has generated. It is again very simple, not that difficult. You are actually calculating the differences of the distances and then you are taking the average and the sum of square square root.
Once you get this value, no, you will actually sum on the basis of number of clusters. If there are two cluster, three cluster, four cluster, you will sum all of these clusters and divided by the distance between uh it is actually one of this is the intercluster distance. This second one is the intercluster distance. If we divide this we will get get the one uh dispersion score and this dispersion expectation or the score we will use to evaluate that our clusters are good enough or not. So let me tell you this thing here you can see the SK is actually telling us that how spread out is the cluster. This SK is telling you how actually the cluster is separate.
You can say in other words the intracluster distance of that cluster.
And this separation is how far the the centrides or how far the clusters or in easy words you can say what is the intercluster distance of the clusters with each other. And then for the calculation you can you can uh just sum up the the spread of CK and CI what we were doing here and then we can divide on the basis of the distance between those particular centroidids values. If your SKL is small it means the clusters are tight and far apart. This is a good clustering output. But if your scale value is large the clusters are likely overlapping. Maybe the intracluster distance is high or maybe the intercluster distance is very low.
So this is how you can actually evaluate and check that what is the quality of the clusters that has been generated.
Then of course there are some disadvantages of the chem clusting algorithm. The one very big disadvantage is you actually have to input how many number of groups you are going to generate in this data set. Sometimes we don't know we cannot estimate accurately that how many groups will be in this data set but if you know then came will perform good way and if your densities mean the distribution of the data set is complicated then came in listing will suffer. We will see such examples in the next slide. So in kind of uh this type of data sets you can see these are the original points and these are the kin clusterings clusters.
In this kind of data set you can see the output of the came clustering.
So K mean clustering actually generated the three clusters but we are not happy with this. You can see clearly the problems in this clustering. The intercluster distance is very very low and the intracluster distance is very very high.
Okay, this is much better.
At least the intercluster distance is very low and also the intercluster distance is high.
So if data set is data set distribution or the data set shapes are complicated came in clusting will suffer and this is again this were the original groups and these are the groups generated by the game clustering which are not accurate.
Oh, this is very complicated shape. Of course, no way that came in will give you the correct output for this kind of uh clustering. There are some other clustering algorithms like DB scan density based clustering algorithms. Those can actually capture the similarities if the shape is very complicated or the different because they do clustering on the basis of the density not on only on the basis of the distance of one point to other.
The example of image segmentation.
If some of you use the video editing or the image editing tools, you have seen that uh you can actually segment one similar kind of things in each image very accurately.
This is on the basis of uh of the grouping of the pixels, the similarity of the pixels. So you can actually separate out the sky, the cloud, the water and the lush green fields and many other things. So on the basis of the pixel similarities you can group the similar things. We can use gaming clusting for the customer segmentation for the image compression for the document clusting for the anomaly detection. The similar type of anomaly or the attacks will be grouped together.
And in document clustering the similar topics or the sim and similar words uh documents will be grouped together. In image compression you have seen this.
And in customer segmentation the customer having similar interest similar age group similar similar metadata will be grouped together.
This is one of the example and there is another example on this link. uh I will suggest go ahead and uh do this example by hand for the better understanding of the clustering algorithm. You should do all these steps. Initialize the k number of clusters then random centroids then update the centrids then do the grouping then do the evaluation the calculation of the u evaluation of your clusters.
Uh that's it from my side. If there is any question you can ask me and then we can solve any one of these examples.
Related Videos
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 viewsβ’2026-05-29
Long-Running Agents β Build an Agent That Never Forgets with Google ADK
suryakunju
142 viewsβ’2026-05-30
5 Mind Blowing Omni Uses Cases
PaulJLipsky
1K viewsβ’2026-06-02
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K viewsβ’2026-05-28
BREAKING: Microsoftβs New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 viewsβ’2026-06-03
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 viewsβ’2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K viewsβ’2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 viewsβ’2026-05-29











