K-means: is he really mean?
It’s episode 2 of Code Me, and today I am presenting k-means. Let’s discover k-means together, and find out whether he is mean or kind.
Let’s jump right into it. Join me on the journey to unleash k-means.
What is k-means, and what is it for?
K-means is a clustering algorithm: it assigns each point to its nearest cluster centre by Euclidean distance, and each cluster centre is the mean of the points assigned to it. It is one of the best-known members of the k family of algorithms, which also includes k-medians, k-nearest neighbours, and more. K-means is also a method of vector quantization.
A Call for History
I love the history of algorithms. It is great fun to learn how some of the greatest inventions came to be. I feel that algorithms are as important as life processes; they are present in every process of life.
Let’s check out the history of k-means.
The term “k-means” was first used by James MacQueen in 1967, though the idea goes back to Hugo Steinhaus in 1956. The standard algorithm was first proposed by Stuart Lloyd of Bell Labs in 1957 as a technique for pulse-code modulation, although it was not published as a journal article until 1982. In 1965, Edward W. Forgy published essentially the same method, which is why it is sometimes referred to as the Lloyd–Forgy algorithm.
How does this algorithm work?
k-means takes data points as input and groups them into k clusters. This grouping is the training phase of the algorithm. The result is a model that takes a new data sample as input and returns the cluster that sample belongs to, according to what the model learned during training.

How can this be useful? Well, that’s how recommendation usually works, in a very simplistic manner. Say a park has 3 benches, and one fine morning you go out for a walk and find the rule to be that the authority seats like-minded people together: the benches are the clusters and the people are the data. Figuring out who is like-minded and which bench they should sit on is your model’s task, and during training the benches (the cluster centres) move closer and closer to the groups of people. When a new person enters the park, the model places them on a particular bench, and any reshuffling of the others is taken care of by the same recommendation model.
Building on that idea, k-means is just a clustering algorithm. It uses the Euclidean distance between points as a measure of similarity, and builds its clusters around k averages (i.e. means). This is a very interesting algorithm, so let’s get down to business.
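To make the bench story concrete, here is a quick sketch using scikit-learn’s off-the-shelf KMeans. The "visitor" data and the two interest features are made up purely for illustration; we will get to writing our own version further down.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical "park" data: each row is a visitor described by two
# made-up interest scores (say, jogging vs. reading, on a 0-10 scale).
visitors = np.array([
    [1.0, 9.0], [1.5, 8.5], [2.0, 9.5],   # one group of like-minded people
    [8.0, 1.0], [8.5, 1.5], [9.0, 2.0],   # another group
    [5.0, 5.0], [5.5, 4.5], [4.5, 5.5],   # a bit of both
])

# k=3 "benches": fitting learns the three cluster centres (centroids).
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(visitors)

# A new person walks into the park: predict which bench they belong to.
newcomer = np.array([[1.2, 9.2]])
bench = model.predict(newcomer)[0]
print(f"The newcomer sits at bench {bench}")
```

The newcomer ends up on the same bench as the visitors with similar interests, which is exactly the recommendation behaviour described above.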
As you might’ve suspected, the value of k is of great importance. This k is called a hyper-parameter: a variable whose value we set before training. It specifies the number of clusters we want the algorithm to yield, like the 3 benches; in other words, it is the number of centroids moving around in the data. I know, a hyper-parameter again, the boring business: think of the learning rate, weights, biases, epochs, batch size, units and whatnot we have already overcome. But hyper-parameters are what help the model train, and we need to optimise each one of them (the sad part). Optimising the number of clusters is important, so as expected there is a method for it. Not just any optimiser, but one named the elbow method. Why? Because the plot of the loss (the distortion) against k, the number of clusters, looks like an elbow. Hope that is clear now. I laughed my heart out when I first heard the name; I said, "What elbow?" and looked at my own elbow.
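Here is a small sketch of the elbow method on synthetic data. The three blobs are made up for illustration; we compute the distortion (scikit-learn exposes it as `inertia_`, the total squared distance of every point to its nearest centroid) for each candidate k. The distortion always shrinks as k grows, but the drop flattens sharply after the true number of clusters, giving the curve its elbow.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data with 3 obvious blobs, so the elbow should appear at k=3.
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
                  for c in ([0, 0], [5, 5], [0, 5])])

# Distortion for each candidate number of clusters.
distortions = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    distortions[k] = km.inertia_

for k, d in distortions.items():
    print(f"k={k}: distortion={d:.1f}")
```

Plot these values against k and you will see a steep fall up to k=3 and almost nothing afterwards; the bend of that elbow is the k you pick.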
WHAT is this K in K-Means?
Basically, the k in k-means is much the same as the k in the rest of the k family of algorithms. The basic idea is having k neighbours, k factors, k features, or k clusters.
Let’s seek help from one of man’s great friends. Confused? Oh! Data visualization, which has always saved the day by drawing valuable insights out of data.
I thought it would be a cakewalk; I never imagined it would be hard for me to write a model of my own. I am not trying to scare you, I am just telling you how I felt.
The majestic code is in front of you. Be aware, be respectful, the lord is here (trust me, I apologise for this bad joke):
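The full notebook lives on Kaggle, so here is a minimal from-scratch sketch of the standard (Lloyd’s) algorithm in plain NumPy, close in spirit to what a hand-rolled k-means looks like. The two demo blobs at the bottom are made up for illustration.

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct data points at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]                     # leave empty clusters alone
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged: stop early
            break
        centroids = new_centroids
    return centroids, labels

# Quick demo on two well-separated, made-up blobs.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(5, 0.3, (40, 2))])
centroids, labels = kmeans(pts, k=2)
print(centroids.round(1))
```

Each iteration does exactly the two moves from the bench story: people pick their nearest bench, then each bench slides to the middle of its crowd, until nothing moves any more.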
Before We Leave
It was a great journey, but just as everything is created, nurtured and destroyed, so our journey must end. I don’t know when the next blog will get uploaded, but I will upload the next ones soon. Please, everyone, follow the safety rules and stay safe; many people in my family are raging with high fever, most of them with covid-19-like symptoms, and I have just recovered from a fever myself.

If you have read my blogs before, you know that I like giving my blog in a gist, in a nutshell: in this blog we explored k-means through the #CodeMe series, and we wrote our own k-means classifier in Python. Hope it helped you. I will try to be more frequent; there are some family issues, but I will overcome them and write more blogs. Till then, stay safe, stay happy. May God bless you. See you next time, fam.