K-means Clustering

#machine-learning #interview

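K-means clustering is an unsupervised learning algorithm that partitions data points into K clusters, each represented by its centroid (the mean of its points).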

Steps:

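1. Select K, the number of clusters
2. Randomly select K points as the initial centroids
3. For every point
   1. Find the distance to each centroid
   2. Assign the point to the closest centroid
4. Calculate the mean of each cluster -> new centroids
5. Go to Step 3 if any centroid changed
6. Calculate the variance of each cluster
7. Sum the variances to get the total variance of this clustering
8. Go to Step 2 and repeat with new random centroids, for a fixed number of restarts
9. Take the clustering with the minimum total variance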
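
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation. The names `kmeans_once`, `kmeans`, `n_restarts` and `max_iter` are made up for this note, an empty cluster keeps its old centroid (an assumption, the steps do not say what to do), and the "variance" is measured as the sum of squared distances to the centroid, which is the usual K-means objective.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_once(points, k, rng, max_iter=100):
    """One run from a random start. Returns (centroids, labels, total)."""
    centroids = rng.sample(points, k)  # K random data points as initial centroids
    labels = []
    for _ in range(max_iter):  # max_iter is a safety cap (assumption)
        # Assign every point to its closest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Recompute each centroid as the mean of its cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:  # no centroid changed: converged
            break
        centroids = new_centroids
    # Total spread: sum of squared distances to the assigned centroid.
    total = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, total

def kmeans(points, k, n_restarts=10, seed=0):
    """Repeat from several random starts and keep the lowest total."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        result = kmeans_once(points, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels, total = kmeans(points, k=2)
print(labels, centroids, total)
```

On this toy data the three points near (1, 1) should share one label and the three points near (8.5, 8.5) the other. Which cluster gets label 0 depends on the random start.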

How to find Optimal K

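For each candidate K, run K-means and plot the total variance against K on a line graph
The total variance keeps decreasing as K grows, so the minimum alone cannot be used to pick K
Find the point where the decrease starts to slow down, also known as the elbow point
That K is taken as the optimal K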
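
A rough sketch of how the curve for the elbow plot could be computed, using made-up 1-D data with three obvious groups. The helper `total_variance` and its parameters are hypothetical, it runs a fixed number of iterations instead of checking convergence, and it measures spread as the sum of squared distances to the nearest centroid. The values are printed as a table; plotting them as a line graph is left to whatever plotting tool is at hand.

```python
import random

def total_variance(xs, k, rng, restarts=30, iters=50):
    """Lowest sum of squared distances found over several random starts (1-D data)."""
    best = float("inf")
    for _ in range(restarts):
        cents = rng.sample(xs, k)  # K random data points as initial centroids
        for _ in range(iters):  # fixed iteration count for simplicity (assumption)
            groups = [[] for _ in range(k)]
            for x in xs:
                j = min(range(k), key=lambda c: (x - cents[c]) ** 2)
                groups[j].append(x)
            # Mean of each group; an empty group keeps its old centroid (assumption).
            cents = [sum(g) / len(g) if g else cents[j]
                     for j, g in enumerate(groups)]
        sse = sum(min((x - c) ** 2 for c in cents) for x in xs)
        best = min(best, sse)
    return best

# Illustrative data: three tight groups around 1, 5 and 9.
xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
rng = random.Random(0)
curve = {k: total_variance(xs, k, rng) for k in range(1, 7)}

for k, value in curve.items():
    print(f"K={k}  total variance={value:.2f}")
```

With data like this the total should drop sharply up to K=3 and then flatten out, so the elbow would be read off at K=3.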

Problems with K-means:

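K has to be chosen manually
Sensitive to the initial centroids: different random starts can give different clusterings
Doesn't work well with many outliers, since outliers pull the cluster means toward them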
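
A small illustration of the outlier problem, with made-up points: because a centroid is a plain mean, a single far-away point is enough to drag it away from the rest of the cluster. The helper name `centroid` is just for this note.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Four points packed around (1, 1), then the same cluster plus one outlier.
cluster = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (1.0, 1.0)]
with_outlier = cluster + [(21.0, 21.0)]

clean_centroid = centroid(cluster)          # roughly (1, 1)
shifted_centroid = centroid(with_outlier)   # roughly (5, 5)

print(clean_centroid, shifted_centroid)
```

The shifted centroid sits far from every one of the original four points, so nearby points from other clusters can end up assigned to it.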
