K-means Clustering

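K-means clustering is an unsupervised learning algorithm that partitions data points into K clusters, assigning each point to the cluster with the nearest centroid (the cluster mean).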

Steps:

  1. Select K, the number of clusters
  2. Randomly select K points as the initial centroids
  3. For every point
    1. Find the distance to each centroid
    2. Assign the point to the closest centroid
  4. Calculate the mean of each cluster; these means are the new centroids
  5. Go to Step 3 if any centroid changed
  6. Calculate the variance of each cluster
  7. Sum the variances of all clusters
  8. Repeat Steps 2-7 several times with different random initial centroids
  9. Keep the clustering with the minimum total variance
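
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. The function names (`kmeans_once`, `kmeans`), the `n_restarts`, `max_iter`, and `seed` parameters, and the toy data are all assumptions made for the example. "Variance" is taken here as the sum of squared distances from each point to its assigned centroid, which is one common reading of Steps 6-7.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Step 3: index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points]

def kmeans_once(points, k, rng, max_iter=100):
    """Steps 2-7 for one random start; returns (total_variance, centroids, labels)."""
    centroids = rng.sample(points, k)                  # Step 2: K random points
    for _ in range(max_iter):                          # max_iter is a safety cap (assumption)
        labels = assign(points, centroids)             # Step 3
        updated = []
        for c in range(k):                             # Step 4: mean of each cluster
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])           # empty cluster keeps its centroid
        if updated == centroids:                       # Step 5: stop when nothing changed
            break
        centroids = updated
    labels = assign(points, centroids)
    # Steps 6-7: summed squared distance of every point to its own centroid
    total = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return total, centroids, labels

def kmeans(points, k, n_restarts=10, seed=0):
    """Steps 8-9: repeat with new random starts and keep the lowest total variance."""
    rng = random.Random(seed)
    points = [tuple(p) for p in points]
    runs = [kmeans_once(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[0])

# Toy data (made up for illustration): two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
total, centroids, labels = kmeans(points, k=2)
print(labels, centroids, total)
```

Which cluster gets index 0 or 1 depends on the random start, so only the grouping of the labels is meaningful, not the numbers themselves.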

How to find the optimal K

  1. Run K-means for several values of K and plot the total variance against K on a line graph
  2. Find the point where the decrease in variance starts to slow down, known as the elbow point
  3. The K at the elbow point is taken as the optimal K
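
The elbow search can also be approximated in code instead of read off a plot. The sketch below is an illustration under stated assumptions: `total_variance`, `variance_curve`, and `elbow_k` are hypothetical helper names, the data is a made-up set of three tight groups, and the "largest bend" rule (the K where the drop in variance shrinks the most) is just one simple heuristic for locating the elbow, not a standard definition.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def total_variance(points, k, rng, max_iter=100):
    """Run K-means once; return the summed squared distance to the nearest centroid."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        updated = [tuple(sum(x) / len(m) for x in zip(*m)) if m else centroids[i]
                   for i, m in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def variance_curve(points, k_values, n_restarts=10, seed=0):
    """Best (lowest) total variance over several random starts, for each K."""
    rng = random.Random(seed)
    return {k: min(total_variance(points, k, rng) for _ in range(n_restarts))
            for k in k_values}

def elbow_k(curve):
    """Heuristic (assumption): the K with the largest drop-off in improvement."""
    ks = sorted(curve)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = (curve[prev_k] - curve[k]) - (curve[k] - curve[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Made-up data: three tight groups of four points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
          (5.0, 9.0), (5.0, 10.0), (6.0, 9.0), (6.0, 10.0)]
curve = variance_curve(points, range(1, 7))
for k in sorted(curve):
    print(k, round(curve[k], 2))   # these are the values one would plot
print("elbow at K =", elbow_k(curve))
```

On real data the curve is rarely this sharp, so it is still worth plotting the values and checking the bend by eye.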

Problems with K-means:

  1. K must be selected manually
  2. Sensitive to the choice of initial centroids
  3. Doesn't work well with many outliers, because outliers pull the centroids (means) toward them
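
The outlier problem follows from the centroid being a mean. A tiny sketch with made-up 1-D values shows how far a single outlier can drag a centroid:

```python
from statistics import mean

# Hypothetical 1-D cluster: three nearby points plus one distant outlier.
cluster = [1.0, 2.0, 3.0]
outlier = 100.0

centroid_without = mean(cluster)             # centroid of the clean cluster
centroid_with = mean(cluster + [outlier])    # centroid once the outlier is assigned to it

print(centroid_without, centroid_with)       # 2.0 26.5
```

With the outlier included, the centroid sits far from every ordinary point in the cluster, which in turn distorts the distance comparisons in Step 3.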
