K-means Clustering
K-means clustering is an unsupervised learning algorithm that partitions data points into K clusters.
Steps:
1. Select K, the number of clusters
2. Randomly select K points as the initial centroids
3. For every point:
   - Find its distance to each centroid
   - Assign it to the closest centroid
4. Recompute each centroid as the mean of the points assigned to it
5. Go to Step 3 if any centroid changed
6. Calculate the variance of each cluster
7. Sum the variances of all clusters
8. Go to Step 2 and repeat with new random initial centroids, for a fixed number of restarts
9. Take the clustering with the minimum total variance
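The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`run_once`, `kmeans`), the `restarts` and `max_iter` parameters, and the reading of "variance" as the mean squared distance of a cluster's points to its own mean are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    # Squared Euclidean distance between two points given as tuples
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    # Coordinate-wise mean of a non-empty list of points
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def cluster_variance(cluster):
    # Assumed definition: mean squared distance to the cluster's own mean
    center = mean_point(cluster)
    return sum(squared_distance(p, center) for p in cluster) / len(cluster)

def run_once(points, k, rng, max_iter=100):
    # Pick K distinct data points as the initial centroids
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Assign every point to its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster (an empty cluster keeps its old centroid)
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Stop once no centroid changed
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Sum the variance of every non-empty cluster
    total = sum(cluster_variance(c) for c in clusters if c)
    return centroids, clusters, total

def kmeans(points, k, restarts=10, seed=0):
    # Repeat from fresh random centroids and keep the lowest total variance
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        result = run_once(points, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best

if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    centroids, clusters, total = kmeans(data, k=2)
    print(clusters, total)
```

The fixed `seed` only makes the sketch reproducible; in practice the restarts would use different random draws.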
How to find the optimal K:
- For a range of K values, plot the total variance against K on a line graph
- Find the point where the drop in variance starts to slow down, also known as the elbow point
- The K at the elbow point is taken as the optimal K
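Reading the elbow off a plot is a visual judgment, but it can be approximated in code. The sketch below uses one simple heuristic, chosen here as an assumption: pick the K where the drop in variance shrinks the most from one step to the next (the largest second difference). The function name `elbow_k` and the variance values are made up purely for illustration, and the heuristic assumes consecutive, evenly spaced K values.

```python
def elbow_k(variance_by_k):
    """Return the K where the variance curve bends most sharply.

    variance_by_k maps K -> total variance for that K.
    Heuristic (an assumption of this sketch): largest second difference.
    """
    ks = sorted(variance_by_k)
    if len(ks) < 3:
        raise ValueError("need at least three K values to locate an elbow")
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = variance_by_k[prev_k] - variance_by_k[k]
        drop_after = variance_by_k[k] - variance_by_k[next_k]
        bend = drop_before - drop_after  # how much the improvement slowed at k
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Made-up total variances for K = 1..6, for illustration only
example = {1: 100.0, 2: 60.0, 3: 25.0, 4: 22.0, 5: 20.0, 6: 19.0}
print(elbow_k(example))  # 3
```

On noisy curves this heuristic can disagree with what the eye picks, so it is best treated as a starting point rather than a rule.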
Problems with K-means:
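- K has to be chosen manually
- Results are sensitive to the initial centroids
- Doesn't work well when the data has many outliers, since each centroid is a mean and is pulled toward extreme values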
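The outlier problem follows directly from the centroid being a mean. A small sketch with made-up points (the helper name `centroid` is hypothetical) shows how a single far-away point drags the centroid off the rest of the cluster:

```python
def centroid(points):
    # Coordinate-wise mean of a non-empty list of points
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Four tightly grouped points (illustrative values)
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
print(centroid(cluster))  # (1.5, 1.5)

# Add one outlier: the centroid moves far away from the original four points
with_outlier = cluster + [(50.0, 50.0)]
print(centroid(with_outlier))  # (11.2, 11.2)
```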