K-means clustering

Aarthi Ramakrishnan

I wanted to remind myself how the k-means clustering algorithm works. These are the steps involved:

  1. Start with a vector of 12 data points, for instance [1, 2, 3, 4, 7, 8, 9, 10, 20, 21, 22, 23].
  2. Randomly select 3 data points. These are your initial cluster centers (centroids), so k = 3.
  3. Compute the distance between each data point and each cluster center using the Euclidean distance formula. For a 1D vector this is simply abs(point1 - point2).
  4. Assign each point to the cluster whose center is closest to it.
  5. Calculate the mean of all points belonging to each cluster. These mean values become the new cluster centers.
  6. Repeat steps 3, 4 and 5 until no data point changes its cluster assignment.
  7. How do you find the ideal k value? One common heuristic is an elbow plot, i.e. a plot with k on the x-axis and the sum of the within-cluster variation (typically the sum of squared distances from each point to its cluster center) on the y-axis. The k where the curve bends into an 'elbow', meaning that adding more clusters no longer reduces the variation by much, is usually a good choice for the analysis.
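
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `kmeans_1d` and `wcss`, the `seed` and `max_iter` parameters, and the choice to leave a cluster center unchanged if its cluster ever becomes empty are all made up for this example. The elbow values use the sum of squared distances to the cluster centers as the measure of within-cluster variation.

```python
import random


def kmeans_1d(points, k, seed=0, max_iter=100):
    """Cluster 1D points into k groups. Returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 2: pick k data points at random as the initial cluster centers.
    centers = [float(c) for c in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Steps 3-4: in 1D the Euclidean distance is abs(p - center);
        # each point gets the index of its nearest center.
        new_labels = [
            min(range(k), key=lambda c: abs(p - centers[c])) for p in points
        ]
        # Step 6: stop once no point changed its cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 5: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = sum(members) / len(members)
    return centers, labels


def wcss(points, centers, labels):
    """Sum of squared distances from each point to its cluster center."""
    return sum((p - centers[lab]) ** 2 for p, lab in zip(points, labels))


data = [1, 2, 3, 4, 7, 8, 9, 10, 20, 21, 22, 23]

# Steps 1-6 with k = 3.
centers, labels = kmeans_1d(data, k=3)
print("centers:", centers)
print("labels: ", labels)

# Step 7: values for the y-axis of an elbow plot, one per candidate k.
elbow = {k: wcss(data, *kmeans_1d(data, k)) for k in range(1, 7)}
for k, value in elbow.items():
    print(k, round(value, 2))
```

Because the starting centers are chosen at random, a different `seed` can lead to a different final clustering; running the algorithm several times and keeping the result with the lowest `wcss` value is a common way to deal with that.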
