Clustering workflow

To cluster your data, you'll follow these steps:

  1. Prepare data.
  2. Create similarity metric.
  3. Run clustering algorithm.
  4. Interpret results and adjust your clustering.

This page briefly introduces the steps. We'll go into depth in subsequent sections.

Prepare data

As with any ML problem, you must normalize, scale, and transform feature data before training or fine-tuning a model on that data. In addition, before clustering, check that the prepared data lets you accurately calculate similarity between examples.
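As a rough illustration of why scaling matters before clustering, the sketch below standardizes each feature to zero mean and unit variance (z-scores). The `standardize` helper and the sample values are hypothetical, and z-scoring is only one of several possible transformations; the right choice depends on your data.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each feature column to zero mean and unit variance (z-score)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has a standard deviation of 0; fall back to 1.0
    # so the division below is safe (the column becomes all zeros).
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical examples: (height in cm, annual income in dollars).
raw = [[150.0, 30000.0], [160.0, 50000.0], [170.0, 70000.0]]
scaled = standardize(raw)
```

Without this step, the income column would dominate any distance calculation simply because its raw values are thousands of times larger than the heights.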

Create similarity metric

Before a clustering algorithm can group data, it needs to know how similar pairs of examples are. You can quantify the similarity between examples by creating a similarity metric, which requires a careful understanding of your data.
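One simple way to quantify similarity for numeric features is to measure Euclidean distance and map it to a score. The sketch below assumes the features are already numeric and on comparable scales; the `similarity` function and its 1 / (1 + distance) mapping are illustrative choices, not a prescribed metric.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two numeric feature vectors."""
    return math.dist(a, b)

def similarity(a, b):
    """Map distance to (0, 1]: identical examples score 1.0,
    and the score approaches 0 as examples move farther apart."""
    return 1.0 / (1.0 + euclidean_distance(a, b))

# Hypothetical feature vectors.
near = similarity([0.0, 0.0], [0.0, 1.0])
far = similarity([0.0, 0.0], [3.0, 4.0])
```

Here `near` is larger than `far`, matching the intuition that closer examples are more similar.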

Run clustering algorithm

A clustering algorithm uses the similarity metric to cluster data. This course uses k-means.
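To make the idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) using only the standard library. It assumes numeric feature vectors and Euclidean distance, and it picks the initial centroids at random from the data with a fixed seed; the function name, parameters, and sample points are all hypothetical. In practice you would likely use a library implementation instead.

```python
import math
import random

def k_means(points, k, max_iters=100, seed=0):
    """Group points into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct examples chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so we have converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # Keep the old centroid if a cluster is empty.
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical data with two visibly separate groups.
points = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
          [8.0, 8.0], [9.0, 9.0], [8.0, 9.5]]
labels, centroids = k_means(points, k=2)
```

The cluster numbers themselves are arbitrary: which group ends up labeled 0 or 1 depends on the random initialization.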

Interpret results and adjust

Because clustering doesn't produce or include a ground truth against which you can verify the output, it's important to check the result against your expectations at both the cluster level and the example level. If the result looks odd or low-quality, experiment with the previous three steps. Continue iterating until the quality of the output meets your needs.
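One possible way to support this kind of check is to summarize each cluster and surface individual examples for manual review. The sketch below reports, per cluster, the number of examples, the mean distance to the centroid, and the example farthest from the centroid. The `summarize_clusters` helper, the choice of statistics, and the hard-coded points, labels, and centroids are all hypothetical.

```python
import math
from collections import defaultdict

def summarize_clusters(points, labels, centroids):
    """Return per-cluster size, mean distance to centroid, and farthest example."""
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)

    summary = {}
    for label, cluster_points in members.items():
        distances = [math.dist(p, centroids[label]) for p in cluster_points]
        summary[label] = {
            # Cluster level: very small or very large clusters deserve a look.
            "size": len(cluster_points),
            "mean_distance": sum(distances) / len(distances),
            # Example level: the farthest member is a candidate for review.
            "farthest_example": cluster_points[distances.index(max(distances))],
        }
    return summary

# Hypothetical clustering output to inspect.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
labels = [0, 0, 1]
centroids = [[0.0, 1.0], [10.0, 10.0]]
report = summarize_clusters(points, labels, centroids)
```

A cluster whose size or mean distance is far out of line with the others, or whose farthest example clearly doesn't belong, is a hint to revisit data preparation, the similarity metric, or the algorithm settings.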