Overfitting: Model complexity

The previous unit introduced the following model, which miscategorized a lot of trees in the test set:

Figure 16. The misbehaving complex model from the previous unit (the same image as Figure 13): a complex shape that miscategorizes many trees.

The preceding model contains a lot of complex shapes. Would a simpler model handle new data better? Suppose you replace the complex model with a ridiculously simple model: a straight line.

Figure 17. A much simpler model: a straight line that does an excellent job of separating the sick trees from the healthy trees.

The simple model generalizes better than the complex model on new data. That is, the simple model made better predictions on the test set than the complex model.

Simplicity has been beating complexity for a long time. In fact, the preference for simplicity dates back to ancient Greece. Centuries later, a fourteenth-century friar named William of Occam formalized the preference for simplicity in a philosophy known as Occam's razor. This philosophy remains an essential underlying principle of many sciences, including machine learning.

Exercises: Check your understanding

You are developing a physics equation. Which of the following formulas conforms more closely to Occam's razor?

  • A formula with twelve variables.
  • A formula with three variables.

You're starting a brand-new machine learning project and are about to select your first features. How many features should you pick?

  • Pick as many features as you can, so that you can start observing which features have the strongest predictive power.
  • Pick 4–6 features that seem to have strong predictive power.
  • Pick 1–3 features that seem to have strong predictive power.

Regularization

Machine learning models must simultaneously meet two conflicting goals:

  • Fit data well.
  • Fit data as simply as possible.

One approach to keeping a model simple is to penalize complex models; that is, to force the model to become simpler during training. Penalizing complex models is one form of regularization.

Loss and complexity

So far, this course has suggested that the only goal when training is to minimize loss; that is:

minimize(loss)

As you've seen, models focused solely on minimizing loss tend to overfit. A better training optimization algorithm minimizes some combination of loss and complexity:

minimize(loss + complexity)

Unfortunately, loss and complexity are typically inversely related. As complexity increases, loss decreases. As complexity decreases, loss increases. You should find a reasonable middle ground where the model makes good predictions on both the training data and real-world data. That is, your model should find a reasonable compromise between loss and complexity.
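To make the combined objective concrete, here's a minimal sketch in plain Python with NumPy. The function names (mse_loss, l2_complexity, regularized_loss) and the choice of squared weights as the complexity term are illustrative assumptions, not definitions from this unit:

    import numpy as np

    def mse_loss(y_true, y_pred):
        # How well the model fits the data (lower is better).
        return np.mean((y_true - y_pred) ** 2)

    def l2_complexity(weights):
        # One possible measure of complexity: the sum of the squared weights.
        return np.sum(weights ** 2)

    def regularized_loss(y_true, y_pred, weights, lam=0.01):
        # minimize(loss + complexity): the coefficient lam controls the
        # compromise between fitting the data well and staying simple.
        return mse_loss(y_true, y_pred) + lam * l2_complexity(weights)

Setting lam to 0 recovers the original minimize(loss) objective; increasing lam pushes training toward a simpler model at the cost of a worse fit to the training data.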

What is complexity?

You've already seen a few different ways of quantifying loss. How would you quantify complexity? Start exploring with the following exercise:

Exercise: Check your intuition

So far, we've been pretty vague about what complexity actually is. Which of the following ideas do you think would be reasonable complexity metrics?

  • Complexity is a function of the biases of all the features in the model.
  • Complexity is a function of the model's weights.
  • Complexity is a function of the square of the model's weights.
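As a concrete point of comparison for the weight-based ideas above, here's a small sketch (plain Python with NumPy; the function names are illustrative) of two common ways to collapse a model's weights into a single complexity number:

    import numpy as np

    def l1_complexity(weights):
        # Complexity as a function of the weights themselves:
        # the sum of their absolute values.
        return np.sum(np.abs(weights))

    def l2_complexity(weights):
        # Complexity as a function of the square of the weights:
        # the sum of their squares.
        return np.sum(weights ** 2)

    weights = np.array([0.5, -2.0, 0.0, 1.5])
    print(l1_complexity(weights))  # 4.0
    print(l2_complexity(weights))  # 6.5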