Machine learning would be much simpler if all your loss curves decreased smoothly to convergence the first time you trained your model.
Unfortunately, loss curves are often challenging to interpret. Use your intuition about loss curves to solve the exercises on this page.
Exercise 1: Oscillating loss curve
What three things could you do to try to improve the loss curve
shown in Figure 21?
Check your data against a data schema to detect bad examples, and
then remove the bad examples from the training set.
Yes, this is a good practice for all models.
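For instance, you might enforce a simple schema check before training. The following sketch is purely illustrative: the column names, valid ranges, and file path are hypothetical, and the check uses plain pandas rather than any particular schema library.

```python
import pandas as pd

# Hypothetical schema: each feature column with its valid value range.
SCHEMA = {
    "median_income": (0.0, 20.0),
    "rooms_per_person": (0.0, 10.0),
}

df = pd.read_csv("training_data.csv")  # hypothetical path

# Flag rows with missing values or values outside the schema's ranges.
bad = df[list(SCHEMA)].isna().any(axis=1)
for column, (low, high) in SCHEMA.items():
    bad |= ~df[column].between(low, high)

print(f"Removing {bad.sum()} bad examples out of {len(df)}")
df = df[~bad]
```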
Reduce the learning rate.
Yes, reducing the learning rate is often a good idea when debugging a
training problem.
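In Keras, for example, reducing the learning rate is a one-line change to the optimizer. The values below are illustrative, and the tiny model exists only to make the snippet runnable.

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])  # placeholder model

# If the loss oscillates, retry with a learning rate roughly an order of
# magnitude smaller than the one that produced the oscillation.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.0001),  # was 0.001
    loss="mse",
)
```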
Reduce the training set to a tiny number of trustworthy examples.
Although this technique sounds artificial, it is actually a good
idea. Assuming that the model converges on the small set of
trustworthy examples, you can then gradually add more examples,
perhaps discovering which examples cause the loss curve to
oscillate.
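One way to structure that workflow, sketched here with synthetic stand-in data (the dataset, model, and doubling schedule are all assumptions):

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(seed=42)
features = rng.normal(size=(1024, 4)).astype("float32")  # stand-in data
labels = features.sum(axis=1, keepdims=True)             # stand-in labels

def build_model():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")
    return model

# Train on a tiny trusted subset first, then keep doubling the subset.
# The size at which the loss starts to oscillate brackets the examples
# that are causing the problem.
size = 8
while size <= len(features):
    history = build_model().fit(
        features[:size], labels[:size], epochs=20, verbose=0)
    print(f"{size:5d} examples -> final loss {history.history['loss'][-1]:.4f}")
    size *= 2
```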
Increase the number of examples in the training set.
This is a tempting idea, but it is extremely unlikely to fix
the problem.
Increase the learning rate.
In general, avoid increasing the learning rate when a model's
learning curve indicates a problem.
Exercise 2: Loss curve with a sharp jump
Which two of the following statements identify possible
reasons for the exploding loss shown in Figure 22?
The input data contains one or more NaNs—for example, a value
caused by a division by zero.
This is more common than you might expect.
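A quick pre-training check for non-finite values, assuming the features are in a NumPy array:

```python
import numpy as np

# Toy feature matrix containing one NaN and one infinity.
features = np.array([[1.0, 2.0], [np.nan, 0.5], [np.inf, 3.0]])

# Count NaNs and infinities before training; both can blow up the loss.
print(f"NaNs: {np.isnan(features).sum()}, infs: {np.isinf(features).sum()}")
```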
The input data contains a burst of outliers.
Sometimes, due to improper shuffling of batches, a batch might
contain a lot of outliers.
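One way to hunt for such a batch is to compare per-batch statistics. Everything in this sketch (batch size, threshold, synthetic data) is an assumption:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
batches = [rng.normal(size=(32, 4)) for _ in range(100)]  # synthetic batches
batches[40] += 50.0  # simulate one batch contaminated with outliers

# Flag batches whose mean is far (> 3 standard deviations) from the
# typical batch mean.
means = np.array([b.mean() for b in batches])
z_scores = (means - means.mean()) / means.std()
for i in np.flatnonzero(np.abs(z_scores) > 3):
    print(f"Batch {i} looks anomalous (mean={means[i]:.2f})")
```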
The learning rate is too low.
A very low learning rate might increase training time, but it is
not the cause of the strange loss curve.
The regularization rate is too high.
True, a very high regularization rate could prevent a model from
converging; however, it won't cause the strange loss curve
shown in Figure 22.
Exercise 3: Test loss diverges from training loss
Which one of the following statements best identifies the
reason for the difference between the loss curves of the training
and test sets shown in Figure 23?
The model is overfitting the training set.
Yes, it probably is. Possible solutions:
- Make the model simpler, possibly by reducing the number of features.
- Increase the regularization rate (see the sketch after this list).
- Ensure that the training set and test set are statistically equivalent.
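In Keras, for instance, raising the L2 regularization rate is a small change to the layer definitions; the rate 0.01 and the architecture below are illustrative:

```python
import tensorflow as tf

# Heavier L2 regularization penalizes large weights, which often narrows
# the gap between training loss and test loss.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        16, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(0.01),  # illustrative rate
    ),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```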
The learning rate is too high.
If the learning rate were too high, the loss curve for the training set
would likely not have behaved as it did.
Exercise 4: Loss curve gets stuck
Which one of the following statements is the most likely
explanation for the erratic loss curve shown in Figure 24?
The training set contains repetitive sequences of examples.
This is a possibility. Ensure that you are shuffling examples
sufficiently.
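A minimal sketch of thorough shuffling with tf.data (the dataset and buffer size are placeholders):

```python
import tensorflow as tf

# Reshuffle every epoch so the model never sees the same repetitive
# sequence twice. For a thorough shuffle, buffer_size should be at least
# the dataset size, or as large as memory allows.
dataset = tf.data.Dataset.range(1000)  # placeholder dataset
dataset = dataset.shuffle(buffer_size=1000, reshuffle_each_iteration=True)
dataset = dataset.batch(32)
```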
The regularization rate is too high.
This is unlikely to be the cause.
The training set contains too many features.
This is unlikely to be the cause.