Overfitting: Interpreting loss curves

Machine learning would be much simpler if all your loss curves looked like this the first time you trained your model:

Figure 20. An ideal loss curve: loss (y-axis) plotted against the number of training steps (x-axis). Loss starts high, decreases exponentially, and ultimately flattens out at a minimum.

Unfortunately, loss curves are often challenging to interpret. Use your intuition about loss curves to solve the exercises on this page.
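
Before diving in, it helps to be able to produce a loss curve from your own training runs. The following is a minimal sketch, not part of the course material, that trains a tiny Keras model on synthetic data and plots training and validation loss; the data, model, and hyperparameters are placeholders.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Synthetic regression data: y = 3x + noise (placeholder for real data).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(1000, 1)).astype("float32")
y = 3 * x + rng.normal(scale=0.1, size=(1000, 1)).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05),
              loss="mse")

# Keras records per-epoch loss in the History object returned by fit().
history = model.fit(x, y, epochs=30, validation_split=0.2, verbose=0)

plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()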

Exercise 1: Oscillating loss curve

Figure 21. An oscillating loss curve: loss (y-axis) plotted against the number of training steps (x-axis). Instead of flattening out, the loss oscillates erratically.
Which three of the following could you do to try to improve the loss curve shown in Figure 21?
Check your data against a data schema to detect bad examples, and then remove the bad examples from the training set.
Reduce the training set to a tiny number of trustworthy examples.
Reduce the learning rate.
Increase the learning rate.
Increase the number of examples in the training set.
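
To make the first option above concrete, here is a minimal sketch of a schema check that drops bad examples from a pandas DataFrame. The SCHEMA dictionary, column names, and valid ranges are all hypothetical; a real schema would come from your dataset's documentation.

import numpy as np
import pandas as pd

# Hypothetical schema: the valid range for each feature column.
SCHEMA = {
    "age": (0, 120),
    "income": (0.0, 1e7),
}

def filter_bad_examples(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing values or out-of-range features."""
    mask = df.notna().all(axis=1)
    for col, (lo, hi) in SCHEMA.items():
        mask &= df[col].between(lo, hi)
    return df[mask]

# Example: the second row (negative age) and third row (missing income)
# violate the schema and are removed.
df = pd.DataFrame({"age": [34, -5, 61],
                   "income": [52_000.0, 48_000.0, np.nan]})
print(filter_bad_examples(df))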

Exercise 2: Loss curve with a sharp jump

Figure 22. Sharp rise in loss: the loss decreases up to a certain number of training steps, then suddenly increases with further training.
Which two of the following statements identify possible reasons for the exploding loss shown in Figure 22?
The input data contains one or more NaNs—for example, a value caused by a division by zero.
The learning rate is too low.
The regularization rate is too high.
The input data contains a burst of outliers.
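
A quick way to test the NaN and outlier hypotheses is to audit the input batches directly. This is a minimal sketch assuming NumPy feature arrays; the clipping threshold of 10.0 is arbitrary and only for illustration.

import numpy as np

def audit_features(x: np.ndarray, clip_value: float = 10.0) -> np.ndarray:
    """Report NaNs/Infs and clip extreme values in a feature array."""
    n_nan = int(np.isnan(x).sum())
    n_inf = int(np.isinf(x).sum())
    if n_nan or n_inf:
        print(f"Found {n_nan} NaNs and {n_inf} Infs in the batch.")
        x = np.nan_to_num(x, nan=0.0, posinf=clip_value, neginf=-clip_value)
    return np.clip(x, -clip_value, clip_value)

# Division-by-zero bugs show up as inf; bad joins often show up as NaN.
batch = np.array([0.5, np.nan, np.inf, 250.0, -3.2])
print(audit_features(batch))  # -> [ 0.5   0.   10.   10.   -3.2]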

Exercise 3: Test loss diverges from training loss

Figure 23. Sharp rise in test loss: the training loss converges, but the test loss begins to rise after a certain number of training steps.
Which one of the following statements best identifies the reason for this difference between the loss curves of the training and test sets?
The learning rate is too high.
The model is overfitting the training set.
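
When test (or validation) loss starts rising while training loss keeps falling, a standard remedy is early stopping. Here is a minimal Keras sketch on placeholder synthetic data; the monitored metric, patience, and model architecture are illustrative choices, not prescriptions.

import numpy as np
import tensorflow as tf

# Placeholder data: a small noisy regression problem.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(1000, 1)).astype("float32")
y = 3 * x + rng.normal(scale=0.1, size=(1000, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop when validation loss hasn't improved for 5 epochs, and roll the
# model back to the weights from its best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(x, y, epochs=200, validation_split=0.2,
          callbacks=[early_stop], verbose=0)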

Exercise 4: Loss curve gets stuck

Figure 24. Chaotic loss after a certain number of steps: the loss begins to converge, then repeatedly displays patterns resembling a rectangular wave.
Which one of the following statements is the most likely explanation for the erratic loss curve shown in Figure 24?
The training set contains too many features.
The regularization rate is too high.
The training set contains repetitive sequences of examples.
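
If repetitive sequences of examples are indeed the culprit, the usual fix is to shuffle the training data so that each batch is a representative sample of the whole set. A minimal sketch with the tf.data API follows; the dataset is a stand-in, and the buffer size should be large relative to the length of the repeated runs.

import tensorflow as tf

# Stand-in for a real dataset of training examples.
dataset = tf.data.Dataset.range(10_000)

# Shuffle before batching; a buffer as large as the dataset gives a
# full shuffle, so no repetitive run survives into the batches.
dataset = dataset.shuffle(buffer_size=10_000, seed=42).batch(32)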