Examining L1 Regularization
This exercise contains a small, slightly noisy training data set. In this kind of setting, overfitting is a real concern. Regularization might help, but which form of regularization?
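One way to make the comparison concrete is to write out the two regularized objectives and the pull each penalty exerts on a single weight. The notation here is introduced for illustration and is not taken from the exercise: $L(\mathbf{w})$ is a generic data loss over weights $w_1, \dots, w_k$, and $\lambda$ is the regularization rate.

$$
\text{L2:}\quad L(\mathbf{w}) + \lambda \sum_{i=1}^{k} w_i^2, \qquad \frac{\partial}{\partial w_i}\bigl(\lambda w_i^2\bigr) = 2\lambda w_i
$$

$$
\text{L1:}\quad L(\mathbf{w}) + \lambda \sum_{i=1}^{k} \lvert w_i \rvert, \qquad \frac{\partial}{\partial w_i}\bigl(\lambda \lvert w_i \rvert\bigr) = \lambda \operatorname{sign}(w_i) \quad (w_i \neq 0)
$$

The L2 pull weakens as a weight approaches zero, so weights shrink but rarely reach exactly zero. The L1 pull keeps a constant magnitude $\lambda$, so it can push the weights of uninformative features all the way to zero.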
This exercise consists of five related tasks. To simplify comparisons across the five tasks, run each task in a separate tab. Notice that the thickness of each line connecting FEATURES and OUTPUT represents the relative weight of that feature.
- How does switching from L2 to L1 regularization influence the delta between test loss and training loss?
- How does switching from L2 to L1 regularization influence the learned weights?
- How does increasing the L1 regularization rate (lambda) influence the learned weights?
(Answers appear just below the exercise.)
- Switching from L2 to L1 regularization dramatically reduces the delta between test loss and training loss.
- Switching from L2 to L1 regularization dampens all of the learned weights.
- Increasing the L1 regularization rate generally dampens the learned weights; however, if the regularization rate goes too high, the model can't converge and losses are very high.
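The answers above can be explored in miniature outside the Playground. The following is a minimal sketch under several assumptions: it fits a plain linear model (not the Playground's network) to a hypothetical toy data set in which only two of five features carry signal, and it uses proximal gradient descent (ISTA, that is, soft-thresholding) for the L1 penalty. That optimizer is a swapped-in technique, not necessarily what the Playground uses internally. The helpers `make_data`, `soft_threshold`, and `train`, the constant `TRUE_W`, and all parameter values are hypothetical.

```python
import random

# Hypothetical toy data set: only features 0 and 1 carry signal;
# features 2-4 are pure noise, standing in for uninformative inputs.
TRUE_W = [2.0, -1.5, 0.0, 0.0, 0.0]


def make_data(n=100, noise=0.1, seed=0):
    """Draw n examples with Gaussian features and slightly noisy labels."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in TRUE_W]
        X.append(x)
        y.append(sum(w * xi for w, xi in zip(TRUE_W, x)) + rng.gauss(0.0, noise))
    return X, y


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward 0 and clip it to exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def train(X, y, reg, lam, lr=0.05, steps=500):
    """Minimize mean squared error plus an L1 or L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Gradient of (1/n) * sum((x . w - y)^2) with respect to w.
        grad = [0.0] * d
        for x, target in zip(X, y):
            r = sum(wj * xj for wj, xj in zip(w, x)) - target
            for j in range(d):
                grad[j] += 2.0 * r * x[j] / n
        if reg == "l2":
            # Penalty lam * sum(w^2): its gradient 2*lam*w shrinks in proportion to w.
            w = [wj - lr * (gj + 2.0 * lam * wj) for wj, gj in zip(w, grad)]
        elif reg == "l1":
            # Penalty lam * sum(|w|): the proximal (ISTA) step can land exactly on 0.
            w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
        else:
            raise ValueError(f"unknown regularization: {reg}")
    return w


X, y = make_data()
for reg, lam in [("l2", 0.3), ("l1", 0.3), ("l1", 1.0), ("l1", 10.0)]:
    weights = train(X, y, reg, lam)
    print(reg, lam, [round(wj, 3) for wj in weights])
```

With these settings, the L1 run at lambda = 0.3 should leave the three noise weights at exactly 0.0 while the L2 run only shrinks them. Raising the L1 rate to 1.0 shrinks the surviving weights further, and at 10.0 every weight is zeroed, so the model ignores its inputs and its loss stays high. That last case is one way an oversized rate produces high losses in this sketch; it is not a reproduction of the Playground's convergence behavior.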