Regularization for Simplicity: Playground Exercise (Overcrossing?)
Before you watch the video or read the documentation, please complete
this exercise that explores overuse of feature crosses.
Task 1: Run the model as is, with all of the given feature crosses.
Are there any surprises in the way the model fits the data?
What is the issue?
Task 2: Try removing various feature crosses to improve
performance (albeit only slightly). Why would removing features
improve performance?
(Answers appear just below the exercise.)
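For concreteness, the five features the model starts with can be built from the two raw inputs. This is a minimal NumPy sketch; the array names and sample count are illustrative, not Playground internals:

```python
import numpy as np

# Stand-ins for the two raw inputs; in Playground these are the
# coordinates of each data point.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, size=8)
x2 = rng.uniform(-1, 1, size=8)

# The five starting features: the two raw inputs plus the three
# feature crosses (x1*x2, x1^2, x2^2).
features = np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
print(features.shape)  # one row per point, one column per feature
```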
Answer to Task 1:
Surprisingly, the model's decision boundary looks kind of wacky. In particular,
there's a region in the upper left that hints toward blue, even though
the data offers no visible support for that.
Notice the relative thickness of the five lines running from INPUT to OUTPUT.
These lines show the relative weights of the five features.
The lines emanating from X1 and X2 are much thicker than
those coming from the feature crosses. So, the feature crosses are
contributing far less to the model than the normal (uncrossed) features.
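To make the thickness analogy concrete, here is a tiny sketch that turns weight magnitudes into relative line thicknesses the way Playground's visualization does. The weight values below are made up for illustration, not read out of an actual run:

```python
# Hypothetical learned weights for the five features; the raw inputs
# carry most of the weight, the crosses very little.
weights = {"x1": 0.92, "x2": -0.85, "x1x2": 0.07, "x1^2": -0.04, "x2^2": 0.05}

# Line thickness is proportional to |weight|; normalize by the largest.
max_w = max(abs(w) for w in weights.values())
thickness = {name: abs(w) / max_w for name, w in weights.items()}

for name, t in thickness.items():
    print(f"{name:5s} relative thickness = {t:.2f}")
```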
Answer to Task 2:
Removing all the feature crosses gives a more reasonable model (there is
no longer a curved boundary suggestive of overfitting)
and makes the test loss converge.
After 1,000 iterations, test loss should be slightly lower
than when the feature crosses were in play (although your results
may vary a bit, depending on the data set).
The data in this exercise is basically linear data plus noise.
If we use a model that is too complicated, such as one with too many
crosses, we give it the opportunity to fit to the noise in the training data,
often at the cost of making the model perform badly on test data.
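That intuition can be checked outside Playground with a small experiment: fit a plain logistic regression (sketched from scratch with gradient descent) on synthetic linear-plus-noise data, once with only x1 and x2 and once with the three crosses added, and compare test loss. Every name, constant, and data-generation choice below is an illustrative assumption, not the exercise's actual setup:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Roughly linear data plus label noise, mimicking the exercise.
    X = rng.uniform(-1, 1, size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    flip = rng.random(n) < 0.1          # flip 10% of labels as noise
    y[flip] = 1 - y[flip]
    return X, y

def add_crosses(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

def train_logreg(X, y, steps=2000, lr=0.5):
    # Minimal batch gradient descent on the logistic loss.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def log_loss(X, y, w, b):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

X_train, y_train = make_data(200)
X_test, y_test = make_data(200)

results = {}
for name, f in [("x1, x2 only", lambda X: X), ("with crosses", add_crosses)]:
    w, b = train_logreg(f(X_train), y_train)
    results[name] = log_loss(f(X_test), y_test, w, b)
    print(name, "test loss =", round(results[name], 3))
```

With a run like this, the version with crosses often ends up with a test loss at or slightly above the plain version, since the extra quadratic features give the model room to chase the label noise; exact numbers depend on the random seed and noise level.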