Regularization means penalizing the complexity of a model to reduce overfitting.
Regularization for Simplicity
Generalization Curve (training loss keeps falling, while validation loss eventually rises as the model overfits)
Penalizing Model Complexity
- We want to avoid model complexity where possible.
- We can bake this idea into the optimization we do at training time.
- Empirical Risk Minimization:
- aims for low training error
$$ \text{minimize: } Loss(Data\;|\;Model) $$
- Structural Risk Minimization:
- aims for low training error
- while balancing against complexity
$$ \text{minimize: } Loss(Data\;|\;Model) + complexity(Model) $$
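The two objectives can be sketched side by side in Python; `empirical_risk`, `structural_risk`, the `complexity` callable, and the weight `lam` are illustrative names, not from the source, and the mean squared error loss is an assumption for a linear model.

```python
import numpy as np

def empirical_risk(w, X, y):
    """Empirical Risk Minimization: training loss only (here, MSE of a linear model)."""
    return float(np.mean((X @ w - y) ** 2))

def structural_risk(w, X, y, complexity, lam=0.1):
    """Structural Risk Minimization: training loss plus a complexity penalty,
    with lam controlling the trade-off."""
    return empirical_risk(w, X, y) + lam * complexity(w)
```

With a perfect fit, empirical risk is zero but structural risk is still positive whenever the weights are nonzero, which is exactly the pressure toward simpler models.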
Regularization
- How to define complexity(Model)?
- Prefer smaller weights
- Deviating from small weights should incur a cost
- Can encode this idea via L2 regularization (a.k.a. ridge)
- complexity(Model) = sum of the squares of the weights
- Penalizes very large weights
- For linear models: prefers flatter slopes
- Bayesian prior:
- weights should be centered around zero
- weights should be normally distributed
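A minimal sketch of the L2 complexity term, assuming NumPy; the function name is illustrative. Because the weights are squared, one large weight costs more than several small ones of the same total magnitude.

```python
import numpy as np

def l2_complexity(weights):
    """L2 (ridge) complexity: sum of the squares of the weights."""
    return float(np.sum(np.square(weights)))

# Same total magnitude (2.0), very different penalties:
# four weights of 0.5 -> penalty 1.0; one weight of 2.0 -> penalty 4.0.
```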
A Loss Function with L2 Regularization
$$ Loss(Data|Model) + \lambda \left(w_1^2 + \ldots + w_n^2 \right) $$
\(\text{Where:}\)
\(Loss\text{: Aims for low training error}\)
\(\lambda\text{: Scalar value (the regularization rate) that controls the balance between training loss and model complexity}\)
\(w_1^2+\ldots+w_n^2\text{: Square of}\;L_2\;\text{norm}\)
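The regularized loss above can be sketched as a single function; mean squared error is assumed as the training loss (the source leaves \(Loss\) abstract), and the argument names are illustrative.

```python
import numpy as np

def l2_regularized_loss(w, X, y, lam):
    """Training loss (MSE, assumed) plus lambda times the squared L2 norm
    of the weights: Loss(Data|Model) + lam * (w_1^2 + ... + w_n^2)."""
    mse = np.mean((X @ w - y) ** 2)
    return float(mse + lam * np.sum(w ** 2))
```

Setting `lam = 0` recovers plain empirical risk minimization; increasing `lam` pushes the optimizer toward smaller weights at the expense of a higher training error.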