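Two common regularization methods:
- Lasso
  - Uses the L1 norm (penalizes the sum of the absolute values of the weights)
- Ridge
  - Uses the L2 norm (penalizes the sum of the squared weights)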
A trick to remember which norm goes with which method: the letter L comes before the letter R, so Lasso pairs with the L1 norm and Ridge pairs with the L2 norm.
One of the two is more likely to produce sparse solutions, driving one or more coefficients to exactly zero. Which one do you think it is?
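To check your guess, here is a minimal sketch under a simplifying assumption: each coefficient is fit independently (as with an orthonormal design), so each penalized problem has a closed-form answer. The function names, the unpenalized coefficients `ols_coefs`, and the value of `lam` are hypothetical and chosen only for illustration.

```python
import math

def l1_penalized_solution(ols_coefs, lam):
    """Per-coefficient minimizer of 0.5*(w - b)**2 + lam*abs(w).

    This is the soft-thresholding rule: shrink |b| by lam, floor at zero.
    """
    out = []
    for b in ols_coefs:
        magnitude = max(abs(b) - lam, 0.0)
        out.append(0.0 if magnitude == 0.0 else math.copysign(magnitude, b))
    return out

def l2_penalized_solution(ols_coefs, lam):
    """Per-coefficient minimizer of 0.5*(w - b)**2 + 0.5*lam*w**2.

    Every coefficient is scaled by the same factor 1 / (1 + lam).
    """
    return [b / (1.0 + lam) for b in ols_coefs]

# Hypothetical unpenalized coefficients and regularization strength.
ols_coefs = [3.0, -0.4, 0.2, -2.0]
lam = 0.5

l1_coefs = l1_penalized_solution(ols_coefs, lam)
l2_coefs = l2_penalized_solution(ols_coefs, lam)

print("L1-penalized:", l1_coefs)
print("L2-penalized:", l2_coefs)
print("zeros under L1:", sum(w == 0.0 for w in l1_coefs))
print("zeros under L2:", sum(w == 0.0 for w in l2_coefs))
```

Run it and compare the two zero counts with your answer; then try a few other values of `lam` to see how the pattern changes.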
Quiz: which formula is Lasso, and which one is Ridge?
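For the quiz, the two candidate formulas can be written as below. This is a sketch assuming a squared-error loss over $n$ examples with predictions $\hat{y}_i$, weights $w_j$, and regularization strength $\lambda$; these symbols are assumptions and may differ from the notation used elsewhere in the course.

```latex
\text{(A)}\quad \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} w_j^2
\qquad\qquad
\text{(B)}\quad \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} \lvert w_j \rvert
```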
- Regularization penalizes overly complex models
- Larger weights make the penalty term larger, so minimizing the total loss favors smaller weights
- Larger weights cost more
- Regularized loss = regular_loss_function + extra_penalty_term(lambda, weights)
- The extra penalty term depends on the weights and on lambda, the regularization strength
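
The structure in the bullets above can be sketched in a few lines of Python. This is a minimal illustration, assuming mean squared error as the regular loss; the function names and the sample numbers are hypothetical and not part of any library.

```python
def mse(y_true, y_pred):
    """Regular loss: mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def l1_penalty(lam, weights):
    """Penalty term based on the L1 norm: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(lam, weights):
    """Penalty term based on the L2 norm: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(y_true, y_pred, weights, lam, penalty):
    """Regularized loss = regular loss + extra penalty term(lambda, weights)."""
    return mse(y_true, y_pred) + penalty(lam, weights)

# Hypothetical data: the same predictions, scored with small and large weights.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
small_weights = [0.5, -0.5]
large_weights = [5.0, -5.0]
lam = 0.1

for name, penalty in [("L1", l1_penalty), ("L2", l2_penalty)]:
    small = regularized_loss(y_true, y_pred, small_weights, lam, penalty)
    large = regularized_loss(y_true, y_pred, large_weights, lam, penalty)
    print(name, "small weights:", small, "large weights:", large)
```

With the predictions held fixed, the only difference between the two scores in each row is the penalty term, so the larger weights always cost more. Setting `lam` to 0 removes the penalty and recovers the regular loss.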