- On Sunday, September 30, 2018
## Deep Learning: Regularization Notes

In the previous article (long ago, now I am back!!) I talked about overfitting and the problems it causes.

Many regularization approaches are based on limiting the capacity of models, such as neural networks, linear regression, or logistic regression, by adding a parameter norm penalty Ω(θ) to the objective function J.

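The regularized objective, written $\tilde{J}$, is

$$\tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha\,\Omega(\theta) \tag{1}$$

where α ∈ [0, ∞) is a hyperparameter that weights the contribution of the norm penalty term Ω relative to the standard objective function J.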
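To make equation 1 concrete, here is a minimal sketch in plain Python. It assumes a mean-squared-error objective for J on a bias-free linear model and half the squared L2 norm for Ω. Both are illustrative choices that the text above does not fix, and the function names are hypothetical.

```python
def squared_error(theta, X, y):
    """Assumed objective J: mean squared error of a linear model without bias."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        prediction = sum(t * x for t, x in zip(theta, x_i))
        total += (prediction - y_i) ** 2
    return total / len(X)


def l2_penalty(theta):
    """Assumed norm penalty Omega: half the squared L2 norm of the parameters."""
    return 0.5 * sum(t * t for t in theta)


def regularized_objective(theta, X, y, alpha, J=squared_error, omega=l2_penalty):
    """Equation 1: J(theta; X, y) + alpha * Omega(theta), with alpha in [0, inf)."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    return J(theta, X, y) + alpha * omega(theta)


# Toy data that theta fits exactly, so J is 0 and only the penalty remains.
X = [[1.0, 2.0], [3.0, 4.0]]
y = [5.0, 11.0]
theta = [1.0, 2.0]

unregularized = regularized_objective(theta, X, y, alpha=0.0)
regularized = regularized_objective(theta, X, y, alpha=0.1)
```

With `alpha=0.0` the penalty vanishes and the result is just J. Larger values of `alpha` make the penalty count for more relative to the data-fitting term.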

We note that for neural networks, we typically choose to use a parameter norm penalty Ω that penalizes only the weights of the affine transformation at each layer and leaves the biases unregularized.

We therefore use the vector w to indicate all of the weights that should be affected by a norm penalty, while the vector θ denotes all of the parameters, including both w and the unregularized parameters.
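The split between w and θ can be sketched as follows. This assumes θ is stored as a dictionary of layers, each with a weight matrix `"W"` and a bias vector `"b"`. That layout, the layer names, and the choice of half the squared L2 norm as the penalty are all hypothetical, picked only for illustration.

```python
def weight_penalty(theta):
    """Omega(w): half the squared L2 norm over the weight matrices only.

    The biases stored under "b" are part of theta but are deliberately skipped.
    """
    total = 0.0
    for layer in theta.values():
        for row in layer["W"]:
            total += sum(v * v for v in row)
    return 0.5 * total


# theta holds every parameter; w is just the "W" entries.
theta = {
    "layer1": {"W": [[1.0, -2.0], [0.0, 3.0]], "b": [10.0, 10.0]},
    "layer2": {"W": [[2.0, 1.0]], "b": [-5.0]},
}

penalty = weight_penalty(theta)

# Changing a bias leaves the penalty untouched, since biases are unregularized.
theta["layer1"]["b"] = [1000.0, -1000.0]
penalty_after_bias_change = weight_penalty(theta)
```

Here the penalty depends only on the `"W"` entries, so even very large biases contribute nothing to Ω.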

