This week we will be talking about model selection and regularization, in particular lasso and ridge regression.

Recommended readings/videos

James, G. et al. (2021). “An Introduction to Statistical Learning with Applications in R” (ISLR). Chapter 6.2 (pp. 237–242). Note: For ISLR readings, don’t get caught up in the math.

Josh Starmer. (2018). “Regularization Part 1: Ridge (L2) Regression”. Video materials from StatQuest. Note: I usually watch his videos at 1.5x speed.

Josh Starmer. (2018). “Regularization Part 2: Lasso (L1) Regression”. Video materials from StatQuest. Note: I usually watch his videos at 1.5x speed.

Here, I provide a simple example to understand why ridge regression shrinks coefficients toward zero but never exactly to zero. Let’s think about the simplest scenario: a single data point $(x, y)$ and one predictor (we won’t take the intercept into account, since it is not penalized and doesn’t affect the argument). As seen in class, the objective function that ridge regression is trying to minimize is the following:

$$(y - \beta x)^2 + \lambda \beta^2$$

Taking the derivative with respect to $\beta$ and setting it to zero (the first-order condition, FOC) gives

$$-2x(y - \beta x) + 2\lambda\beta = 0 \quad \Longrightarrow \quad \hat{\beta} = \frac{xy}{x^2 + \lambda}$$

In this case, for non-zero values of $x$ and $y$, $\hat{\beta}$ cannot be shrunk to exactly 0, because the numerator $xy$ will always be different from 0. However, if $\lambda \rightarrow \infty$, then $\hat{\beta} \rightarrow 0$.
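To see this shrinkage numerically, here is a minimal sketch that evaluates the closed-form ridge solution for the one-point problem, $\hat{\beta} = xy / (x^2 + \lambda)$, for growing $\lambda$. The values $x = 2$, $y = 3$ are made up purely for illustration.

```python
# Ridge estimate for a single data point (x, y) with no intercept:
# minimizing (y - beta*x)^2 + lambda*beta^2 yields beta_hat = x*y / (x^2 + lambda).
# The point (x, y) = (2, 3) is an arbitrary illustration value.

def ridge_beta(x, y, lam):
    """Closed-form ridge solution for one observation and one predictor."""
    return x * y / (x**2 + lam)

x, y = 2.0, 3.0
for lam in [0.0, 1.0, 10.0, 1e3, 1e6]:
    print(f"lambda = {lam:>9.1f}  ->  beta_hat = {ridge_beta(x, y, lam):.6f}")

# beta_hat shrinks toward 0 as lambda grows, but never reaches exactly 0,
# because the numerator x*y = 6 is fixed and non-zero.
```

Note that at $\lambda = 0$ the solution is just the ordinary least squares fit $y/x$, and every positive $\lambda$ only divides by a larger denominator, so the estimate stays strictly positive.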

In the case of lasso, now, assuming a positive value for $\beta$ (the argument is symmetric if $\beta < 0$), we have the following objective function and FOC:

$$(y - \beta x)^2 + \lambda \beta$$

$$-2x(y - \beta x) + \lambda = 0$$
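Solving the lasso FOC for this one-point case gives $\hat{\beta} = (xy - \lambda/2)/x^2$, clipped at 0 on the $\beta \geq 0$ branch, so the estimate hits exactly 0 once $\lambda \geq 2xy$. A minimal sketch of this (again with the made-up values $x = 2$, $y = 3$, so $2xy = 12$):

```python
# Lasso estimate for a single data point (x, y), no intercept, beta >= 0 branch:
# minimizing (y - beta*x)^2 + lambda*beta gives beta_hat = (x*y - lambda/2) / x^2,
# clipped at 0 (if the unconstrained solution turns negative, the optimum on
# this branch is beta = 0). The point (2, 3) is an arbitrary illustration value.

def lasso_beta(x, y, lam):
    """Closed-form lasso solution (non-negative branch) for one observation."""
    return max(0.0, (x * y - lam / 2) / x**2)

x, y = 2.0, 3.0
for lam in [0.0, 4.0, 8.0, 12.0, 20.0]:
    print(f"lambda = {lam:>5.1f}  ->  beta_hat = {lasso_beta(x, y, lam):.4f}")

# Unlike ridge, beta_hat is exactly 0 for every lambda >= 2*x*y = 12:
# the penalty subtracts a constant lambda/2 from the numerator instead of
# inflating the denominator, so it can push the estimate all the way to zero.
```

This is exactly the contrast with ridge: the L1 penalty shifts the numerator by a constant, while the L2 penalty only inflates the denominator, which is why lasso can set coefficients exactly to zero.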