class: center, middle, inverse, title-slide .title[ # STA 235H - Model Selection I:
Bias vs Variance, Cross-Validation, and Stepwise ] .subtitle[ ## Fall 2023 ] .author[ ### McCombs School of Business, UT Austin ] --- <!-- <script type="text/javascript"> --> <!-- MathJax.Hub.Config({ --> <!-- "HTML-CSS": { --> <!-- preferredFont: null, --> <!-- webFont: "Neo-Euler" --> <!-- } --> <!-- }); --> <!-- </script> --> <style type="text/css"> .small .remark-code { /*Change made here*/ font-size: 80% !important; } .tiny .remark-code { /*Change made here*/ font-size: 80% !important; } </style> # Announcements - Re-grading for homework 3 available **.darkorange[until this Thursday]**. - Please <u>check the rubric</u> and based on that ask for a specific re-grade. -- - Think of **.darkorange[assignment drop]** as an insurance policy. - Start assignments with enough time *if you already think you used your drop*. -- - **.darkorange[Grades for the midterm]** will be posted on Tuesday. - Importance of completing assignments (e.g. practice quiz, JITTs). - Final exam will have limited notes. -- - **.darkorange[Start of a completely new chapter]** - If you struggled with causal inference, doesn't mean that you can't do very well in this second part. --- # Last class .pull-left[ .center[ ![](https://media.giphy.com/media/k9HCTV5kwOa9pWVISy/giphy.gif) ] ] .pull-right[ - Finished with causal inference, discussing **.darkorange[regression discontinuity designs]** - We will review the **.darkorange[JITT]** (slides will be posted tomorrow) - Importance of **.darkorange[doing the coding exercises]** ] --- # JITT 9: Regression discontinuity design - **.darkorange[RDD]** allows us to compare people <u>exactly at the cutoff</u> if they were treated vs not treated, and estimate a **.darkorange[Local Average Treatment Effect]** (LATE) for those units. -- - In the example for the JITT, the treatment is **.darkorange[being legally able to drink]** (and the control is *not* being legally able to drink). 
-- - The code you had to run is: `summary(rdrobust(mlda$all, mlda$r, c = 0))` (where `rdrobust()` comes from the `rdrobust` package) - In this case, remember that `all` is our outcome (total number of arrests), `r` is our *centered* running variable (age minus the cutoff), and `c = 0` is our cutoff (remember that `r` is centered around 0, so the cutoff is 0 and not 7670). - You have to look at the coefficient in the table (`Conventional`)... and remember to also look at the p-value! -- - *"On average, for individuals at exactly 21 years of age, being legally able to drink increases the total number of arrests by 409.1, compared to not being legally able to drink"* --- # Introduction to prediction .pull-left[ - So far, we have been focusing on **.darkorange[causal inference]**: - Estimating an effect and "predicting" a counterfactual (what if?) - Now, we will focus on **.darkorange[prediction]**: - Estimating/predicting outcomes under specific conditions. ] .pull-right[ .center[ ![](https://media.giphy.com/media/3orieSdZDhn7I6gViw/giphy.gif) ] ] --- # Differences between inference and prediction - Inference `\(\rightarrow\)` focus on **.darkorange[covariates]** - **.darkorange[Interpretability]** of the model. - Prediction `\(\rightarrow\)` focus on the **.darkorange[outcome variable]** - **.darkorange[Accuracy]** of the model. .box-2[Both can be complementary!] --- # Example: What is churn? .pull-left[ - **.darkorange[Churn:]** Measure of how many customers stop using your product (e.g. cancel a subscription). ] .pull-right[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week10/1_ModelSelection/images/churn1.png)] ] --- # Example: What is churn? .pull-left[ - **.darkorange[Churn:]** Measure of how many customers stop using your product (e.g. cancel a subscription). 
.box-2trans[Less costly to keep a customer than to bring in a new one] ] .pull-right[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week10/1_ModelSelection/images/churn1.png)] ] --- # Example: What is churn? .pull-left[ - **.darkorange[Churn:]** Measure of how many customers stop using your product (e.g. cancel a subscription). .box-2trans[Less costly to keep a customer than to bring in a new one] .box-4trans[Prevent churn] ] .pull-right[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week10/1_ModelSelection/images/churn1.png)] ] --- # Example: What is churn? .pull-left[ - **.darkorange[Churn:]** Measure of how many customers stop using your product (e.g. cancel a subscription). .box-2trans[Less costly to keep a customer than to bring in a new one] .box-4trans[Prevent churn] .box-7trans[Identify customers who are likely to cancel/quit/fail to renew] ] .pull-right[ .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week10/1_ModelSelection/images/churn1.png)] ] --- # Bias vs Variance .box-4["There are no free lunches in statistics"] -- - No single method dominates the others: it is context/dataset dependent. - Remember that the goal of prediction is to have a method that is accurate in predicting outcomes on **.darkorange[previously unseen data]**. - **.darkorange[Validation set approach:]** Training and testing data -- .box-2[Balance between flexibility and accuracy] --- # Bias vs Variance .box-2[Variance] .box-2t["[T]he amount by which the function *f* would change if we estimated it using a different training dataset"] -- .box-6[Bias] .box-6t["[E]rror introduced by approximating a real-life problem with a model"] --- .center2[ .box-5Trans[Q1: Which models do you think are higher variance?] .box-5trans[a) More flexible models] .box-5trans[b) Less flexible models] ] --- # Bias vs. 
Variance: The ultimate battle - In inference, **.darkorange[bias >> variance]** - In prediction, we care about **.darkorange[both]**: - Measures of accuracy will have both bias and variance. .box-4[Trade-off at different rates] --- # How do we measure accuracy? Different measures (*for continuous outcomes*): - Remember `\(Adj-R^2\)`? -- - `\(R^2\)` (proportion of the variation in `\(Y\)` explained by `\(X\)`s) adjusted by the number of predictors! -- - **.darkorange[Mean Squared Error (MSE)]**: *Can be decomposed into variance and bias terms* `$$MSE = \frac{1}{n}\sum_{i=1}^n(y_i-\hat{f}(x_i))^2$$` -- - **.darkorange[Root Mean Squared Error (RMSE)]**: *Measured in the same units as the outcome!* `$$RMSE = \sqrt{MSE}$$` -- - Other measures: Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC) --- # Is flexibility always better? <img src="f2023_sta235h_10_ModelSelectionI_files/figure-html/bias_variance-1.svg" style="display: block; margin: auto;" /> --- # Is flexibility always better? <img src="f2023_sta235h_10_ModelSelectionI_files/figure-html/bias_variance2-1.svg" style="display: block; margin: auto;" /> --- # Is flexibility always better? <img src="f2023_sta235h_10_ModelSelectionI_files/figure-html/bias_variance3-1.svg" style="display: block; margin: auto;" /> --- # Is flexibility always better? <img src="f2023_sta235h_10_ModelSelectionI_files/figure-html/bias_variance4-1.svg" style="display: block; margin: auto;" /> --- # Example: Let's predict "pre-churn"! 
- You work at HBO Max and you know that a good measure for someone at risk of unsubscribing is the **.darkorange[number of times they've logged in during the past week]**: ```r library(dplyr) # for %>% and slice() hbo = read.csv("https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week10/1_ModelSelection/data/hbomax.csv") head(hbo) ``` ``` ## id female city age logins succession unsubscribe ## 1 1 1 1 53 10 0 1 ## 2 2 1 1 48 7 1 0 ## 3 3 0 1 45 7 1 0 ## 4 4 1 1 51 5 1 0 ## 5 5 1 1 45 10 0 0 ## 6 6 1 0 40 0 1 0 ``` --- # Two candidates: Simple vs Complex - **.darkorange[Simple Model]**: `$$logins = \beta_0 + \beta_1\times Succession + \beta_2 \times city + \varepsilon$$` - **.darkorange[Complex Model]**: $$ `\begin{aligned} logins =& \beta_0 + \beta_1\times Succession + \beta_2\times age + \beta_3\times age^2 +\\ &\beta_4\times city + \beta_5\times female + \varepsilon \end{aligned}` $$ --- # Create Validation Sets ```r set.seed(100) #Always set seed for replication! *n = nrow(hbo) train = sample(1:n, n*0.8) #randomly select 80% of the rows for our training sample train.data = hbo %>% slice(train) test.data = hbo %>% slice(-train) ``` --- # Create Validation Sets ```r set.seed(100) #Always set seed for replication! n = nrow(hbo) *train = sample(1:n, n*0.8) train.data = hbo %>% slice(train) test.data = hbo %>% slice(-train) ``` --- # Create Validation Sets ```r set.seed(100) #Always set seed for replication! 
n = nrow(hbo) train = sample(1:n, n*0.8) #randomly select 80% of the rows for our training sample *train.data = hbo %>% slice(train) *test.data = hbo %>% slice(-train) ``` --- # Estimate Accuracy Measure ```r library(modelr) lm_simple = lm(logins ~ succession + city, data = train.data) lm_complex = lm(logins ~ female + city + age + I(age^2) + succession, data = train.data) # For simple model: rmse(lm_simple, test.data) %>% round(., 4) ``` ``` ## [1] 2.0899 ``` ```r # For complex model: rmse(lm_complex, test.data) %>% round(., 4) ``` ``` ## [1] 2.0934 ``` - **.darkorange[Q2: Which one would you prefer?]** --- # Cross-Validation - To avoid relying on only **.darkorange[one training and testing dataset]**, we can iterate over a *k-fold* division of our data: .center[ ![:scale 80%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week10/1_ModelSelection/images/cv.png)] --- # Cross-Validation **.darkorange[Procedure for *k-fold* cross-validation]**: 1. Divide your data into *K* folds (usually, `\(K=5\)` or `\(K=10\)`). 2. Use fold `\(k=1\)` as the testing data and folds `\(k=2,...,K\)` as the training data. 3. Calculate the accuracy measure `\(A_k\)` on the testing data. 4. Repeat for each fold `\(k\)`. 5. Average `\(A_k\)` over all `\(k=1,...,K\)`. -- Main advantage: Use the entire dataset for training **.darkorange[AND]** testing. --- # How do we do CV in R? ```r *library(caret) set.seed(100) train.control = trainControl(method = "cv", number = 10) lm_simple = train(logins ~ succession + city, data = hbo, method="lm", trControl = train.control) lm_simple ``` --- # How do we do CV in R? ```r library(caret) *set.seed(100) train.control = trainControl(method = "cv", number = 10) lm_simple = train(logins ~ succession + city, data = hbo, method="lm", trControl = train.control) lm_simple ``` --- # How do we do CV in R? 
```r library(caret) set.seed(100) *train.control = trainControl(method = "cv", number = 10) lm_simple = train(logins ~ succession + city, data = hbo, method="lm", trControl = train.control) lm_simple ``` --- # How do we do CV in R? .small[ ```r library(caret) *set.seed(100) train.control = trainControl(method = "cv", number = 10) *lm_simple = train(logins ~ succession + city, data = hbo, method="lm", * trControl = train.control) lm_simple ``` ``` ## Linear Regression ## ## 5000 samples ## 2 predictor ## ## No pre-processing ## Resampling: Cross-Validated (10 fold) ## Summary of sample sizes: 4500, 4501, 4499, 4500, 4500, 4501, ... ## Resampling results: ## ## RMSE Rsquared MAE ## 2.087314 0.6724741 1.639618 ## ## Tuning parameter 'intercept' was held constant at a value of TRUE ``` ] --- # Stepwise selection - We have seen how to choose between some given models. **.darkorange[But what if we want to test all possible models?]** - **.darkorange[Stepwise selection]**: A computationally efficient algorithm to select a model based on the data we have (subset selection). -- Algorithm for forward stepwise selection: 1. Start with the *null model*, `\(M_0\)` (no predictors). 2. For `\(k=0, ..., p-1\)`: (a) Consider all `\(p-k\)` models that augment `\(M_k\)` with one additional predictor. (b) Choose the *best* among these `\(p-k\)` models and call it `\(M_{k+1}\)`. 3. Select the single best model from `\(M_{0},...,M_p\)` using CV. .source[Backwards stepwise selection follows the same procedure, but starts with the full model.] --- .center2[ .box-2LA[Will forward stepwise selection yield the same results as backwards stepwise selection?] ] --- # How do we do stepwise selection in R? .small[ ```r set.seed(100) train.control = trainControl(method = "cv", number = 10) #set up a 10-fold cv lm.fwd = train(logins ~ . 
- unsubscribe, data = train.data, * method = "leapForward", * tuneGrid = data.frame(nvmax = 1:5), trControl = train.control) lm.fwd$results ``` ``` ## nvmax RMSE Rsquared MAE RMSESD RsquaredSD MAESD ## 1 1 2.269469 0.6101788 1.850376 0.04630907 0.01985045 0.04266950 ## 2 2 2.087184 0.6702660 1.639885 0.04260047 0.01784601 0.04623508 ## 3 3 2.087347 0.6702094 1.640405 0.04258030 0.01804773 0.04605074 ## 4 4 2.088230 0.6699245 1.641402 0.04270561 0.01808685 0.04620206 ## 5 5 2.088426 0.6698623 1.641528 0.04276883 0.01810569 0.04624618 ``` ] -- - Which one of the 5 models would you choose? Why? --- # How do we do stepwise selection in R? .tiny[ ```r # We can see the optimal number of covariates to choose: lm.fwd$bestTune ``` ``` ## nvmax ## 2 2 ``` ```r # And what the final model looks like: summary(lm.fwd$finalModel) ``` ``` ## Subset selection object ## 5 Variables (and intercept) ## Forced in Forced out ## id FALSE FALSE ## female FALSE FALSE ## city FALSE FALSE ## age FALSE FALSE ## succession FALSE FALSE ## 1 subsets of each size up to 2 ## Selection Algorithm: forward ## id female city age succession ## 1 ( 1 ) " " " " " " " " "*" ## 2 ( 1 ) " " " " "*" " " "*" ``` ```r # If we want the RMSE on the test data: rmse(lm.fwd, test.data) ``` ``` ## [1] 2.089868 ``` ] --- .center2[ .box-7Trans[Your Turn] ] --- # Takeaway points .pull-left[ - In prediction, everything is going to be about **.darkorange[bias vs variance]**. - Importance of **.darkorange[validation sets]**. - We now have methods to **.darkorange[select models]**. ] .pull-right[ .center[ ![](https://media.giphy.com/media/PnfsGtSGXJjgcDo9cc/giphy.gif)] ] --- # Next class .pull-left[ - Continue with prediction and model selection - **.darkorange[Shrinkage/Regularization methods]**: - Ridge regression and Lasso. ] .pull-right[ .center[ ![](https://media.giphy.com/media/oYtVHSxngR3lC/giphy.gif) ] ] --- # References - James, G. et al. (2021). "Introduction to Statistical Learning with Applications in R". *Springer*. 
*Chapters 2, 5, and 6*. - STHDA. (2018). ["Stepwise Regression Essentials in R."](http://www.sthda.com/english/articles/37-model-selection-essentials-in-r/154-stepwise-regression-essentials-in-r/) - STHDA. (2018). ["Cross-Validation Essentials in R."](http://www.sthda.com/english/articles/38-regression-model-validation/157-cross-validation-essentials-in-r/)
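
---

# Appendix: k-fold CV by hand

- To see what `caret` automates, here is a minimal sketch of the 5-step *k-fold* procedure in base R (the simulated data and variable names are illustrative only, not the `hbo` dataset):

```r
set.seed(100)

# Simulated data (illustrative only): outcome y depends linearly on x
n = 500
df = data.frame(x = rnorm(n))
df$y = 1 + 2*df$x + rnorm(n)

# 1. Divide the data into K folds
K = 10
folds = sample(rep(1:K, length.out = n))

# 2.-4. For each k: train on the other folds, test on fold k,
# and record the accuracy measure A_k (here, RMSE)
rmse_k = numeric(K)
for (k in 1:K) {
  fit = lm(y ~ x, data = df[folds != k, ])
  pred = predict(fit, newdata = df[folds == k, ])
  rmse_k[k] = sqrt(mean((df$y[folds == k] - pred)^2))
}

# 5. Average A_k over the K folds
cv_rmse = mean(rmse_k)
```

- `train(..., trControl = trainControl(method = "cv", number = 10))` runs exactly this loop for you, and then refits the chosen model on the full dataset.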