STA 235H - Multiple Regression: Overview and Statistical Adjustment
Fall 2023
McCombs School of Business, UT Austin

Today

Quick multiple regression review
- How does OLS work?
What can we say using regressions?
- Interpreting coefficients


Nothing "Ordinary" about OLS


What do you understand about regressions?


Remembering Regressions

Linear regression is a very useful tool.
- Simple supervised learning approach.
- Many fancy methods are generalizations or extensions of linear regression!
It's a way to (partially) describe a data generating process (DGP).

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
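To make the idea of a DGP concrete, here is a minimal R sketch that simulates data from a model of this form and checks that `lm()` approximately recovers the coefficients. The data are simulated, and the "true" values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\beta_2 = -1.5$) are arbitrary assumptions chosen for illustration.

```r
# Simulated data from a known DGP: Y = b0 + b1*X1 + b2*X2 + eps.
# The "true" values are arbitrary choices for this illustration.
set.seed(235)
n  <- 1000
b0 <- 2
b1 <- 0.5
b2 <- -1.5
x1  <- rnorm(n)
x2  <- rnorm(n)
eps <- rnorm(n, mean = 0, sd = 1)   # the error term epsilon
y   <- b0 + b1 * x1 + b2 * x2 + eps

# Fit the linear regression; lm() adds the intercept automatically
fit <- lm(y ~ x1 + x2)
coef(fit)   # estimates should land close to 2, 0.5 and -1.5
```

In real data we never see the DGP; the simulation only shows what "describing a DGP" means when the model is correctly specified.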


Essential Parts of a Regression

Outcome variable, response variable, or dependent variable ($Y$): the thing you want to explain or predict

Explanatory variable, predictor variable, or independent variable ($X$): the thing you use to explain or predict $Y$


Identify the variables

- A study examines the effect of smoking on lung cancer.
- Fantasy football fanatics predict the performance of a player based on past performance, health status, and characteristics of the opposing team.
- You want to see if taking more AP classes in high school improves college grades.
- Netflix uses your past viewing history, the day of the week, and the time of day to guess which show you want to watch next.


Two Purposes of Regression

Prediction
- Forecast the future
- Focus is on $Y$
- Example: Netflix trying to guess your next show

Explanation
- Explain the effect of $X$ on $Y$
- Focus is on $X$
- Example: Netflix looking at the effect of the time of day on show selection


What do we want to estimate in a regression?

When we run a regression, we have an outcome $Y$ and explanatory variables (covariates) $X$.
We want to estimate the $\beta$'s.

One important distinction:
- $\beta$'s are the population parameters we want to estimate.
- $\hat{\beta}$'s are the estimates of those parameters.
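A quick simulation can illustrate the distinction: the population parameter $\beta_1$ stays fixed, while $\hat{\beta}_1$ changes from sample to sample and is centered around $\beta_1$. This is a sketch under assumed values ($\beta_0 = 1$, $\beta_1 = 3$, normal noise); `draw_estimate()` is a hypothetical helper written only for this example.

```r
# Assumed population parameters for this sketch (known here because we simulate)
beta0 <- 1
beta1 <- 3

# Hypothetical helper: draw one sample of size n and return beta_1-hat
draw_estimate <- function(n = 100) {
  x <- rnorm(n)
  y <- beta0 + beta1 * x + rnorm(n, sd = 2)
  unname(coef(lm(y ~ x))["x"])
}

set.seed(42)
estimates <- replicate(500, draw_estimate())

mean(estimates)  # close to the fixed parameter beta1 = 3
sd(estimates)    # how much beta_1-hat varies across samples
```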


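How do we estimate the coefficients in a regression?

Ordinary Least Squares (OLS) is the most popular way: it picks the coefficients that minimize the sum of squared residuals, i.e., the squared gaps between each observed $Y_i$ and its fitted value.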

$\min_{\beta_0, \ldots, \beta_p} \sum_{i=1}^{n} \left[Y_i - \left(\beta_0 + \sum_{j=1}^{p} \beta_j X_{ij}\right)\right]^2$
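As a sketch of what this minimization does, the R code below solves it in closed form on simulated data with arbitrary coefficients. Setting the derivative of the sum of squared residuals to zero gives the normal equations $(X^\top X)\hat{\beta} = X^\top Y$, where $X$ includes a column of ones for the intercept; the solution should match `lm()`. The helper `ssr()` is hypothetical, defined here only to evaluate the objective.

```r
# Simulated data with arbitrary coefficients (illustration only)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- runif(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix: a column of ones (for beta_0) plus the predictors
X <- cbind(intercept = 1, x1, x2)

# Setting the gradient of the sum of squared residuals to zero gives the
# normal equations (X'X) beta = X'y; solve them directly
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Sum of squared residuals for any candidate coefficient vector b
ssr <- function(b) sum((y - X %*% b)^2)

# lm() minimizes the same objective, so the two sets of estimates agree
cbind(normal_eq = drop(beta_hat), lm = coef(lm(y ~ x1 + x2)))
ssr(beta_hat)
```

In practice `lm()` does not form $X^\top X$ explicitly; it uses a QR decomposition, which is numerically more stable but minimizes the same objective.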


How do we estimate the coefficients in a regression? (cont.)


Let's get into some data


Let's introduce an example: The Bechdel Test

Three criteria:
1. At least two named women
2. Who talk to each other
3. About something besides a man


Do movies pass the test?


Does it pay for my movie to pass the Bechdel test?

I'm a profit-maximizing investor and want to know whether it's in my best interest to replace a male character with a female one.
- What is the simplest model you could fit?

$\text{Revenue} = \alpha + \beta \, \text{Bechdel} + \varepsilon$


Let's analyze some models

The data and code are available on the course website.
Dataset from fivethirtyeight.com:
- We focus on movies released in 1990 or later.

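Summary Statistics (Adj_Revenue and Adj_Budget in millions of dollars)

| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| Year | 2087 | 2004.963 | 6.755 | 1990 | 1999 | 2011 | 2014 |
| Adj_Revenue | 2087 | 66.254 | 92.07 | 0 | 4.36 | 86.936 | 968.41 |
| Adj_Budget | 1369 | 61.498 | 57.784 | 0.02 | 19.3 | 88.47 | 470.839 |
| Metascore | 1755 | 5.663 | 1.66 | 1.1 | 4.5 | 6.8 | 9.7 |
| imdbRating | 2085 | 6.546 | 0.979 | 1.5 | 6 | 7.2 | 9.3 |
| bechdel_test | 2087 | 0.571 | 0.495 | 0 | 0 | 1 | 1 |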
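The N column counts non-missing values, which is why it differs across variables (for example, `Adj_Budget` has fewer observations than `Adj_Revenue`). A minimal R sketch of how such a table can be computed, using a small made-up data frame rather than the course data and a hypothetical helper `summ()`; the slides' table may have been produced with a different tool or quantile method.

```r
# Small made-up data frame (not the course data); NA marks a missing value
movies <- data.frame(
  Adj_Revenue  = c(10, 250, 0, 86),
  Adj_Budget   = c(5, NA, 20, NA),
  bechdel_test = c(1, 0, 1, 1)
)

# Hypothetical helper: one row of summary statistics, ignoring missing values
summ <- function(v) {
  q <- quantile(v, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  c(N    = sum(!is.na(v)),          # number of non-missing observations
    Mean = mean(v, na.rm = TRUE),
    SD   = sd(v, na.rm = TRUE),
    Min  = min(v, na.rm = TRUE),
    P25  = q[1],
    P75  = q[2],
    Max  = max(v, na.rm = TRUE))
}

# One row per variable, one column per statistic
stats_table <- t(sapply(movies, summ))
stats_table
```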


Let's analyze some models

summary(lm(Adj_Revenue ~ bechdel_test, data = bechdel))

##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   76.4553     3.0641 24.9521        0
## bechdel_test -17.8616     4.0544 -4.4055        0

How do you interpret these results?

$\hat{\beta}_0 \approx 76.5$ is the average adjusted revenue (in millions of dollars) for movies that do not pass the Bechdel test.
On average, movies that pass the Bechdel test have an adjusted revenue that is $|\hat{\beta}_1| \approx 17.9$ million dollars lower than movies that do not pass it.

Negative effect of including more women?
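Why does the intercept equal the average for movies that fail the test? With a single 0/1 regressor, OLS reproduces the group means exactly: $\hat{\beta}_0$ is the mean outcome when `bechdel_test = 0`, and $\hat{\beta}_1$ is the difference between the two group means. A small R check on simulated revenues (made-up parameters, not the FiveThirtyEight data):

```r
# Simulated revenues (made-up parameters), NOT the FiveThirtyEight data
set.seed(2023)
n <- 500
bechdel_test <- rbinom(n, size = 1, prob = 0.57)   # 1 = passes, 0 = fails
revenue <- 70 - 15 * bechdel_test + rnorm(n, sd = 40)

fit <- lm(revenue ~ bechdel_test)
coef(fit)

# Intercept: average revenue among movies that fail the test
mean(revenue[bechdel_test == 0])
# Slope: difference in average revenue, passing minus failing
mean(revenue[bechdel_test == 1]) - mean(revenue[bechdel_test == 0])
```

This identity holds for any data, which is why the simple model is just a comparison of two averages and says nothing yet about why they differ.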


What gives?


More variables

The Bechdel test indicator could be capturing the effect of other variables:
- What type of movies are the ones that pass the test?
- What is their budget?
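One way to see how an omitted variable can flip a sign is a simulation in which, by assumption, movies that pass the test have smaller budgets, while passing has a positive effect on revenue once budget is held fixed. All numbers below are invented for illustration; they are not estimates from the course data.

```r
set.seed(235)
n <- 2000
# Assumption for illustration: passing movies have budgets 30 lower on average
bechdel_test <- rbinom(n, size = 1, prob = 0.57)
budget <- 80 - 30 * bechdel_test + rnorm(n, sd = 20)
# Assumed DGP: passing the test raises revenue by 10, holding budget fixed
revenue <- -5 + 10 * bechdel_test + 1.1 * budget + rnorm(n, sd = 30)

short <- lm(revenue ~ bechdel_test)           # omits budget
long  <- lm(revenue ~ bechdel_test + budget)  # adjusts for budget

coef(short)["bechdel_test"]  # negative: absorbs the budget gap
coef(long)["bechdel_test"]   # close to the assumed +10
```

In this setup the short regression's slope is roughly $10 + 1.1 \times (-30) = -23$: the assumed effect plus the budget effect times the budget gap between the two groups.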


More variables

summary(lm(Adj_Revenue ~ bechdel_test + Adj_Budget + Metascore + imdbRating, data = bechdel))

##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -127.0710    17.0563 -7.4501   0.0000
## bechdel_test   11.0009     4.3786  2.5124   0.0121
## Adj_Budget      1.1192     0.0367 30.4866   0.0000
## Metascore       7.0254     1.9058  3.6864   0.0002
## imdbRating     15.4631     3.3914  4.5595   0.0000

Positive and significant!

How do we interpret the relevant coefficient now?
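One way to read a coefficient in a multiple regression is through the Frisch-Waugh-Lovell theorem: the coefficient on `bechdel_test` equals the slope from regressing the part of revenue not explained by the other covariates on the part of `bechdel_test` not explained by them. That is the sense in which the estimate holds the other variables constant. The sketch uses simulated data with two stand-in covariates (`budget`, `rating`) and made-up parameters, not the slide's model or data.

```r
# Simulated data with made-up parameters; budget and rating stand in for
# the slide's covariates
set.seed(7)
n <- 1000
budget <- rnorm(n, mean = 60, sd = 25)
rating <- rnorm(n, mean = 6.5, sd = 1)
# Assumption: higher-budget movies are somewhat less likely to pass
bechdel_test <- rbinom(n, size = 1, prob = plogis(-0.02 * (budget - 60)))
revenue <- -120 + 11 * bechdel_test + 1.1 * budget + 15 * rating +
  rnorm(n, sd = 40)

full <- lm(revenue ~ bechdel_test + budget + rating)

# Step 1: part of bechdel_test not explained by the other covariates
r_bechdel <- resid(lm(bechdel_test ~ budget + rating))
# Step 2: part of revenue not explained by the other covariates
r_revenue <- resid(lm(revenue ~ budget + rating))
# Step 3: regress one residual on the other
partial <- lm(r_revenue ~ r_bechdel)

coef(full)["bechdel_test"]
coef(partial)["r_bechdel"]   # same value, up to floating-point error
```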


Main takeaway points

Regressions are super useful...
- But you need to know how to interpret them.
Be sure not to overstate your claims!
Remember the magic words for interpretation


Next class

Continue with multiple regression models:
- Interactions and how to interpret them
- "Nonlinear" models


