Quick multiple regression review
What can we say using regressions?
Nothing "Ordinary" about OLS
What do you understand about regressions?
Linear Regression is a very useful tool.
It's a way to (partially) describe a data generating process (DGP).
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$
Y
Outcome Variable
Response Variable
Dependent Variable
Thing you want to explain or predict
X
Explanatory Variable
Predictor Variable
Independent Variable
Thing you use to explain or predict Y
A study examines the effect of smoking on lung cancer
Fantasy football fanatics predict the performance of a player based on past performance, health status, and characteristics of the opposing team
You want to see if taking more AP classes in high school improves college grades
Netflix uses your past viewing history, the day of the week, and the time of the day to guess which show you want to watch next
Prediction
Forecast the future
Focus is on Y
Netflix trying to guess your next show
Explanation
Explain the effect of X on Y
Focus is on X
Netflix looking at the effect of time of the day on show selection
When we run a regression we have an outcome Y and explanatory variables or covariates X.
We want to estimate the β's
One important distinction:
$$\min_{\beta} \sum_{i=1}^{n} \left[ Y_i - \left( \sum_{j=1}^{p} \beta_j X_{ij} \right) \right]^2$$
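A minimal sketch of what this minimization does in practice, using simulated data (the variable names and coefficient values are made up for illustration and are not from the course dataset). It assumes the intercept is handled as a column of ones in X, and compares the closed-form least-squares solution with what `lm()` returns.

```r
# Simulated data (hypothetical; not the course dataset)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

# Design matrix with a column of ones for the intercept
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Minimizer of the sum of squared residuals: solve (X'X) b = X'y
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

# lm() minimizes the same objective
fit <- lm(y ~ x1 + x2)

# Largest gap between the two sets of estimates (numerically zero)
max_gap <- max(abs(beta_hat - unname(coef(fit))))
print(beta_hat)
print(max_gap)
```

Both routes give the same β's up to floating-point error; `lm()` uses a more numerically stable decomposition internally rather than inverting X'X directly.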
Let's get into some data
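The Bechdel test has three criteria: the movie has (1) at least two women, (2) who talk to each other, (3) about something other than a man.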
I'm a profit-maximizing investor and want to know whether it's in my best interest to switch a male for a female character.
$$\text{Revenue} = \alpha + \beta \, \text{Bechdel} + \varepsilon$$
We have some data and code on the course website
Dataset from fivethirtyeight.com:
Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
---|---|---|---|---|---|---|---|
Year | 2087 | 2004.963 | 6.755 | 1990 | 1999 | 2011 | 2014 |
Adj_Revenue | 2087 | 66.254 | 92.07 | 0 | 4.36 | 86.936 | 968.41 |
Adj_Budget | 1369 | 61.498 | 57.784 | 0.02 | 19.3 | 88.47 | 470.839 |
Metascore | 1755 | 5.663 | 1.66 | 1.1 | 4.5 | 6.8 | 9.7 |
imdbRating | 2085 | 6.546 | 0.979 | 1.5 | 6 | 7.2 | 9.3 |
bechdel_test | 2087 | 0.571 | 0.495 | 0 | 0 | 1 | 1 |
summary(lm(Adj_Revenue ~ bechdel_test, data = bechdel))
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   76.4553     3.0641 24.9521        0
## bechdel_test -17.8616     4.0544 -4.4055        0
$\hat{\beta}_0$ (about 76.5) is the average adjusted revenue, in millions of dollars, for movies that do not pass the Bechdel test.
On average, movies that pass the Bechdel test have an adjusted revenue that is $|\hat{\beta}_1|$ (about 17.9) million dollars lower than movies that do not pass it.
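To see why these interpretations hold, here is a small sketch with simulated data (the variables `passes` and `revenue` and all numbers are hypothetical, chosen only to resemble the setup above). With a single 0/1 regressor, the intercept equals the mean outcome of the group coded 0, and the slope equals the difference in group means.

```r
# Simulated data (hypothetical; not the fivethirtyeight dataset)
set.seed(2)
n <- 500
passes  <- rbinom(n, 1, 0.57)                    # 0/1 indicator
revenue <- 76 - 18 * passes + rnorm(n, sd = 90)  # outcome in "millions"

fit <- lm(revenue ~ passes)

# Group means computed directly
mean_fail <- mean(revenue[passes == 0])
mean_pass <- mean(revenue[passes == 1])

# Intercept vs. mean of the group coded 0
intercept_gap <- abs(unname(coef(fit)[1]) - mean_fail)
# Slope vs. difference in group means
slope_gap <- abs(unname(coef(fit)[2]) - (mean_pass - mean_fail))

print(c(intercept_gap = intercept_gap, slope_gap = slope_gap))
```

Both gaps are zero up to floating-point error, so the regression is just a comparison of two averages. It says nothing yet about why the averages differ.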
Negative effect of including more women?
Bechdel test could be capturing the effect of other variables:
What type of movies are the ones that pass the test?
What is their budget?
lm(Adj_Revenue ~ bechdel_test + Adj_Budget + Metascore + imdbRating, data=bechdel)
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -127.0710    17.0563 -7.4501   0.0000
## bechdel_test   11.0009     4.3786  2.5124   0.0121
## Adj_Budget      1.1192     0.0367 30.4866   0.0000
## Metascore       7.0254     1.9058  3.6864   0.0002
## imdbRating     15.4631     3.3914  4.5595   0.0000
Positive and significant!
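A sign flip like this can happen when an omitted variable is related to both the regressor and the outcome. The sketch below is a simulation under assumptions chosen for illustration (higher-budget movies are made less likely to pass, and budget raises revenue); it is not a claim about what drives the actual data, and all names and numbers are hypothetical.

```r
# Simulated data (hypothetical; not the fivethirtyeight dataset)
set.seed(3)
n <- 5000
budget <- rexp(n, rate = 1 / 60)                   # mean budget of 60
# Assumption: higher budget lowers the chance of passing
passes <- rbinom(n, 1, plogis(1 - 0.02 * budget))
# Assumption: the true coefficient on passes is +10
revenue <- 10 * passes + 1.1 * budget + rnorm(n, sd = 50)

short <- lm(revenue ~ passes)           # budget omitted
long  <- lm(revenue ~ passes + budget)  # budget included

b_short <- unname(coef(short)["passes"])
b_long  <- unname(coef(long)["passes"])
print(c(short = b_short, long = b_long))
```

In the short regression the coefficient on `passes` also picks up the lower average budget of movies that pass, so it comes out negative even though the simulated effect is positive. Adding `budget` recovers an estimate near +10.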
Regressions are super useful...
Be sure not to overstate your claims!
Remember the magic words for interpretation
Continue with multiple regression models:
"Nonlinear" models
Heiss, A. (2020). "Course: Program Evaluation for Public Service". Slides for Regression and Inference.
Ismay, C. & A. Kim. (2021). “Statistical Inference via Data Science”. Chapter 10.
Keegan, B. (2018). "The Need for Openness in Data Journalism". GitHub Repository.