class: center, middle, inverse, title-slide

.title[
# STA 235H - Multiple Regression: Polynomials
]
.subtitle[
## Fall 2023
]
.author[
### McCombs School of Business, UT Austin
]

---

<!-- <script type="text/javascript"> -->
<!-- MathJax.Hub.Config({ -->
<!--   "HTML-CSS": { -->
<!--     preferredFont: null, -->
<!--     webFont: "Neo-Euler" -->
<!--   } -->
<!-- }); -->
<!-- </script> -->

<style type="text/css">
.small .remark-code { /*Change made here*/
  font-size: 80% !important;
}
.tiny .remark-code { /*Change made here*/
  font-size: 90% !important;
}
</style>

# Some Announcements

.pull-left[
- **.darkorange[Homework answer key]** will be posted on Tuesday/Wednesday.

  - Make sure you check it out!

  - Exercises: Multiple regression (e.g. the Bechdel Test example), differences in associations between groups (e.g. luxury vs. non-luxury car depreciation).

- Check **.darkorange[personalized feedback]** for JITT 3, if included.

- Additional videos on the material (and some R code) in Resources > Videos
]

--

.pull-right[
.center[
![:scale 150%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/1_OLS_PotentialIssues/images/diff_pp_perc.png)
]
]

---

# Today

.pull-left[
- **.darkorange[Roadmap]** of where we've been and where we're going.

- **.darkorange[Nonlinear models:]**

  - Polynomial terms

- **.darkorange[Introduction to Causal Inference]**

  - Potential Outcomes Framework
]

.pull-right[
.center[
![](https://media.giphy.com/media/xTiN0CNHgoRf1Ha7CM/giphy.gif)]
]

---

# Roadmap so far

- Started the class with a review of **.darkorange[simple linear regression]**:

  - Association between a variable `\(X\)` and an outcome `\(Y\)`

  - e.g. `\(Revenue = \beta_0 + \beta_1 Bechdel + \varepsilon\)`

--

<br>
<br>

- Followed by **.darkorange[multiple regression]**:

  - *Partial* association between `\(X\)` and `\(Y\)`, holding other variables constant.

  - e.g.
`\(Revenue = \beta_0 + \beta_1 Bechdel + \beta_2 Budget + \beta_3 IMDB + \varepsilon\)`

--

<br>
<br>

- What if we want to compare **.darkorange[differences in associations between groups]**?

  - *Compare* the association between `\(X\)` and `\(Y\)` for groups `\(D=1\)` and `\(D=0\)`.

  - e.g. `\(Price = \beta_0 + \beta_1 Year + \beta_2 Luxury + \beta_3 Year\times Luxury + \varepsilon\)`

---

# Roadmap so far

- What if our outcome `\(Y\)` is *weird* (e.g. not normally distributed)?

  - *If `\(Y\)` is skewed to the right (log-normal)*: Transform it to `\(\log(Y)\)` to better satisfy the linearity assumption!

  - e.g. `\(\log(Price) = \beta_0 + \beta_1 Year + \beta_2 Luxury + \beta_3 Mileage + \varepsilon\)`

  - Interpret coefficients as a **percent change** (%)

--

<br>
<br>

- What if our outcome `\(Y\)` is *weird* (e.g. binary)?

  - e.g. `\(Employed = \beta_0 + \beta_1 Age + \beta_2 Afam + \beta_3 NKids + \varepsilon\)`

  - Interpret coefficients as a **change in probability** (e.g. percentage points)

--

<br>
<br>

- What if there **.darkorange[isn't a linear relationship]** between `\(X\)` and `\(Y\)`?

  - Include **polynomial terms** for `\(X\)`

--

<br>
<br>

- What if I want to know **.darkorange[the effect of X on Y]**?

  - Causal Inference!

---

# Adding polynomial terms

- Another way to capture **.darkorange[nonlinear associations]** between the outcome (Y) and covariates (X) is to include **.darkorange[polynomial terms]**:

  - e.g. `\(Y = \beta_0 + \beta_1X + \beta_2X^2 + \varepsilon\)`

--

- Let's look at an example!
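---

# Adding polynomial terms in R

The slides fit these models to the CPS 1985 data. As a self-contained sketch (the data below are *simulated*, not the real CPS sample; the variable names and coefficients are borrowed from the Mincer equation that follows), a quadratic term can be added inside an `lm()` formula with `I()`:

```r
# Hypothetical data generated from the slides' Mincer equation
set.seed(235)
n <- 500
educ  <- sample(8:18, n, replace = TRUE)   # years of education
exper <- sample(0:40, n, replace = TRUE)   # years of experience

log_wage <- 0.52 + 0.09 * educ + 0.034 * exper - 0.0005 * exper^2 +
  rnorm(n, sd = 0.3)

# I(exper^2) squares the regressor inside the formula
mincer <- lm(log_wage ~ educ + exper + I(exper^2))
coef(mincer)
```

An equivalent specification is `lm(log_wage ~ educ + poly(exper, 2, raw = TRUE))`; the fitted curve is the same either way.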
---

# Determinants of wages: CPS 1985

<img src="f2023_sta235h_4_reg_files/figure-html/wages_inspect-1.svg" style="display: block; margin: auto;" />

---

# Determinants of wages: CPS 1985

<img src="f2023_sta235h_4_reg_files/figure-html/wages_inspect2-1.svg" style="display: block; margin: auto;" />

---

# Experience vs wages: CPS 1985

<br>

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages-1.svg" style="display: block; margin: auto;" />

---

# Experience vs wages: CPS 1985

`$$\log(Wage) = \beta_0 + \beta_1 Educ + \beta_2 Exp + \varepsilon$$`

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages2-1.svg" style="display: block; margin: auto;" />

---

# Experience vs wages: CPS 1985

`$$\log(Wage) = \beta_0 + \beta_1 Educ + \beta_2 Exp + \beta_3 Exp^2 + \varepsilon$$`

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages3-1.svg" style="display: block; margin: auto;" />

---

# Mincer equation

`$$\log(Wage) = \beta_0 + \beta_1 Educ + \beta_2 Exp + \beta_3 Exp^2 + \varepsilon$$`

--

- Interpret the coefficient for **.darkorange[education]**

`$$\log(Wage) = 0.52 + 0.09\cdot Educ + 0.034\cdot Exp - 0.0005 \cdot Exp^2$$`

--

- What is the association between **.darkorange[experience and wages]**?

---

# Interpreting coefficients in quadratic equation

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages4-1.svg" style="display: block; margin: auto;" />

---

# Interpreting coefficients in quadratic equation

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages5-1.svg" style="display: block; margin: auto;" />

---

# Interpreting coefficients in quadratic equation

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages6-1.svg" style="display: block; margin: auto;" />

---

# Interpreting coefficients in quadratic equation

`$$\log(Wage) = \beta_0 + \beta_1 Educ + \beta_2 Exp + \beta_3 Exp^2 + \varepsilon$$`

What is the association between experience and wages?

--

- Pick a value for `\(Exp_0\)` (e.g.
mean, median, or another value of interest)

--

.center[
*Increasing work experience from `\(Exp_0\)` to `\(Exp_0 + 1\)` years is associated, on average, with a `\((\hat{\beta}_2 + 2\hat{\beta}_3\times Exp_0)100\)`% increase in hourly wages, holding education constant*]

--

Let's put some numbers into it:

`$$\log(Wage) = 0.52 + 0.09\cdot Educ + 0.034\cdot Exp - 0.0005 \cdot Exp^2$$`

--

.center[
*Increasing work experience from 20 to 21 years is associated, on average, with a `\((0.034 - 2\times 0.0005 \times 20)\times 100 = 1.4\)`% increase in hourly wages, holding education constant*]

--

.small[*Note that in this case we interpret the association between experience and wages as a percent change, because Wage enters the model in logs!*]

---

<br>
<br>
<br>
<br>
<br>
<br>
<br>

.box-7Trans[Let's go to R]

---

# References

- Ismay, C. & A. Kim. (2021). “Statistical Inference via Data Science”. Chapters 6 & 10.

<!-- pagedown::chrome_print('C:/Users/mc72574/Dropbox/Hugo/Sites/sta235/exampleSite/content/Classes/Week2/1_OLS/f2021_sta235h_3_reg.html') -->
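---

# Appendix: quadratic marginal effects in R

A quick sketch of the calculation from the interpretation slide, using the fitted coefficients of the Mincer equation (only `\(\hat{\beta}_2\)` and `\(\hat{\beta}_3\)` are needed):

```r
b2 <- 0.034    # coefficient on Exp
b3 <- -0.0005  # coefficient on Exp^2

# Approximate % change in hourly wages for one additional year
# of experience, evaluated at Exp0 (derivative: b2 + 2*b3*Exp0)
marginal_effect <- function(exp0) (b2 + 2 * b3 * exp0) * 100

marginal_effect(20)  # about 1.4% at 20 years of experience
```

The effect shrinks as experience grows and crosses zero at `\(-\hat{\beta}_2/(2\hat{\beta}_3) = 34\)` years, the turning point of the parabola.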