class: center, middle, inverse, title-slide

.title[
# STA 235H - Multiple Regression: Polynomials
]
.subtitle[
## Fall 2023
]
.author[
### McCombs School of Business, UT Austin
]

---

<!-- <script type="text/javascript"> -->
<!-- MathJax.Hub.Config({ -->
<!--   "HTML-CSS": { -->
<!--     preferredFont: null, -->
<!--     webFont: "Neo-Euler" -->
<!--   } -->
<!-- }); -->
<!-- </script> -->

<style type="text/css">
.small .remark-code { /*Change made here*/
  font-size: 80% !important;
}
.tiny .remark-code { /*Change made here*/
  font-size: 90% !important;
}
</style>

# Some Announcements

.pull-left[
- **.darkorange[Homework answer key]** will be posted on Tuesday/Wednesday.

  - Make sure you check it out!

  - Exercises: Multiple regression (e.g. the Bechdel Test example), differences in associations between groups (e.g. luxury vs. non-luxury car depreciation).

- Check **.darkorange[personalized feedback]** for JITT 3, if included.

- Additional videos on the material (and some R code) in Resources > Videos
]

--

.pull-right[
.center[
![:scale 150%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/1_OLS_PotentialIssues/images/diff_pp_perc.png)
]
]

---

# Today

.pull-left[
- **.darkorange[Roadmap]** of where we've been and where we're going.

- **.darkorange[Nonlinear models:]**

  - Polynomial terms

- **.darkorange[Introduction to Causal Inference]**

  - Potential Outcomes Framework
]

.pull-right[
.center[
![](https://media.giphy.com/media/xTiN0CNHgoRf1Ha7CM/giphy.gif)]
]

---

# Roadmap so far

- Started the class with a review of **.darkorange[simple linear regression]**:

  - Association between a variable `\(X\)` and an outcome `\(Y\)`

  - e.g. `\(Revenue = \beta_0 + \beta_1 Bechdel + \varepsilon\)`

--

<br>
<br>

- Followed by **.darkorange[multiple regression]**:

  - *Partial* association between `\(X\)` and `\(Y\)`, holding other variables constant.

  - e.g.
`\(Revenue = \beta_0 + \beta_1 Bechdel + \beta_2 Budget + \beta_3 IMDB + \varepsilon\)`

--

<br>
<br>

- What if we want to compare **.darkorange[differences in associations between groups]**?

  - *Compare* the association between `\(X\)` and `\(Y\)` for groups `\(D=1\)` and `\(D=0\)`.

  - e.g. `\(Price = \beta_0 + \beta_1 Year + \beta_2 Luxury + \beta_3 Year\times Luxury + \varepsilon\)`

---

# Roadmap so far

- What if our outcome `\(Y\)` is *weird* (e.g. not normally distributed)?

  - *If `\(Y\)` is skewed to the right (log-normal)*: Transform it to `\(\log(Y)\)` to better satisfy the linearity assumption!

  - e.g. `\(\log(Price) = \beta_0 + \beta_1 Year + \beta_2 Luxury + \beta_3 Mileage + \varepsilon\)`

  - Interpret coefficients as a **percent change** (%)

--

<br>
<br>

- What if our outcome `\(Y\)` is *weird* (e.g. binary)?

  - e.g. `\(Employed = \beta_0 + \beta_1 Age + \beta_2 Afam + \beta_3 NKids + \varepsilon\)`

  - Interpret coefficients as a **change in probability** (e.g. percentage points)

--

<br>
<br>

- What if there **.darkorange[isn't a linear relationship]** between `\(X\)` and `\(Y\)`?

  - Include **polynomial terms** for `\(X\)`

--

<br>
<br>

- What if I want to know **.darkorange[the effect of X on Y]**?

  - Causal Inference!

---

# Adding polynomial terms

- Another way to capture **.darkorange[nonlinear associations]** between the outcome (Y) and covariates (X) is to include **.darkorange[polynomial terms]**:

  - e.g. `\(Y = \beta_0 + \beta_1X + \beta_2X^2 + \varepsilon\)`

--

- Let's look at an example!
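---

# Adding polynomial terms in R

The slides fit these models to the CPS 1985 data. As a self-contained sketch (the data below are *simulated*, not the real CPS sample; the variable names and coefficients are borrowed from the Mincer equation that follows), a quadratic term can be added inside an `lm()` formula with `I()`:

```r
# Hypothetical data generated from the slides' Mincer equation
set.seed(235)
n <- 500
educ  <- sample(8:18, n, replace = TRUE)   # years of education
exper <- sample(0:40, n, replace = TRUE)   # years of experience

log_wage <- 0.52 + 0.09 * educ + 0.034 * exper - 0.0005 * exper^2 +
  rnorm(n, sd = 0.3)

# I(exper^2) squares the regressor inside the formula
mincer <- lm(log_wage ~ educ + exper + I(exper^2))
coef(mincer)
```

An equivalent specification is `lm(log_wage ~ educ + poly(exper, 2, raw = TRUE))`; the fitted curve is the same either way.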
---

# Determinants of wages: CPS 1985

<img src="f2023_sta235h_4_reg_files/figure-html/wages_inspect-1.svg" style="display: block; margin: auto;" />

---

# Determinants of wages: CPS 1985

<img src="f2023_sta235h_4_reg_files/figure-html/wages_inspect2-1.svg" style="display: block; margin: auto;" />

---

# Experience vs wages: CPS 1985

<br>

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages-1.svg" style="display: block; margin: auto;" />

---

# Experience vs wages: CPS 1985

`$$\log(Wage) = \beta_0 + \beta_1 Educ + \beta_2 Exp + \varepsilon$$`

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages2-1.svg" style="display: block; margin: auto;" />

---

# Experience vs wages: CPS 1985

`$$\log(Wage) = \beta_0 + \beta_1 Educ + \beta_2 Exp + \beta_3 Exp^2 + \varepsilon$$`

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages3-1.svg" style="display: block; margin: auto;" />

---

# Mincer equation

`$$\log(Wage) = \beta_0 + \beta_1 Educ + \beta_2 Exp + \beta_3 Exp^2 + \varepsilon$$`

--

- Interpret the coefficient for **.darkorange[education]**

`$$\log(Wage) = 0.52 + 0.09\cdot Educ + 0.034\cdot Exp - 0.0005 \cdot Exp^2$$`

--

- What is the association between **.darkorange[experience and wages]**?

---

# Interpreting coefficients in quadratic equation

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages4-1.svg" style="display: block; margin: auto;" />

---

# Interpreting coefficients in quadratic equation

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages5-1.svg" style="display: block; margin: auto;" />

---

# Interpreting coefficients in quadratic equation

<img src="f2023_sta235h_4_reg_files/figure-html/exp_wages6-1.svg" style="display: block; margin: auto;" />

---

# Interpreting coefficients in quadratic equation

`$$\log(Wage) = \beta_0 + \beta_1 Educ + \beta_2 Exp + \beta_3 Exp^2 + \varepsilon$$`

What is the association between experience and wages?

--

- Pick a value for `\(Exp_0\)` (e.g.
mean, median, or another value of interest)

--

.center[
*Increasing work experience from `\(Exp_0\)` to `\(Exp_0 + 1\)` years is associated, on average, with a `\((\hat{\beta}_2 + 2\hat{\beta}_3\times Exp_0)100\)`% increase in hourly wages, holding education constant*]

--

Let's put some numbers into it:

`$$\log(Wage) = 0.52 + 0.09\cdot Educ + 0.034\cdot Exp - 0.0005 \cdot Exp^2$$`

--

.center[
*Increasing work experience from 20 to 21 years is associated, on average, with a `\((0.034 - 2\times 0.0005 \times 20)\times 100 = 1.4\)`% increase in hourly wages, holding education constant*]

--

.small[*Note that in this case we interpret the association between experience and wages as a percent change, because Wage enters the model in logs!*]

---

<br>
<br>
<br>
<br>
<br>
<br>
<br>

.box-7Trans[Let's go to R]

---

# References

- Ismay, C. & A. Kim. (2021). “Statistical Inference via Data Science”. Chapters 6 & 10.

<!-- pagedown::chrome_print('C:/Users/mc72574/Dropbox/Hugo/Sites/sta235/exampleSite/content/Classes/Week2/1_OLS/f2021_sta235h_3_reg.html') -->
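---

# Appendix: quadratic marginal effects in R

A quick sketch of the calculation from the interpretation slide, using the fitted coefficients of the Mincer equation (only `\(\hat{\beta}_2\)` and `\(\hat{\beta}_3\)` are needed):

```r
b2 <- 0.034    # coefficient on Exp
b3 <- -0.0005  # coefficient on Exp^2

# Approximate % change in hourly wages for one additional year
# of experience, evaluated at Exp0 (derivative: b2 + 2*b3*Exp0)
marginal_effect <- function(exp0) (b2 + 2 * b3 * exp0) * 100

marginal_effect(20)  # about 1.4% at 20 years of experience
```

The effect shrinks as experience grows and crosses zero at `\(-\hat{\beta}_2/(2\hat{\beta}_3) = 34\)` years, the turning point of the parabola.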