STA 235H - Multiple Regression: Interactions & Nonlinearity
Fall 2023
McCombs School of Business, UT Austin
1 / 47

Before we start...

Use the knowledge check portion of the JITT to assess your own understanding:
- Be sure to answer the question correctly (look at the feedback provided)
- The feedback is a guideline; try to use your own words.
If you are struggling with material covered in STA 301H: Check the course website for resources and come to Office Hours.
Office Hours Prof. Bennett: Wed 10.30-11.30am and Thu 4.00-5.30pm

No in-person class next week -- Recorded class

2 / 47

Today

Quick multiple regression review:
- Interpreting coefficients
- Interaction models
Looking at your data:
- Distributions
Nonlinear models:
- Logarithmic outcomes
- Polynomial terms

3 / 47

Remember last week's example? The Bechdel Test

Three criteria:
1. At least two named women
2. Who talk to each other
3. About something besides a man

4 / 47

Is it convenient for my movie to pass the Bechdel test?

lm(Adj_Revenue ~ bechdel_test + Adj_Budget + Metascore + imdbRating, data=bechdel)

##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -127.0710    17.0563 -7.4501   0.0000
## bechdel_test   11.0009     4.3786  2.5124   0.0121
## Adj_Budget      1.1192     0.0367 30.4866   0.0000
## Metascore       7.0254     1.9058  3.6864   0.0002
## imdbRating     15.4631     3.3914  4.5595   0.0000

What does each column represent?

5 / 47

Is it convenient for my movie to pass the Bechdel test?

lm(Adj_Revenue ~ bechdel_test + Adj_Budget + Metascore + imdbRating, data=bechdel)

##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -127.0710    17.0563 -7.4501   0.0000
## bechdel_test   11.0009     4.3786  2.5124   0.0121
## Adj_Budget      1.1192     0.0367 30.4866   0.0000
## Metascore       7.0254     1.9058  3.6864   0.0002
## imdbRating     15.4631     3.3914  4.5595   0.0000

"Estimate": Point estimates of our parameters $β$. We call them $\hat{β}$.
"Standard Error" (SE): You can think about it as the variability of $\hat{β}$. The smaller it is, the more precise $\hat{β}$ is!
"t-value": A value of the Student's t distribution that measures how many SEs away from 0 $\hat{β}$ is. You can calculate it as $t = \frac{\hat{β}}{SE}$. It relates to our null hypothesis $H_{0}: β = 0$.
"p-value": Probability of observing a t-value at least as extreme as ours if the null hypothesis were true. A small p-value (below the significance level $α$, usually 0.05) leads us to reject $H_{0}$ and call the estimate statistically significant. $α$ is the probability of a Type I error (rejecting a true null) that we are willing to accept.

6 / 47

Reminder: Null-Hypothesis

We are testing $H_{0} : β = 0$ vs $H_{1} : β \neq 0$

"Reject the null hypothesis"

"Not reject the null hypothesis"

Note: Figures adapted from @AllisonHorst's art

7 / 47

Reminder: Null-Hypothesis

Reject the null if the t-value falls outside the dashed lines.
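
The dashed lines correspond to critical values of the t distribution. As a rough sketch, the code below recomputes the t-value for the `bechdel_test` row of the earlier regression output and compares it with the two-sided 5% critical value. The residual degrees of freedom are not shown in the output, so `df_assumed` is a placeholder assumption; with a large sample the result barely depends on it.

```r
# Estimate and SE copied from the bechdel_test row of the regression output
estimate <- 11.0009
se <- 4.3786

# t-value: how many standard errors the estimate is away from 0
t_val <- estimate / se

# ASSUMPTION: the residual degrees of freedom are not reported on the slide
df_assumed <- 1000

# Two-sided p-value under H0: beta = 0
p_val <- 2 * pt(-abs(t_val), df = df_assumed)

# Critical value for a two-sided test at the 5% level (the "dashed lines")
t_crit <- qt(0.975, df = df_assumed)
reject_h0 <- abs(t_val) > t_crit

round(c(t_value = t_val, p_value = p_val, t_critical = t_crit), 4)
reject_h0
```

Under this assumption the t-value falls outside the critical values, which agrees with the small p-value reported for `bechdel_test`.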

Note: Figures adapted from @AllisonHorst's art

8 / 47

One extra dollar in our budget

Imagine now that you have a hypothesis that Bechdel movies also get more bang for their buck, i.e., they get more revenue for each additional dollar in their budget.

How would you test that in an equation?

Interactions!

9 / 47

One extra dollar in our budget

Interaction model:

$Revenue = β_{0} + β_{1} Bechdel + β_{3} Budget + β_{6} (Budget \times Bechdel) + β_{4} IMDB + β_{5} MetaScore + ε$

How should we think about this?

Write the equation for a movie that does not pass the Bechdel test. What does it look like?
Now do the same for a movie that passes the Bechdel test. What does it look like?

10 / 47

One extra dollar in our budget

Now, let's interpret some coefficients:

If $Bechdel = 0$, then:

$Revenue = β_{0} + β_{3} Budget + β_{4} IMDB + β_{5} MetaScore + ε$

If $Bechdel = 1$, then:

$Revenue = (β_{0} + β_{1}) + (β_{3} + β_{6}) Budget + β_{4} IMDB + β_{5} MetaScore + ε$

What is the difference in the association between budget and revenue for movies that pass the Bechdel test vs. those that don't?
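
One way to approach this is to compare the coefficient on $Budget$ in the two equations above (a short worked step using only the symbols already defined; "slope" is just shorthand for that coefficient):

$\text{slope}_{Bechdel = 0} = β_{3}$

$\text{slope}_{Bechdel = 1} = β_{3} + β_{6}$

$\text{slope}_{Bechdel = 1} - \text{slope}_{Bechdel = 0} = β_{6}$

So $β_{6}$, the coefficient on the interaction term, is the difference in the budget-revenue association between the two groups, and testing $H_{0}: β_{6} = 0$ asks whether that difference is distinguishable from zero.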

11 / 47

Let's put some data into it

lm(Adj_Revenue ~ bechdel_test*Adj_Budget + Metascore + imdbRating, data=bechdel)

##                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)             -124.1997    17.4932 -7.0999   0.0000
## bechdel_test               7.5138     6.4257  1.1693   0.2425
## Adj_Budget                 1.0926     0.0513 21.2865   0.0000
## Metascore                  7.1424     1.9126  3.7344   0.0002
## imdbRating                15.2268     3.4069  4.4694   0.0000
## bechdel_test:Adj_Budget    0.0546     0.0737  0.7416   0.4585

What is the association between budget and revenue for movies that pass the Bechdel test?
What is the difference in the association between budget and revenue for movies that pass vs movies that don't pass the Bechdel test?
Is that difference statistically significant (at conventional levels)?
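
The three questions can be worked through with simple arithmetic on the output. The sketch below only copies numbers from the table above; the variable names are illustrative, and 0.05 is assumed as the conventional significance level.

```r
# Coefficients copied from the interaction model output above
b_budget      <- 1.0926  # Adj_Budget: slope for movies that do not pass
b_interaction <- 0.0546  # bechdel_test:Adj_Budget: difference in slopes
p_interaction <- 0.4585  # p-value of the interaction term

# Association between budget and revenue in each group
slope_fail <- b_budget
slope_pass <- b_budget + b_interaction

# ASSUMPTION: 0.05 as the conventional significance level
significant_5pct <- p_interaction < 0.05

c(slope_fail = slope_fail,
  slope_pass = slope_pass,
  difference = slope_pass - slope_fail)
significant_5pct
```

The estimated slope is slightly larger for movies that pass the test, but with a p-value of 0.4585 the difference is not statistically significant at conventional levels.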

12 / 47

Let's look at another example

13 / 47

Cars, cars, cars

Used cars in Southern California (from this week's JITT)

cars <- read.csv("https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week2/1_OLS/data/SoCalCars.csv", stringsAsFactors = FALSE)
names(cars)

##  [1] "type"      "certified" "body"      "make"      "model"     "trim"     
##  [7] "mileage"   "price"     "year"      "dealer"    "city"      "rating"   
## [13] "reviews"   "badge"

Data source: "Modern Business Analytics" (Taddy, Hendrix, & Harding, 2018)

14 / 47

Luxury vs. non-luxury cars?

Do you think there's a difference in how price changes over time for luxury vs. non-luxury cars?

How would you test this?

15 / 47

Let's go to R

16 / 47

Models with interactions

You include the interaction between two (or more) covariates:

$\widehat{Price} = \hat{β}_{0} + \hat{β}_{1} Rating + \hat{β}_{2} Miles + \hat{β}_{3} Luxury + \hat{β}_{4} Year + \hat{β}_{5} Luxury \times Year$

$\hat{β}_{3}$ and $\hat{β}_{4}$ are considered the main effects (no interaction)

The coefficient you are interested in is $\hat{β}_{5}$:
- Difference between luxury and non-luxury cars in the average price change associated with one additional year, holding other variables constant.
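
The interpretation above can be sketched in R. This is a minimal illustration on simulated data, not the SoCalCars file: the data frame `sim_cars`, the 0/1 indicator `luxury`, and all the numbers used to generate prices are assumptions, and `Rating` and `Miles` are left out to keep the example short.

```r
set.seed(235)
n <- 500

# SIMULATED data (not the SoCalCars file); `luxury` is a hypothetical 0/1 indicator
sim_cars <- data.frame(
  luxury = rbinom(n, size = 1, prob = 0.3),
  year   = sample(2010:2021, size = n, replace = TRUE)
)

# Prices built so that luxury cars gain 500 more per year than non-luxury cars
sim_cars$price <- 5000 +
  1000 * (sim_cars$year - 2010) +
  8000 * sim_cars$luxury +
  500 * sim_cars$luxury * (sim_cars$year - 2010) +
  rnorm(n, sd = 2000)

# luxury * year expands to luxury + year + luxury:year
fit <- lm(price ~ luxury * year, data = sim_cars)
b <- coef(fit)

slope_nonluxury <- unname(b["year"])                     # year slope when luxury = 0
slope_luxury    <- unname(b["year"] + b["luxury:year"])  # year slope when luxury = 1
slope_diff      <- unname(b["luxury:year"])              # interaction coefficient

round(c(non_luxury = slope_nonluxury,
        luxury     = slope_luxury,
        difference = slope_diff), 1)
```

The interaction coefficient is exactly the gap between the two group-specific slopes, which is why it is the one to look at when asking whether price changes over time differently for the two groups.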

17 / 47

Now it's your turn

Looking at this equation:

$\widehat{Price} = \hat{β}_{0} + \hat{β}_{1} Rating + \hat{β}_{2} Miles + \hat{β}_{3} Luxury + \hat{β}_{4} Year + \hat{β}_{5} Luxury \times Year$

1) What is the association between price and year for non-luxury cars?
2) What is the association between price and year for luxury cars?

18 / 47

Looking at our data

We have dived headfirst into running models. Is that a good idea?

19 / 47

What should we do before we run any model?

20 / 47

Inspect your data!

21 / 47

Some ideas:

Use vtable:

library(vtable)
vtable(cars)

Use summary to see the min, max, mean, and quartiles:

library(dplyr)  # provides %>% and select()
cars %>% select(price, mileage, year) %>% summary()

##      price            mileage            year     
##  Min.   :   1790   Min.   :     0   Min.   :1966  
##  1st Qu.:  16234   1st Qu.:     5   1st Qu.:2017  
##  Median :  23981   Median :    56   Median :2019  
##  Mean   :  32959   Mean   : 21873   Mean   :2018  
##  3rd Qu.:  36745   3rd Qu.: 36445   3rd Qu.:2020  
##  Max.   :1499000   Max.   :292952   Max.   :2021

Plot your data!
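
In the summary above, the mean price (32959) sits well above the median (23981), which hints at a right-skewed variable. The sketch below shows the same pattern on simulated log-normal prices (an assumption standing in for the real `price` column) and how taking logs changes it; the histograms are drawn only in an interactive session.

```r
set.seed(235)

# SIMULATED right-skewed prices (log-normal); not the SoCalCars data
price <- rlnorm(1000, meanlog = 10, sdlog = 0.7)

# In a right-skewed variable the mean sits above the median (ratio > 1)
raw_gap <- mean(price) / median(price)

# After taking logs, mean and median are close (ratio near 1)
log_gap <- mean(log(price)) / median(log(price))

round(c(raw = raw_gap, logged = log_gap), 3)

# Draw the two histograms side by side when running interactively
if (interactive()) {
  par(mfrow = c(1, 2))
  hist(price, breaks = 50, main = "Price", xlab = "price")
  hist(log(price), breaks = 50, main = "log(Price)", xlab = "log(price)")
}
```

With the real data, the same two `hist()` calls on `cars$price` and `log(cars$price)` would be a natural first look.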

22 / 47

Look at the data

What can you say about this variable?

24 / 47

Logarithms to the rescue?

25 / 47

How would we interpret coefficients now?

Let's interpret the coefficient for $Miles$ in the following equation:

$\log(Price) = β_{0} + β_{1} Rating + β_{2} Miles + β_{3} Luxury + β_{4} Year + ε$

Remember: $β_{2}$ represents the average change in the outcome variable, $\log(Price)$, for a one-unit increase in the independent variable $Miles$, holding the other variables constant.
- Think about the units of the dependent and independent variables!
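
To make the units concrete, here is a small sketch on simulated data (the data frame `sim`, the true coefficient, and the noise level are all assumptions, and the other covariates are omitted). Because the outcome is in logs, a one-unit increase in miles multiplies the predicted price by $\exp(β_{2})$, so the coefficient reads as an approximate proportional change rather than a change in dollars.

```r
set.seed(235)
n <- 400

# SIMULATED data: mileage in miles, price falling about 0.0005% per mile
miles <- runif(n, min = 0, max = 150000)
log_price <- 10.5 - 5e-6 * miles + rnorm(n, sd = 0.2)
sim <- data.frame(price = exp(log_price), miles = miles)

fit <- lm(log(price) ~ miles, data = sim)
b_miles <- unname(coef(fit)["miles"])

# Percent change in price for ONE additional mile (tiny, because a mile is a small unit)
pct_per_mile <- 100 * (exp(b_miles) - 1)

# Percent change in price for 10,000 additional miles (a more readable unit)
pct_per_10k <- 100 * (exp(10000 * b_miles) - 1)

c(coef = b_miles, pct_per_mile = pct_per_mile, pct_per_10k = pct_per_10k)
```

For small coefficients, `100 * b_miles` is nearly identical to the exact percent change; rescaling the predictor (for example to thousands of miles) changes the size of the coefficient but not the fit.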

26 / 47

A side note on log-transformed variables...

$\log(Y) = \hat{β}_{0} + \hat{β}_{1} X$

We want to compare the predicted outcome at $X = x$ and at $X = x + 1$:

$\log(y_{0}) = \hat{β}_{0} + \hat{β}_{1} x \quad (1)$

and

$\log(y_{1}) = \hat{β}_{0} + \hat{β}_{1} (x + 1) \quad (2)$

Let's subtract (1) from (2)!
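
A quick numerical check of that subtraction, using made-up values (`b0`, `b1`, and `x` are hypothetical and chosen only for illustration): the intercept and the $x$ term cancel, leaving the slope as the difference in logs, which is the same as saying the outcome is multiplied by the exponential of the slope.

```r
# HYPOTHETICAL fitted coefficients and starting value, for illustration only
b0 <- 2
b1 <- 0.05
x  <- 10

log_y0 <- b0 + b1 * x        # equation (1)
log_y1 <- b0 + b1 * (x + 1)  # equation (2)

# (2) - (1): everything cancels except the slope
diff_logs <- log_y1 - log_y0

# Equivalent statement on the original scale: y1 / y0 = exp(b1)
ratio <- exp(log_y1) / exp(log_y0)
pct_change <- 100 * (ratio - 1)

c(diff_logs = diff_logs, ratio = ratio, pct_change = pct_change)
```

Here a slope of 0.05 corresponds to roughly a 5% increase in the outcome per one-unit increase in `x`.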

27 / 47

A side note on log-transformed variables...