STA 235H - RCTs and Observational Studies

class: center, middle, inverse, title-slide

.title[
# STA 235H - RCTs and Observational Studies
]
.subtitle[
## Fall 2023
]
.author[
### McCombs School of Business, UT Austin
]

---

.small .remark-code { /*Change made here*/
  font-size: 80% !important;
}

.tiny .remark-code { /*Change made here*/
 font-size: 90% !important;
}
</style>

# Housekeeping

- **.darkorange[Let's talk about ChatGPT]**.

- Should be use as a *complement* of learning, not a *substitute*.
 
 - ChatGPT is mainly useful when you are able to check the *accuracy* of its answers.
 
 - You need to do your own work.
 
--

- **.darkorange[No Office Hours this Thursday]**.

- I will hold OH (for this week) on Tues (4pm - 5:30pm) and Wed (10:30am - 11:30am)

---
# Last week

.pull-left[

.center[
![:scale 100%](https://media.giphy.com/media/jUzczP42fMgE8NxSZ0/giphy.gif)]
]

.pull-right[
- We talked about the **.darkorange[Ignorability Assumption]**

- Started discussing **.darkorange[randomized controlled trials]**.

- Why they are the gold standard.
  
  - How to analyze them.
  
]

---
# Today

.pull-left[

- Discuss about **.darkorange[limitations of RCTs]**:

- Generalizability
  - Spillover/General equilibrium effects.

- What is **.darkorange[selection on observables?]**:

- Omitted Variable Bias
  
  - Regression Adjustment
  
  - Matching
]

.pull-right[
![](https://media.giphy.com/media/a7NBvg3Ss8UYo/giphy.gif)
]

---
background-position: 50% 50%
class: left, bottom, inverse
.big[
Limitations of RCTs
]
---
# Recap

- RCTs make the **.darkorange[ignorability assumption hold by design]**

.box-4trans[How?]

.center[
![:scale 70%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/random.png)]

---
# Examples of RCTs

.pull-left[
.center[
![:scale 75%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/linkedin_exp.png)
]
]
--
.pull-right[
.center[
![:scale 75%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/nyt_exp.jpg)
]
]

---
# Steps to analyze a RCT?

.pull-left-div3[
.box-4Trans[1) Check for balance]

.box-4trans[(Remember to transform categorical variables into binary ones!)]
]

.pull-center-div3[
.box-6Trans[2) Estimate diff. in means]

.box-6trans[(Simple regression between `Y` and `Z`)]
]

.pull-right-div3[
.box-7Trans[2)* Estimate diff. in means with covariates]

.box-7trans[(Multiple regression between `Y` and `Z`, adding other baseline covariates `X`)]
]

---
# Potential issues to have in mind

.box-3trans[Generalizability of our estimated effects (External Validity)]

- Where did we get our sample for our study from? Is it representative of a larger population?

.box-5trans[Spillover effects]

- Can an individual in the control group be affected by the treatment?

.box-7trans[General equilibrium effects]

- What happens if we scale up an intervention? Will the effect be the same?

---
# External vs Internal Validity

.center[
![:scale 75%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/validity.png)]

- Many times, RCTs use **.darkorange[convenience samples]**

---
# SUTVA: No interference

- Aside from **.darkorange[ignorability]**, RCTs rely on the **.darkorange[Stable Unit Treatment Value Assumption (SUTVA)]**

.box-4Trans["The treatment applied to one unit does not affect the outcome for other units"]

- No **.darkorange[spillovers]**

- No **.darkorange[general equilibrium effects]**

---
# Network effects (spillover) example

- RCT where students where **.darkorange[randomized]** into two groups:

- Treatment: Parents receive a text message when student misses school.
  - Control: Parents receive a general text message.
  
--

- Estimate the **.darkorange[effect of the intervention on attendance]**.

- Difference in average attendance between treated students and control students.

- **.darkorange[Potential problem]**: Students usually skip school with a friend.

.box-7trans[Why could this be a problem for causal inference?]

---
# Network effects

.box-6LA[Can we do something about this?]
 
 
--

1. **.darkorange[Randomize at a higher level]** (e.g. neighborhood, school, etc. instead of at the individual level)

2. **.darkorange[Model the network!]**

---
# General Equilibrium Effects

- Usually arise when you **.darkorange[scale up]** a program or intervention.

- Imagine you want to test the effect of providing information about employment and expected income to students to see whether it affect their choice of university and/or major.

--
 
 
.box-7Trans[What could happen if you offer it to everyone?]

---
background-position: 50% 50%
class: middle, center

.box-5Trans[Let's see another example]

---
# Get Out The Vote

.pull-left[
- "Get out the Vote" Large-Scale Mobilization experiment (Arceneaux, Gerber, and Green, 2006)
  
  - "Households containing one or two registered voters where **.darkorange[randomly assigned]** to treatment or control groups"
  
  - Treatment: GOTV phone calls
  
  - Stratified RCT: Two states divided into competitive and noncompetitive (randomized within state-competitiveness)
]
.pull-right[
![:scale 100%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/vote.jpg)
]

---
# Checking for balance

.center[
![:scale 100%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/balance_table_stratum.png)
]

---
 
 
 
 
 
 
.box-4LA[Let's go to R]

---
# Estimating the effect

- One important thing to note in the previous analysis is that **.darkorange[assignment to treatment]** `$\neq$` **.darkorange[contact]**

```r
d_s1 %>% count(treat_real, contact)
```

```
##   treat_real contact     n
## 1          0       0 17186
## 2          1       0  1626
## 3          1       1  1374
```

.box-5[Does this break the ignorability assumption?]

- **.darkorange[Non-compliance]**: When the treatment assignment (e.g. calling the household) is not the same as the treatment (e.g. actually receiving a call/ making contact with the household)

- What was **.darkorange[randomly assigned]** was calling the household.

- Usually, the effect of calling should be lower than the effect of actually receiving the call.

---

.box-4[Can we do something if we can't randomize??]
---
background-position: 50% 50%
class: left, bottom, inverse
.big[
Controlling by Confounders
]
---
# Controlling by Confounders

- We can control by a confounder by **.darkorange[including it in our regression]**:

- After we control for it, we are doing a fair comparison (e.g. *"holding X constant"*)
  
  .box-4trans[Conditional Independence Assumption (CIA)]

--
 
- *"Conditional on X, the ignorability assumption holds."*
 
--

- But is there another way to control for confounders?

.box-6Trans[Matching]

---
# Matching

Start with two groups: A treatment and a control group

.center[
![:scale 50%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/matching1.png)]

---
# Matching

For each unit in the treatment group, let's find a similar unit in the control group

.center[
![:scale 50%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/matching2.png)]

---
# Matching

And we do this for all units

.center[
![:scale 50%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/matching3.png)]

---
# Matching

Note that we might not be able to find similar units for everyone!

.center[
![:scale 50%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/matching4.png)]

---
# Matching

Then we just compare our matched groups

.center[
![:scale 50%](https://raw.githubusercontent.com/maibennett/sta235/main/exampleSite/content/Classes/Week6/1_ObsStudies/images/matching5.png)]

---
# Propensity Score Matching

- It is **.darkorange[difficult (impossible) to match on all the variables we want]** (potential confounders)

- The curse of dimensionality
  
--

- **.darkorange[Propensity score]**: Probability of being in the treatment group given the individuals characteristics.

`$$p = Pr(Z = 1) = \hat{\beta}_0 + \hat{\beta}_1X_1 + \hat{\beta}_2X_2 + ... + \hat{\beta}_kX_k$$`
- E.g. Two units have a 50% chance of being treated, but one was actually treated (Z=1) and the other one was not (Z=0).

- Don't need to calculate this by hand; we will use the `MatchIt` package.

---
 
 
 
 
 
 
.box-7Trans[Let's go to R]
---
# Omitted Variable Bias

- If we are under the presence of **.darkorange[confounders]**, then our estimates will be biased (i.e. will not recover the true causal effect) *unless we are able to control by them*.

- **.darkorange[Omitted Variable Bias]** represents the bias that stems from not being able to observe a confounding variable.

- If a potential confounder **.darkorange[is in our data]**, then it's not a problem!

- We can **.darkorange[control]** for it.
  
--

- Our headache will come from **.darkorange[unobserved confounders]**.

---
# Wrapping things up

.pull-left[
- If the **.darkorange[ignorability assumption doesn't hold]**, I can potentially control by all my confounders.

- Conditional Independence Assumption.
  
- **.darkorange[Unlikely to hold]**

- Do we have other alternatives?

- Let's see next class!
]

.pull-right[
.center[
![](https://media.giphy.com/media/d42bbeRtFSSru/giphy.gif)]
]