Let's talk about ChatGPT.
It should be used as a complement to learning, not a substitute.
ChatGPT is mainly useful when you are able to check the accuracy of its answers.
You need to do your own work.
No Office Hours this Thursday.
We talked about the Ignorability Assumption
Started discussing randomized controlled trials.
Why they are the gold standard.
How to analyze them.
Discuss the limitations of RCTs:
What is selection on observables?
Omitted Variable Bias
Regression Adjustment
Matching
Limitations of RCTs
How?
1) Check for balance
(Remember to transform categorical variables into binary ones!)
2) Estimate diff. in means
(Simple regression between Y and Z)
2)* Estimate diff. in means with covariates
(Multiple regression between Y and Z, adding other baseline covariates X; see the R sketch below)
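A minimal R sketch of these steps, assuming a data frame d with a treatment indicator Z, an outcome Y, and one baseline covariate X1 (all names are hypothetical):

# 1) Balance check: regress each baseline covariate on treatment;
#    a coefficient on Z near zero suggests the groups are comparable
lm(X1 ~ Z, data = d)

# 2) Difference in means: simple regression of Y on Z
lm(Y ~ Z, data = d)

# 2)* Difference in means with covariates: add the baseline covariates
lm(Y ~ Z + X1, data = d)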
Generalizability of our estimated effects (External Validity)
Spillover effects
General equilibrium effects
"The treatment applied to one unit does not affect the outcome for other units"
"The treatment applied to one unit does not affect the outcome for other units"
No spillovers
No general equilibrium effects
RCT where students were randomized into two groups:
Estimate the effect of the intervention on attendance.
Potential problem: Students usually skip school with a friend.
Why could this be a problem for causal inference?
Can we do something about this?
Randomize at a higher level (e.g. at the neighborhood or school level instead of at the individual level), as sketched below
Model the network!
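For intuition, a minimal sketch of randomizing at the school level instead of the student level; d and school_id are hypothetical names:

set.seed(123)
schools <- unique(d$school_id)

# Randomly pick half of the schools to receive the treatment...
treated_schools <- sample(schools, size = floor(length(schools) / 2))

# ...and give every student the treatment status of their school
d$Z <- as.integer(d$school_id %in% treated_schools)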
These usually arise when you scale up a program or intervention.
Imagine you want to test the effect of providing information about employment and expected income to students, to see whether it affects their choice of university and/or major.
What could happen if you offer it to everyone?
Let's see another example
"Get out the Vote" Large-Scale Mobilization experiment (Arceneaux, Gerber, and Green, 2006)
"Households containing one or two registered voters where randomly assigned to treatment or control groups"
Treatment: GOTV phone calls
Stratified RCT: Two states divided into competitive and noncompetitive (randomized within state-competitiveness)
Let's go to R
d_s1 %>% count(treat_real, contact)
##   treat_real contact     n
## 1          0       0 17186
## 2          1       0  1626
## 3          1       1  1374
Does this break the ignorability assumption?
Non-compliance: When the treatment assignment (e.g. calling the household) is not the same as the treatment (e.g. actually receiving a call/ making contact with the household)
What was randomly assigned was calling the household.
Usually, the effect of being assigned a call will be smaller than the effect of actually receiving the call.
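A sketch of this distinction with the data above; turnout is a hypothetical name for the outcome column, while treat_real and contact come from the dataset:

# Intent-to-treat (ITT): effect of being assigned to receive a call
lm(turnout ~ treat_real, data = d_s1)

# Comparing by actual contact instead would ignore that contact was
# not randomized (only the assignment, treat_real, was)
lm(turnout ~ contact, data = d_s1)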
Can we do something if we can't randomize??
Controlling for Confounders
We can control for a confounder by including it in our regression (sketched in R below):
Conditional Independence Assumption (CIA)
"Conditional on X, the ignorability assumption holds."
But is there another way to control for confounders?
Matching
Start with two groups: A treatment and a control group
For each unit in the treatment group, let's find a similar unit in the control group
And we do this for all units
Note that we might not be able to find similar units for everyone!
Then we just compare our matched groups
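A toy sketch of this procedure, matching on a single covariate X with replacement (d, Z, X, and Y are hypothetical names; we will automate this with MatchIt shortly):

treated  <- d[d$Z == 1, ]
controls <- d[d$Z == 0, ]

# For each treated unit, find the control unit with the closest X
idx <- sapply(treated$X, function(x) which.min(abs(controls$X - x)))
matched_controls <- controls[idx, ]

# Then compare the matched groups
mean(treated$Y) - mean(matched_controls$Y)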
It is difficult (often impossible) to match on all the variables we want (potential confounders).
Propensity score: the probability of being in the treatment group given an individual's characteristics.
$$p = \Pr(Z = 1 \mid X) = \hat{\beta}_0 + \hat{\beta}_1X_1 + \hat{\beta}_2X_2 + \dots + \hat{\beta}_kX_k$$
E.g. two units each have a 50% chance of being treated, but one was actually treated (Z=1) and the other was not (Z=0): they are comparable, so we can match them.
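Taking the formula above literally (a linear probability model), a sketch with hypothetical names d, Z, X1, X2; in practice the propensity score is more often estimated with a logistic regression:

ps_model <- lm(Z ~ X1 + X2, data = d)  # linear probability model
d$pscore <- predict(ps_model)          # estimated Pr(Z = 1) for each unit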
You don't need to calculate this by hand; we will use the MatchIt package.
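A minimal MatchIt sketch, again with hypothetical names (d, Z, Y, X1, X2); by default matchit() estimates the propensity score with a logistic regression of Z on the covariates:

library(MatchIt)

# 1:1 nearest-neighbor matching on the estimated propensity score
m <- matchit(Z ~ X1 + X2, data = d, method = "nearest")

summary(m)                  # check balance before and after matching
d_matched <- match.data(m)  # keep only the matched units

# Compare outcomes in the matched sample
lm(Y ~ Z, data = d_matched)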
Let's go to R
If confounders are present, then our estimates will be biased (i.e. they will not recover the true causal effect) unless we are able to control for them.
Omitted Variable Bias is the bias that stems from not being able to observe a confounding variable.
If a potential confounder is in our data, then it's not a problem!
Our headache will come from unobserved confounders.
If the ignorability assumption doesn't hold, I can potentially control for all my confounders (the CIA).
But the CIA is unlikely to hold in practice.
Do we have other alternatives?