Let's talk about ChatGPT.
It should be used as a complement to learning, not a substitute.
ChatGPT is mainly useful when you are able to check the accuracy of its answers.
You need to do your own work.
No Office Hours this Thursday.
We talked about the Ignorability Assumption
Started discussing randomized controlled trials.
Why they are the gold standard.
How to analyze them.
Discuss the limitations of RCTs:
What is selection on observables?
Omitted Variable Bias
Regression Adjustment
Matching
Limitations of RCTs
How?
1) Check for balance
(Remember to transform categorical variables into binary ones!)
2) Estimate diff. in means
(Simple regression between Y and Z)
2)* Estimate diff. in means with covariates
(Multiple regression between Y and Z, adding other baseline covariates X; see the R sketch below)
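A minimal R sketch of these steps, assuming a data frame d with a treatment indicator Z, an outcome Y, and one baseline covariate X1 (all names are hypothetical):

# 1) Balance check: regress each baseline covariate on treatment;
#    a coefficient on Z near zero suggests the groups are comparable
lm(X1 ~ Z, data = d)

# 2) Difference in means: simple regression of Y on Z
lm(Y ~ Z, data = d)

# 2)* Difference in means with covariates: add the baseline covariates
lm(Y ~ Z + X1, data = d)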
Generalizability of our estimated effects (External Validity)
Spillover effects
General equilibrium effects
"The treatment applied to one unit does not affect the outcome for other units"
"The treatment applied to one unit does not affect the outcome for other units"
No spillovers
No general equilibrium effects
RCT where students were randomized into two groups:
Estimate the effect of the intervention on attendance.
Potential problem: Students usually skip school with a friend.
Why could this be a problem for causal inference?
Can we do something about this?
Randomize at a higher level (e.g. at the neighborhood or school level instead of at the individual level), as sketched below
Model the network!
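For intuition, a minimal sketch of randomizing at the school level instead of the student level; d and school_id are hypothetical names:

set.seed(123)
schools <- unique(d$school_id)

# Randomly pick half of the schools to receive the treatment...
treated_schools <- sample(schools, size = floor(length(schools) / 2))

# ...and give every student the treatment status of their school
d$Z <- as.integer(d$school_id %in% treated_schools)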
These usually arise when you scale up a program or intervention.
Imagine you want to test the effect of providing information about employment and expected income to students, to see whether it affects their choice of university and/or major.
What could happen if you offer it to everyone?
Let's see another example
"Get out the Vote" Large-Scale Mobilization experiment (Arceneaux, Gerber, and Green, 2006)
"Households containing one or two registered voters where randomly assigned to treatment or control groups"
Treatment: GOTV phone calls
Stratified RCT: Two states divided into competitive and noncompetitive (randomized within state-competitiveness)
Let's go to R
d_s1 %>% count(treat_real, contact)
##   treat_real contact     n
## 1          0       0 17186
## 2          1       0  1626
## 3          1       1  1374
Does this break the ignorability assumption?
Non-compliance: When the treatment assignment (e.g. calling the household) is not the same as the treatment (e.g. actually receiving a call/ making contact with the household)
What was randomly assigned was calling the household.
Usually, the effect of being assigned a call will be smaller than the effect of actually receiving the call.
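A sketch of this distinction with the data above; turnout is a hypothetical name for the outcome column, while treat_real and contact come from the dataset:

# Intent-to-treat (ITT): effect of being assigned to receive a call
lm(turnout ~ treat_real, data = d_s1)

# Comparing by actual contact instead would ignore that contact was
# not randomized (only the assignment, treat_real, was)
lm(turnout ~ contact, data = d_s1)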
Can we do something if we can't randomize??
Controlling for Confounders
We can control for a confounder by including it in our regression (sketched in R below):
Conditional Independence Assumption (CIA)
"Conditional on X, the ignorability assumption holds."
But is there another way to control for confounders?
Matching
Start with two groups: A treatment and a control group
For each unit in the treatment group, let's find a similar unit in the control group
And we do this for all units
Note that we might not be able to find similar units for everyone!
Then we just compare our matched groups
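A toy sketch of this procedure, matching on a single covariate X with replacement (d, Z, X, and Y are hypothetical names; we will automate this with MatchIt shortly):

treated  <- d[d$Z == 1, ]
controls <- d[d$Z == 0, ]

# For each treated unit, find the control unit with the closest X
idx <- sapply(treated$X, function(x) which.min(abs(controls$X - x)))
matched_controls <- controls[idx, ]

# Then compare the matched groups
mean(treated$Y) - mean(matched_controls$Y)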
It is difficult (often impossible) to match on all the variables we want (potential confounders).
Propensity score: the probability of being in the treatment group given an individual's characteristics.
$$p = \Pr(Z = 1 \mid X) = \hat{\beta}_0 + \hat{\beta}_1X_1 + \hat{\beta}_2X_2 + \dots + \hat{\beta}_kX_k$$
E.g. two units each have a 50% chance of being treated, but one was actually treated (Z=1) and the other was not (Z=0): they are comparable, so we can match them.
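Taking the formula above literally (a linear probability model), a sketch with hypothetical names d, Z, X1, X2; in practice the propensity score is more often estimated with a logistic regression:

ps_model <- lm(Z ~ X1 + X2, data = d)  # linear probability model
d$pscore <- predict(ps_model)          # estimated Pr(Z = 1) for each unit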
You don't need to calculate this by hand; we will use the MatchIt package.
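A minimal MatchIt sketch, again with hypothetical names (d, Z, Y, X1, X2); by default matchit() estimates the propensity score with a logistic regression of Z on the covariates:

library(MatchIt)

# 1:1 nearest-neighbor matching on the estimated propensity score
m <- matchit(Z ~ X1 + X2, data = d, method = "nearest")

summary(m)                  # check balance before and after matching
d_matched <- match.data(m)  # keep only the matched units

# Compare outcomes in the matched sample
lm(Y ~ Z, data = d_matched)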
Let's go to R
If confounders are present, then our estimates will be biased (i.e. they will not recover the true causal effect) unless we are able to control for them.
Omitted Variable Bias is the bias that stems from not being able to observe a confounding variable.
If a potential confounder is in our data, then it's not a problem!
Our headache will come from unobserved confounders.
If the ignorability assumption doesn't hold, I can potentially control for all my confounders (the CIA).
But the CIA is unlikely to hold in practice.
Do we have other alternatives?