class: center, middle, inverse, title-slide

.title[
# STA 235H - Potential Outcomes
]
.subtitle[
## Fall 2023
]
.author[
### McCombs School of Business, UT Austin
]

---

<style type="text/css">
.small .remark-code { /*Change made here*/
  font-size: 80% !important;
}

.tiny .remark-code { /*Change made here*/
  font-size: 90% !important;
}
</style>

<br>
<br>

.box-2Trans[How? Potential Outcomes Framework]

.box-4Trans[What? Causal Estimands]

.box-7Trans[Why? Causal Questions and Study Design]

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
The "How": Potential outcomes framework
]

---

.center2[![:scale 100%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/2_PotentialOutcomes/images/sleep_good.png)]

---

.center2[.box-3LA[What do you think are the biggest issues here?]]

---

<br>

.pull-left[
.center[
![:scale 70%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/2_PotentialOutcomes/images/coffee_good.png)]
]

--

.pull-right[
.center[
![:scale 100%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/2_PotentialOutcomes/images/coffee_bad.png)]
]

---

# Before we start...

.center[.box-2LA[Be clear about your language]]

<br>

.center[.box-4LA[Be clear about your data]]

<br>

.center[.box-7LA[Be clear about your assumptions]]

---

# What is Causal Inference?

.box-5LA[Inferring the effect of one thing on another thing]

--

- "My headache went away because I took an aspirin".

--

- "The new marketing campaign increased our sales by 20%"

--

- "Providing students support when filling out FAFSA forms improves college access and completion."
---

# A world of potential (outcomes)

- Under a binary treatment or intervention, there are **.darkorange[two potential worlds]**:

.pull-left[
- **.darkorange[World 1]**: You take the pill

- **.darkorange[World 2]**: You don't take the pill
]

.pull-right[
![](https://media.giphy.com/media/SCt3Miv6ugvSg/source.gif)
]

---

# A world of potential (outcomes)

- A **.darkorange[potential outcome]** is the outcome under each of these scenarios or "worlds".

- *There will be one for each path!*

--

- A priori, each of these scenarios has a *potential outcome*

- A posteriori, I can only observe **.darkorange[at most one of the potential outcomes]**

--

.box-3LA[Fundamental Problem of Causal Inference]

---

.center2[
.box-6LA[What are the potential outcomes for our previous example?]
]

---

# Potential Outcomes Examples

- "My headache went away because I took an aspirin".

--

.box-3trans[*Headache status if I take an aspirin / Headache status if I don't take an aspirin*]

--

- "The new marketing campaign increased our sales by 20%"

--

- "Providing students support when filling out FAFSA forms improves college access and completion."

---

# Let's see a specific example

- You work at a retail company and you are debating whether to send out an **.darkorange[email campaign]** to boost your sales:

--

- You are interested in **.darkorange[two specific outcomes]**:

.pull-left[
.center[**.darkorange[Sales]**: Whether a customer makes a purchase or not.]

.center[
![:scale 70%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/2_PotentialOutcomes/images/sales_po.png)]
]

.pull-right[
.center[**.darkorange[Churn]**: Whether a customer unsubscribes from your mailing list or not.]

.center[
![:scale 70%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/2_PotentialOutcomes/images/churn_po.png)]
]

---

# Potential Outcomes Framework

Let's introduce some notation:

- Let `\(Y_i\)` be the observed outcome for unit `\(i\)` (e.g. whether a person makes a purchase or not).

- Let `\(Z_i\)` be the treatment or intervention (e.g. receiving a promotional email (1) or not (0)).

- Let `\(Y_i(z)\)` be the potential outcome under treatment `\(Z_i = z\)` (e.g. whether the person would make a purchase or not *if* they received treatment `\(z\)`).

--

**.darkorange[If a person is *treated*]**, `\(Z_i = 1\)`, their *observed outcome* `\(Y_i\)` will be the same as their *potential outcome under treatment*, `\(Y_i(1)\)`:

$$Y_i | (Z_i = 1) \overset{\Delta}{=} Y_i(1) $$

--

In the same fashion, **.darkorange[if a person is not *treated*]**, `\(Z_i = 0\)`, their *observed outcome* `\(Y_i\)` will be the same as their *potential outcome under control*, `\(Y_i(0)\)`:

$$Y_i | (Z_i = 0) \overset{\Delta}{=} Y_i(0) $$

---

# Potential Outcomes Framework

This means that we can write the observed outcome as a function of the *potential outcomes*:

.box-5LA[$$\rightarrow Y_i = Z_i\cdot Y_i(1) + (1-Z_i)\cdot Y_i(0)$$]

- This expression is useful because it lets us treat causal inference as a **.darkorange[missing data problem]**.

---

# Causal Effects

.box-4tL[Individual Causal Effect]

`$$ICE_i = Y_i(1) - Y_i(0)$$`

---

# Causal Effects

.box-4tL[Individual Causal Effect]

`$$ICE_i = Y_i(1) - Y_i(0)$$`

<br>

.box-3LA[Can we ever observe individual causal effects?]

---

# Causal Effects

.box-4tL[Individual Causal Effect]

`$$ICE_i = Y_i(1) - Y_i(0)$$`

<br>

.box-3LA[Can we ever observe individual causal effects?]
.box-7LA[No!*]

---

# Only one realization

.pull-left[
.box-2[Z=1]

.center[
![:scale 60%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/2_PotentialOutcomes/images/mail_purchase.png)]
]

.pull-right[
.box-2[Z=0]

.center[
![:scale 60%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/2_PotentialOutcomes/images/nomail_nopurchase.png)]
]

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
The "What": Causal estimands, estimates, and estimators
]

---

# Estimands vs Estimates vs Estimators

.pull-left-little_r[
.box-2[Estimand]

.box-2trans[A quantity we want to estimate]
]

.pull-right-little_l[
.box-7[Estimate]

.box-7trans[The result of an estimation]
]

.center[
.box-4[Estimator]

<span class="box-4trans">A rule for calculating<br>an estimate based on data</span>
]

---

# Estimands vs Estimates vs Estimators

.pull-left-little_r[
.box-2[Estimand]

.box-2trans[A quantity we want to estimate]

.box-2trans[E.g.: Population mean]

.box-2trans[$$\mu$$]
]

.pull-right-little_l[
.box-7[Estimate]

.box-7trans[The result of an estimation]

.box-7trans[E.g.: Result of the sample mean<br>for a given sample *S*]

.box-7trans[$$\hat{\mu}$$]
]

.center[
.box-4[Estimator]

<span class="box-4trans">A rule for calculating<br>an estimate based on data</span>

.box-4trans[E.g.: Sample mean]

.box-4trans[$$\frac{1}{n}\sum_i Y_i$$]
]

---

# Estimands vs Estimates vs Estimators

.center[
![:scale 35%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/2_PotentialOutcomes/images/estimands.jpg)
]

.source[Source: [Deng, 2022](http://onbiostatistics.blogspot.com/2022/04/estimands-estimator-estimate-and.html)]

---

# Estimands vs Estimates vs Estimators

- Some important **.darkorange[estimands]** that we need to keep in mind:

.box-3trans[Average Treatment Effect (ATE)]

.box-5trans[Average Treatment Effect on the Treated (ATT)]

.box-7trans[Conditional Average Treatment Effect (CATE)]

---

# Estimands vs Estimates vs Estimators

- Some important **.darkorange[estimands]** that we need to keep in mind:

.box-3trans[ATE: E.g. Average Treatment Effect for all customers]

.box-5trans[ATT: E.g. Average Treatment Effect for customers that received the email]

.box-7trans[CATE: E.g. Average Treatment Effect for customers under 25 years old]

---

# Estimands vs Estimates vs Estimators

- Some important **.darkorange[estimands]** that we need to keep in mind:

.box-3trans[$$ATE = E[Y(1)- Y(0)]$$]

.box-5trans[$$ATT = E[Y(1)- Y(0)| Z=1]$$]

.box-7trans[$$CATE = E[Y(1)- Y(0)| X]$$]

---

# Getting around the fundamental problem of causal inference

- Let's go back to our original example: **.darkorange[Does an email campaign increase sales?]**

.small[
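A hypothetical illustration, not the course data: if we could see both potential outcomes for six made-up customers, the three estimands could be computed directly. The field names (`age`, `z`, `y1`, `y0`) and every value below are assumptions for this sketch.

```python
# Hypothetical "science table": both potential outcomes are visible for every
# customer, which is never the case with real data.
# z = 1 if emailed; y1 / y0 = purchase (1) or not (0) with / without the email.
customers = [
    {"age": 22, "z": 1, "y1": 1, "y0": 0},
    {"age": 31, "z": 1, "y1": 1, "y0": 1},
    {"age": 45, "z": 1, "y1": 0, "y0": 0},
    {"age": 24, "z": 0, "y1": 1, "y0": 0},
    {"age": 38, "z": 0, "y1": 1, "y0": 1},
    {"age": 52, "z": 0, "y1": 0, "y0": 0},
]


def average_effect(rows):
    """Average of the individual effects Y(1) - Y(0) over the given rows."""
    effects = [r["y1"] - r["y0"] for r in rows]
    return sum(effects) / len(effects)


# ATE: everyone. ATT: only emailed customers. CATE: customers under 25.
ate = average_effect(customers)
att = average_effect([r for r in customers if r["z"] == 1])
cate_under_25 = average_effect([r for r in customers if r["age"] < 25])

print(f"ATE = {ate:.3f}, ATT = {att:.3f}, CATE(age < 25) = {cate_under_25:.3f}")
```

With real data only one of `y1` and `y0` is observed per customer, so none of these averages can be computed this way.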
]

---

# Getting around the fundamental problem of causal inference

- We have a **.darkorange[missing data problem]**

.small[
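A minimal sketch of the missing data problem with six made-up customers (all values are assumptions, not the course data). The observed outcome follows `\(Y_i = Z_i\cdot Y_i(1) + (1-Z_i)\cdot Y_i(0)\)`, and the potential outcome that was not realized is recorded as `None`.

```python
# Hypothetical customers: z = 1 if emailed, y1 / y0 = purchase with / without email.
customers = [
    {"z": 1, "y1": 1, "y0": 0},
    {"z": 1, "y1": 1, "y0": 1},
    {"z": 1, "y1": 0, "y0": 0},
    {"z": 0, "y1": 1, "y0": 0},
    {"z": 0, "y1": 1, "y0": 1},
    {"z": 0, "y1": 0, "y0": 0},
]


def observed_outcome(z, y1, y0):
    """Observed outcome: Y = Z * Y(1) + (1 - Z) * Y(0)."""
    return z * y1 + (1 - z) * y0


# What the analyst actually sees: one potential outcome per customer.
observed = []
for c in customers:
    observed.append({
        "z": c["z"],
        "y": observed_outcome(c["z"], c["y1"], c["y0"]),
        "y1": c["y1"] if c["z"] == 1 else None,  # missing for the control group
        "y0": c["y0"] if c["z"] == 0 else None,  # missing for the treated group
    })

for row in observed:
    print(row)
```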
]

---

# Getting around the fundamental problem of causal inference

- Compare those who **.darkorange[received the email]** to those who **.darkorange[did not receive the email.]**

.small[
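A minimal sketch of this comparison. The six observed `(z, y)` pairs are hypothetical, chosen only so that each group has three customers as in the formula on this slide; they are not the original table.

```python
# Hypothetical observed data: (z, y) with z = 1 if emailed, y = 1 if purchased.
observed = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]


def difference_in_means(data):
    """Mean outcome of the treated minus mean outcome of the controls."""
    treated = [y for z, y in data if z == 1]
    control = [y for z, y in data if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


tau_hat = difference_in_means(observed)
print(f"tau_hat = {tau_hat:.3f}")
```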
]

`$$\hat{\tau} = \frac{1}{3}\sum_{i: Z_i=1}Y_i - \frac{1}{3}\sum_{i: Z_i=0}Y_i = 0.333$$`

---

# Getting around the fundamental problem of causal inference

If we had more data, we could do the same with a **.darkorange[simple regression]**:

`$$Purchase = \beta_0 + \beta_1Email + \varepsilon$$`

--

Imagine you get the following results:

`$$\widehat{Purchase} = 0.4 + 0.33 \cdot Email$$`

- Interpret the coefficient for *Email*:

---

<br>
<br>
<br>
<br>
<br>
<br>
<br>

.box-7Trans[What could be the problem with comparing the sample means?]

---

.center2[
.box-2LA[Let's do a little exercise]
]

---

<br>

.box-5Trans[Look at your **.darkorange[green]** piece of paper and go to the following website]

.center[
[![:scale 20%](https://github.com/maibennett/sta235/raw/main/exampleSite/content/Classes/Week4/2_PotentialOutcomes/images/qr_code_selection.png)](https://utexas.qualtrics.com/jfe/form/SV_cBDSevUtofbRAJo)
]

.center[https://sta235h.click/week4]

.box-5trans[Would you go to a physician/urgent care?]

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
The "Why": Causal questions and study designs
]

---

# Under what assumptions is our estimate causal?

We are using:

`$$\hat{\tau} = \frac{1}{3}\sum_{i: Z_i=1}Y_i - \frac{1}{3}\sum_{i: Z_i=0}Y_i$$`

to estimate:

`$$\tau = E[Y_i(1) - Y_i(0)]$$`

---

# Under what assumptions is our estimate causal?

We are using:

`$$\hat{\tau} = \frac{1}{3}\sum_{i: Z_i=1}Y_i - \frac{1}{3}\sum_{i: Z_i=0}Y_i$$`

to estimate:

`$$\tau = E[Y_i(1) - Y_i(0)]$$`

.box-7LA[Let's do some math]

---

# Under what assumptions is our estimate causal?

`$$\tau = E[Y_i(1) - Y_i(0)]$$`

`$$= E[Y_i(1)] - E[Y_i(0)]$$`

--

**.darkorange[Key assumption]**:

.box-3[Ignorability]

Ignorability means that the potential outcomes `\(Y(0)\)` and `\(Y(1)\)` are independent of the treatment, i.e. `\((Y(0), Y(1)) \perp\!\!\!\perp Z\)`.
`$$E[Y_i(1)| Z = 0] = E[Y_i(1)| Z = 1] = E[Y_i(1)]$$`

.center[and]

`$$E[Y_i(0)| Z = 0] = E[Y_i(0)| Z = 1] = E[Y_i(0)]$$`

---

# Under what assumptions is our estimate causal?

`$$\tau = E[Y_i(1) - Y_i(0)]$$`

`$$= E[Y_i(1)] - E[Y_i(0)]$$`

- Under ignorability (see previous slide), `\(E[Y_i(1)] = E[Y_i(1) | Z = 1] = E[Y_i | Z = 1]\)` and `\(E[Y_i(0)] = E[Y_i(0) | Z = 0] = E[Y_i | Z = 0]\)`, then:

`$$\tau = E[Y_i(1)] - E[Y_i(0)] = \color{#900DA4}{\underbrace{E[Y_i(1)| Z=1]}_\text{Obs. Outcome for T}} - \color{#F89441}{\overbrace{E[Y_i(0)| Z=0]}^\text{Obs. Outcome for C}}$$`

---

# Ignorability Assumption

We can just "ignore" the missing data problem:

.small[
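A simulation sketch of why ignorability matters. Everything here is an assumption for illustration: customers are "loyal" or not, the email raises the purchase probability by 0.1 for everyone, and emails are either assigned by coin flip or targeted mostly at loyal customers.

```python
import random


def simulate(n, targeted, seed=235):
    """Return (true ATE, difference in means) for n simulated customers."""
    rng = random.Random(seed)
    effects, treated, control = [], [], []
    for _ in range(n):
        loyal = rng.random() < 0.5
        p0 = 0.6 if loyal else 0.2       # purchase probability without the email
        u = rng.random()
        y0 = int(u < p0)                 # potential outcome under control
        y1 = int(u < p0 + 0.1)           # potential outcome under treatment
        if targeted:
            p_email = 0.8 if loyal else 0.2  # assignment depends on loyalty
        else:
            p_email = 0.5                    # coin flip: ignorability holds
        z = int(rng.random() < p_email)
        y = z * y1 + (1 - z) * y0        # only one potential outcome is observed
        (treated if z else control).append(y)
        effects.append(y1 - y0)
    ate = sum(effects) / n
    diff = sum(treated) / len(treated) - sum(control) / len(control)
    return ate, diff


ate_random, diff_random = simulate(100_000, targeted=False)
ate_targeted, diff_targeted = simulate(100_000, targeted=True)
print(f"coin flip: ATE = {ate_random:.3f}, difference in means = {diff_random:.3f}")
print(f"targeted:  ATE = {ate_targeted:.3f}, difference in means = {diff_targeted:.3f}")
```

Under the coin flip the difference in means should land close to the true average effect; under targeting it mixes the effect of the email with the effect of loyalty.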
]

---

# Main takeaway points

.box-3LA[Causal Inference is hard]

--

- Think about the **.darkorange[causal problem]**

--

- Check **.darkorange[validity]** of assumptions (*Is ignorability plausible? Am I controlling for the right covariates?*)

--

- Most of this chapter will be spent looking for **.darkorange[exogenous variation]** that makes the ignorability assumption plausible.

---

# Next week

.pull-left[
- **.darkorange[Randomized Controlled Trials]**:
  - Pros and Cons
  - Concept of validity
  - A/B Testing
]

.pull-right[
![:scale 90%](https://media.giphy.com/media/SiGg4zSmwmbafTYwpj/giphy.gif)
]

---

# References

- Angrist, J. & S. Pischke. (2015). "Mastering Metrics". *Chapter 1*.

- Cunningham, S. (2021). ["Causal Inference: The Mixtape". *Chapter 4: Potential Outcomes Causal Model*](https://mixtape.scunning.com/ch3.html).

- Neal, B. (2020). "Introduction to Causal Inference". *Fall 2020 Course*