Causal Inference¶
Part of Statistical Methods — MORIE’s statistical-methods reference.
MORIE implements a layered causal-inference workflow. For the full estimand taxonomy see Causal Estimands; for the broader catalogue of methods (causal, survey, spatial, Hawkes, statistical physics, psychometrics, etc.) see Statistical Methods.
Potential outcomes framework¶
Let \(Y_i(1)\) and \(Y_i(0)\) be the potential outcomes for unit \(i\) under treatment and control respectively. The Average Treatment Effect is
\[ \text{ATE} = \mathbb{E}\left[Y(1) - Y(0)\right]. \]
The ATE is identified from observational data under three standard assumptions:
SUTVA — no interference between units and a single version of treatment.
Positivity — \(0 < P(T=1 \mid X) < 1\) for all covariate values.
Unconfoundedness — \(Y(0), Y(1) \perp T \mid X\).
Inverse Probability Weighting (IPW) — Hájek estimator¶
Given propensity scores \(\hat{e}(X_i) = P(T_i = 1 \mid X_i)\), MORIE uses stabilised (Hájek) weights:
\[ w_i = \frac{T_i}{\hat{e}(X_i)} + \frac{1 - T_i}{1 - \hat{e}(X_i)}. \]
The Hájek ATE estimator self-normalises the weights within each arm:
\[ \hat{\tau}_{\text{ATE}} = \frac{\sum_i T_i Y_i / \hat{e}(X_i)}{\sum_i T_i / \hat{e}(X_i)} - \frac{\sum_i (1 - T_i) Y_i / (1 - \hat{e}(X_i))}{\sum_i (1 - T_i) / (1 - \hat{e}(X_i))}. \]
The effective sample size (ESS) is reported as a weight-quality diagnostic:
\[ \mathrm{ESS} = \frac{\left(\sum_i w_i\right)^2}{\sum_i w_i^2}. \]
Python entry points: morie.causal.run_propensity_ipw_analysis(),
morie.causal.estimate_ate()
Augmented IPW (AIPW) — Doubly Robust¶
The AIPW estimator adds an outcome-model correction to the IPW influence function. It is consistent if either the propensity model or the outcome models are correctly specified.
The per-unit influence score is
\[ \hat{\psi}_i = \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{T_i \left(Y_i - \hat{\mu}_1(X_i)\right)}{\hat{e}(X_i)} - \frac{(1 - T_i)\left(Y_i - \hat{\mu}_0(X_i)\right)}{1 - \hat{e}(X_i)}. \]
The ATE estimate and its standard error are
\[ \hat{\tau}_{\text{AIPW}} = \frac{1}{n} \sum_{i=1}^{n} \hat{\psi}_i, \qquad \widehat{\mathrm{SE}} = \frac{\widehat{\mathrm{sd}}(\hat{\psi}_i)}{\sqrt{n}}. \]
Python entry point: morie.causal.estimate_aipw()
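A compact numpy sketch of the AIPW score on simulated data (not MORIE's code; linear least-squares fits stand in for whatever outcome learners the library uses, and the true propensity score replaces a fitted one):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))               # true propensity score
T = rng.binomial(1, e)
Y = 2.0 * T + X + rng.normal(size=n)   # true ATE = 2

def fit_predict(mask):
    """Fit a linear outcome model on one arm, predict for all units."""
    A = np.column_stack([np.ones(mask.sum()), X[mask]])
    beta, *_ = np.linalg.lstsq(A, Y[mask], rcond=None)
    return beta[0] + beta[1] * X

mu1, mu0 = fit_predict(T == 1), fit_predict(T == 0)

# Per-unit influence score: outcome-model contrast plus an
# IPW correction for each arm's residuals.
psi = (mu1 - mu0
       + T * (Y - mu1) / e
       - (1 - T) * (Y - mu0) / (1 - e))

ate = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)
print(f"ATE = {ate:.2f} +/- {1.96 * se:.2f}")
```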
G-computation (Outcome Regression)¶
G-computation directly standardises the outcome by integrating over the covariate distribution:
\[ \hat{\tau}_{\text{ATE}} = \frac{1}{n} \sum_{i=1}^{n} \left[\hat{\mu}(1, X_i) - \hat{\mu}(0, X_i)\right], \]
where \(\hat{\mu}(t, X)\) is the predicted outcome from a regression model fit on the full sample. Unlike IPW, G-computation is singly robust (requires correct outcome model specification).
Python entry point: morie.effects.estimate_ate_gcomputation()
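The standardisation step can be sketched with a plain linear outcome model (an illustration only; MORIE's choice of regression model is not specified here):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4_000
X = rng.normal(size=n)
T = rng.binomial(1, 1 / (1 + np.exp(-X)))
Y = 1.5 * T + X + rng.normal(size=n)   # true ATE = 1.5

# Fit one outcome model mu(t, x) on the full sample, then
# standardise: predict under T=1 and under T=0 for every unit.
A = np.column_stack([np.ones(n), T, X])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)

mu1 = beta[0] + beta[1] * 1 + beta[2] * X
mu0 = beta[0] + beta[1] * 0 + beta[2] * X
ate = (mu1 - mu0).mean()   # collapses to beta[1] for a linear model
```

With a linear model and no treatment-covariate interaction the standardised contrast is just the treatment coefficient; the averaging over \(X_i\) matters once \(\hat{\mu}\) is nonlinear or interacted.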
eBAC-selection-adjusted IPW¶
The ebac-selection-adjustment-ipw module extends the IPW framework to
account for selection on eBAC (estimated Blood Alcohol Concentration) strata.
Weights are constructed within eBAC-defined subpopulations and then combined.
Python entry point: morie.causal.run_ebac_selection_ipw_analysis()
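The within-stratum-then-combine logic can be sketched as follows (an assumption-laden illustration: the strata, data-generating process, and combination by stratum share are invented here, not taken from the module):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6_000
strata = rng.integers(0, 3, size=n)     # stand-in for eBAC strata
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))                # true propensity score
T = rng.binomial(1, e)
Y = 2.0 * T + X + 0.5 * strata + rng.normal(size=n)  # true ATE = 2

def hajek(m):
    """Hájek IPW contrast restricted to the subpopulation mask m."""
    t, c = m & (T == 1), m & (T == 0)
    return (np.sum(Y[t] / e[t]) / np.sum(1 / e[t])
            - np.sum(Y[c] / (1 - e[c])) / np.sum(1 / (1 - e[c])))

# Estimate within each stratum, then combine by stratum share.
ate = sum((strata == s).mean() * hajek(strata == s) for s in range(3))
```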
Sensitivity Analysis¶
E-value (VanderWeele & Ding 2017)¶
The E-value is the minimum strength of unmeasured confounding on the risk-ratio scale needed to fully explain away the observed effect:
\[ E = \mathrm{RR} + \sqrt{\mathrm{RR}\,(\mathrm{RR} - 1)}. \]
For the 95% CI lower bound, apply the same formula to the CI endpoint. An observed RR of 3.9 yields \(E \approx 7.3\).
Python entry point: morie.effects.e_value()
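The formula is one line of arithmetic; a self-contained sketch (not necessarily matching the signature of `morie.effects.e_value()`) reproduces the worked example from the text:

```python
from math import sqrt

def e_value(rr: float) -> float:
    """Minimum confounding strength (risk-ratio scale) needed to
    explain away an observed risk ratio; RR < 1 is first inverted."""
    if rr < 1:
        rr = 1 / rr
    return rr + sqrt(rr * (rr - 1))

print(round(e_value(3.9), 2))   # -> 7.26, i.e. E = 7.3 as quoted above
```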
Rosenbaum Bounds¶
For \(\Gamma \geq 1\), Rosenbaum’s sensitivity analysis asks whether the p-value remains below \(\alpha\) when treatment assignment odds differ by at most \(\Gamma\) between matched units. Increasing \(\Gamma\) until the bound exceeds \(\alpha\) measures robustness to unmeasured confounding.
Python entry point: morie.effects.sensitivity_rosenbaum()
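One common variant of the procedure, a sign-test bound for matched pairs, illustrates the mechanics (this is not necessarily the test statistic `morie.effects.sensitivity_rosenbaum()` implements):

```python
from math import comb

def rosenbaum_bound(n_pos: int, n_pairs: int, gamma: float) -> float:
    """Upper bound on the one-sided sign-test p-value when treatment
    assignment odds within a pair may differ by up to gamma.
    n_pos: pairs in which the treated unit had the higher outcome."""
    p_plus = gamma / (1 + gamma)   # worst-case per-pair probability
    return sum(comb(n_pairs, k) * p_plus**k * (1 - p_plus)**(n_pairs - k)
               for k in range(n_pos, n_pairs + 1))

# Increase gamma until the bound crosses alpha = 0.05: the last
# gamma below the crossing summarises robustness to hidden bias.
for gamma in (1.0, 1.5, 2.0, 3.0):
    print(gamma, round(rosenbaum_bound(70, 100, gamma), 4))
```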
Average Treatment Effect on the Treated (ATT)¶
The ATT targets the effect of treatment among those who actually received it:
\[ \text{ATT} = \mathbb{E}\left[Y(1) - Y(0) \mid T = 1\right]. \]
Under unconfoundedness, the Hájek IPW estimator for the ATT assigns weight 1 to treated units and weight \(\hat{e}(X)/(1 - \hat{e}(X))\) to controls:
\[ \hat{\tau}_{\text{ATT}} = \frac{\sum_i T_i Y_i}{\sum_i T_i} - \frac{\sum_i (1 - T_i)\,\frac{\hat{e}(X_i)}{1 - \hat{e}(X_i)}\, Y_i}{\sum_i (1 - T_i)\,\frac{\hat{e}(X_i)}{1 - \hat{e}(X_i)}}. \]
Python entry point: morie.causal.estimate_att()
Average Treatment Effect on the Controls (ATC)¶
The ATC targets the effect among controls — what would happen if untreated units had been treated:
\[ \text{ATC} = \mathbb{E}\left[Y(1) - Y(0) \mid T = 0\right]. \]
Treated units are re-weighted by \((1-\hat{e}(X))/\hat{e}(X)\) and controls retain weight 1.
Python entry point: morie.causal.estimate_atc()
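Both estimands differ from the ATE only in the weights, which a short numpy sketch makes explicit (illustrative data; the true propensity score stands in for a fitted model, and with a constant treatment effect ATT, ATC, and ATE all coincide at 2):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))              # true propensity score
T = rng.binomial(1, e)
Y = 2.0 * T + X + rng.normal(size=n)  # constant effect = 2

def weighted_diff(w):
    """Hájek-style treated-vs-control contrast under weights w."""
    t, c = (T == 1), (T == 0)
    return (np.sum(w[t] * Y[t]) / np.sum(w[t])
            - np.sum(w[c] * Y[c]) / np.sum(w[c]))

# ATT: treated keep weight 1; controls weighted by the odds e/(1-e).
w_att = np.where(T == 1, 1.0, e / (1 - e))
# ATC: controls keep weight 1; treated weighted by (1-e)/e.
w_atc = np.where(T == 0, 1.0, (1 - e) / e)

print(round(weighted_diff(w_att), 2), round(weighted_diff(w_atc), 2))
```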
Group Average Treatment Effect (GATE)¶
The GATE partitions units by a categorical variable \(G\) and estimates the ATE within each group:
\[ \tau(g) = \mathbb{E}\left[Y(1) - Y(0) \mid G = g\right]. \]
MORIE estimates GATEs using the AIPW doubly-robust estimator applied within each stratum defined by group_col. This provides effect heterogeneity across pre-defined subpopulations (e.g., age groups, provinces).
Python entry point: morie.causal.estimate_gate()
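A stripped-down sketch of within-stratum AIPW (linear outcome models stand in for MORIE's learners, and the true propensity score for a fitted one; the two-group setup is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6_000
G = rng.integers(0, 2, size=n)          # two pre-defined groups
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))                # true propensity score
T = rng.binomial(1, e)
Y = (1.0 + 2.0 * G) * T + X + rng.normal(size=n)  # GATE: 1 in g=0, 3 in g=1

# AIPW influence scores with outcome models fit per arm *and* group.
psi = np.empty(n)
for g in (0, 1):
    m = G == g
    mu = {}
    for t in (0, 1):
        arm = m & (T == t)
        A = np.column_stack([np.ones(arm.sum()), X[arm]])
        b, *_ = np.linalg.lstsq(A, Y[arm], rcond=None)
        mu[t] = b[0] + b[1] * X
    psi[m] = (mu[1] - mu[0]
              + T * (Y - mu[1]) / e
              - (1 - T) * (Y - mu[0]) / (1 - e))[m]

gate = {g: psi[G == g].mean() for g in (0, 1)}
print({g: round(v, 2) for g, v in gate.items()})
```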
Conditional Average Treatment Effect (CATE)¶
The CATE provides a per-unit treatment effect estimate:
\[ \tau(x) = \mathbb{E}\left[Y(1) - Y(0) \mid X = x\right]. \]
MORIE implements two metalearner strategies:
T-learner: fit separate outcome models \(\hat{\mu}_1(x)\) and \(\hat{\mu}_0(x)\) on treated and control units respectively, then \(\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)\).
S-learner: fit a single outcome model with treatment as a feature, then compute \(\hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)\).
Both use Random Forest nuisance learners by default.
Python entry point: morie.causal.estimate_cate()
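The T-learner logic fits in a few lines; here linear base learners stand in for the Random Forests the text mentions, on a simulated design with a known heterogeneous effect:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))
T = rng.binomial(1, e)
Y = (1.0 + X) * T + X + rng.normal(size=n)   # true CATE tau(x) = 1 + x

def fit(mask):
    """Fit a linear base learner on one arm; return a predictor."""
    A = np.column_stack([np.ones(mask.sum()), X[mask]])
    b, *_ = np.linalg.lstsq(A, Y[mask], rcond=None)
    return lambda x: b[0] + b[1] * x

# T-learner: separate models per arm, then take their difference.
mu1, mu0 = fit(T == 1), fit(T == 0)
tau_hat = mu1(X) - mu0(X)                    # per-unit CATE estimates
```

An S-learner would instead fit one model on `(X, T)` jointly and difference its predictions at `T=1` and `T=0`; with regularised learners that can shrink the effect toward zero, which is why both strategies are offered.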
Local Average Treatment Effect (LATE / IV)¶
When treatment is endogenous and an instrument \(Z\) is available, the LATE identifies the effect among compliers (units whose treatment status changes in response to the instrument):
\[ \hat{\tau}_{\text{LATE}} = \frac{\hat{\mathbb{E}}[Y \mid Z = 1] - \hat{\mathbb{E}}[Y \mid Z = 0]}{\hat{\mathbb{E}}[T \mid Z = 1] - \hat{\mathbb{E}}[T \mid Z = 0]}. \]
This is the Wald estimator for binary instruments. With covariates, MORIE
uses two-stage least squares (2SLS) via linearmodels or statsmodels.
The first-stage F-statistic is reported as a weak-instrument diagnostic. The conventional threshold is \(F > 10\) (Staiger & Stock, 1997).
Python entry point: morie.causal.estimate_late()
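The Wald ratio for a binary instrument can be demonstrated on simulated data with deliberate endogeneity (an invented design: always-takers are driven by an unobserved confounder, so the naive contrast is biased while the Wald estimator is not):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
U = rng.normal(size=n)                     # unobserved confounder
Z = rng.binomial(1, 0.5, size=n)           # randomised instrument
always = U > 1.0                           # always-takers, driven by U
complier = rng.random(n) < 0.5
T = (always | (complier & (Z == 1))).astype(int)
Y = 1.5 * T + U + rng.normal(size=n)       # constant effect -> LATE = 1.5

naive = Y[T == 1].mean() - Y[T == 0].mean()   # confounded by U
# Wald estimator: intention-to-treat effect on Y over that on T.
wald = ((Y[Z == 1].mean() - Y[Z == 0].mean())
        / (T[Z == 1].mean() - T[Z == 0].mean()))
print(f"naive = {naive:.2f}, Wald = {wald:.2f}")
```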
Interactive Regression Model (IRM)¶
The IRM extends the partially linear model by allowing treatment effect heterogeneity in the outcome regression:
\[ Y = g_0(T, X) + U, \qquad T = m_0(X) + V, \qquad \mathbb{E}[U \mid X, T] = 0, \quad \mathbb{E}[V \mid X] = 0. \]
The Neyman-orthogonal score for the ATE under the IRM is:
\[ \psi(W; \theta, \eta) = g(1, X) - g(0, X) + \frac{T\,(Y - g(1, X))}{m(X)} - \frac{(1 - T)\,(Y - g(0, X))}{1 - m(X)} - \theta. \]
MORIE uses doubleml.DoubleMLIRM with Random Forest nuisance learners
and cross-fitting for honest inference.
Python entry point: morie.causal.estimate_irm()
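The cross-fitting step can be sketched without the `doubleml` dependency: nuisances are fit on one fold and the orthogonal score is evaluated on the other (an illustration with deliberately simple linear stand-ins for the Random Forest learners; a clipped linear-probability fit replaces a proper propensity model):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6_000
X = rng.normal(size=n)
T = rng.binomial(1, 1 / (1 + np.exp(-X)))
Y = 2.0 * T + X + rng.normal(size=n)    # true ATE = 2

def fit_lin(x, y):
    """Least-squares fit of y on (1, x); returns a predictor."""
    A = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda z: b[0] + b[1] * z

folds = rng.permutation(n) % 2          # two-fold cross-fitting
psi = np.empty(n)
for k in (0, 1):
    tr, ev = folds != k, folds == k
    g1 = fit_lin(X[tr & (T == 1)], Y[tr & (T == 1)])
    g0 = fit_lin(X[tr & (T == 0)], Y[tr & (T == 0)])
    m = fit_lin(X[tr], T[tr])           # linear-probability stand-in
    e = np.clip(m(X[ev]), 0.05, 0.95)
    # Orthogonal score evaluated out-of-fold.
    psi[ev] = (g1(X[ev]) - g0(X[ev])
               + T[ev] * (Y[ev] - g1(X[ev])) / e
               - (1 - T[ev]) * (Y[ev] - g0(X[ev])) / (1 - e))

theta = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)
```

Cross-fitting decouples the nuisance fits from the score evaluation, which is what lets flexible machine-learning learners be used without overfitting bias leaking into the estimate.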
References¶
Hernán MA, Robins JM (2020). Causal Inference: What If. Chapman & Hall/CRC.
Lunceford JK, Davidian M (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects. Statistics in Medicine, 23(19):2937–2960. https://doi.org/10.1002/sim.1903
Robins JM, Rotnitzky A, Zhao LP (1994). Estimation of regression coefficients when some regressors are not always observed. JASA, 89(427):846–866.
VanderWeele TJ, Ding P (2017). Sensitivity analysis in observational research: introducing the E-value. Annals of Internal Medicine, 167(4):268–274. https://doi.org/10.7326/M16-2607
Rosenbaum PR (2002). Observational Studies (2nd ed.). Springer.
Imbens GW, Angrist JD (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2):467–475.
Imbens GW (2004). Nonparametric estimation of average treatment effects under exogeneity: a review. Review of Economics and Statistics, 86(1):4–29.
Künzel SR, Sekhon JS, Bickel PJ, Yu B (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. PNAS, 116(10):4156–4165.
Chernozhukov V et al. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68.
Staiger D, Stock JH (1997). Instrumental variables regression with weak instruments. Econometrica, 65(3):557–586.