Propensity Scores

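The propensity score \(e(X) = P(T=1 \mid X)\) is the probability of receiving treatment given the observed covariates \(X\). It condenses the observed confounders into a single scalar, so treated and untreated units can be balanced on \(e(X)\) rather than matched on every covariate directly (Rosenbaum & Rubin 1983).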
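For reference, the balancing property behind this claim can be written out explicitly. This restates the standard result from Rosenbaum & Rubin (1983). The potential-outcome symbols \(Y(0)\) and \(Y(1)\) are notation introduced here for illustration and are not used elsewhere on this page.

\[X \perp T \mid e(X)\]

If treatment assignment is also strongly ignorable given \(X\), that is \((Y(0), Y(1)) \perp T \mid X\) with \(0 < e(X) < 1\), then it is ignorable given the score alone:

\[(Y(0), Y(1)) \perp T \mid e(X)\]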

Estimation

MORIE estimates propensity scores via logistic regression (default) or random forest, depending on the module configuration.

Logistic regression:

\[\log \frac{e(X_i)}{1 - e(X_i)} = \beta_0 + \beta^\top X_i\]

This model is implemented in morie.causal.compute_propensity_scores(), which uses sklearn.linear_model.LogisticRegression with max_iter=1000.
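The logistic model above can be sketched with scikit-learn as follows. This is a minimal illustration and not the body of morie.causal.compute_propensity_scores(): the helper name estimate_propensity, the synthetic data, and the coefficients used to generate it are all assumptions made for the example. Only LogisticRegression and max_iter=1000 come from the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def estimate_propensity(X, t, max_iter=1000):
    """Fit a logistic model of treatment on covariates and return e_hat(X).

    Hypothetical helper for illustration; not MORIE's implementation.
    """
    model = LogisticRegression(max_iter=max_iter)
    model.fit(X, t)
    # predict_proba columns follow model.classes_; pick the column for T=1.
    treated_col = list(model.classes_).index(1)
    return model.predict_proba(X)[:, treated_col]


# Synthetic data for illustration only (not CPADS).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_logit = 0.5 * X[:, 0] - 0.25 * X[:, 1]
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

e_hat = estimate_propensity(X, t)  # one estimated score per row of X
```

Note that LogisticRegression applies L2 regularization by default, so the fitted coefficients are shrunk relative to an unpenalized fit. Whether MORIE overrides that default is not stated here.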

Diagnostics

After estimating propensity scores, run the following checks:

  1. Overlap check: histogram of \(\hat{e}(X)\) by treatment group. Estimated scores near 0 or 1 indicate potential positivity violations.

  2. Effective Sample Size (ESS): see Causal Inference.

  3. Covariate balance: absolute standardized mean differences before and after weighting should be \(< 0.1\) for all covariates.
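The overlap and balance checks can be sketched in plain Python as follows. This is an illustration under stated assumptions, not MORIE's code: the function names are hypothetical, the weights are inverse-probability weights for the average treatment effect (the text does not say which weighting scheme is used), and the standardized mean difference divides by the pooled standard deviation of the unweighted groups, which is one common convention. The toy data use a single binary covariate with known scores.

```python
import math
from statistics import variance


def ate_weights(e_hat, t):
    """Inverse-probability weights: 1/e for treated, 1/(1-e) for controls."""
    return [1.0 / e if ti == 1 else 1.0 / (1.0 - e) for e, ti in zip(e_hat, t)]


def weighted_mean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def smd(x, t, w=None):
    """Standardized mean difference of covariate x between treatment groups.

    Assumption: the denominator is the square root of the average of the
    two unweighted sample variances, before and after weighting.
    """
    if w is None:
        w = [1.0] * len(x)
    x1 = [xi for xi, ti in zip(x, t) if ti == 1]
    x0 = [xi for xi, ti in zip(x, t) if ti == 0]
    w1 = [wi for wi, ti in zip(w, t) if ti == 1]
    w0 = [wi for wi, ti in zip(w, t) if ti == 0]
    pooled_sd = math.sqrt((variance(x1) + variance(x0)) / 2.0)
    return (weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd


def score_histogram(e_hat, t, bins=10):
    """Count estimated scores per equal-width bin, separately by group."""
    counts = {0: [0] * bins, 1: [0] * bins}
    for e, ti in zip(e_hat, t):
        counts[ti][min(int(e * bins), bins - 1)] += 1
    return counts


# Toy data: binary covariate x, with the score known exactly in each stratum.
x = [1, 1, 1, 1, 0, 0, 0, 0]
t = [1, 1, 1, 0, 1, 0, 0, 0]
e_hat = [0.75] * 4 + [0.25] * 4

smd_before = smd(x, t)                         # 1.0, far above the 0.1 threshold
smd_after = smd(x, t, ate_weights(e_hat, t))   # approximately 0 after weighting
hist = score_histogram(e_hat, t, bins=4)       # overlap: both groups in bins 1 and 3
```

In this toy case both groups appear in every occupied bin, so overlap holds, and weighting removes the imbalance entirely because the scores are exact. With estimated scores the weighted difference would typically be small but nonzero.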

CPADS covariates

The default covariate set for the propensity-scores module is drawn from CPADS_REQUIRED_VARIABLES:

  • age_group

  • gender

  • province_region

  • mental_health

  • physical_health

  • alcohol_past12m

Treatment: cannabis_any_use

Outcome: heavy_drinking_30d or ebac_tot

References