Propensity Scores
=================

Part of :doc:`index` — MORIE's statistical-methods reference.

The propensity score :math:`e(X) = P(T=1 \mid X)` summarizes confounding
information into a single scalar, allowing balancing without direct covariate
matching (Rosenbaum & Rubin 1983).

Estimation
----------

MORIE estimates propensity scores via logistic regression (default) or random
forest, depending on the module configuration.

**Logistic regression**:

.. math::

   \log \frac{e(X_i)}{1 - e(X_i)} = \beta_0 + \beta^\top X_i

Implemented in :func:`morie.causal.compute_propensity_scores` using
:class:`sklearn.linear_model.LogisticRegression` with ``max_iter=1000``.

Diagnostics
-----------

After propensity estimation:

1. **Overlap check** — histogram of :math:`\hat{e}(X)` by treatment group.
   Extreme values near 0 or 1 indicate potential positivity violations.
2. **Effective Sample Size (ESS)** — see :doc:`causal`.
3. **Covariate balance** — standardized mean differences before and after
   weighting should be :math:`< 0.1` for all covariates.

CPADS covariates
----------------

The default covariate set for the ``propensity-scores`` module is drawn from
``CPADS_REQUIRED_VARIABLES``:

- ``age_group``
- ``gender``
- ``province_region``
- ``mental_health``
- ``physical_health``
- ``alcohol_past12m``

Treatment: ``cannabis_any_use``
Outcome: ``heavy_drinking_30d`` or ``ebac_tot``

References
----------

- Rosenbaum PR, Rubin DB (1983). The central role of the propensity score in
  observational studies for causal effects.
  *Biometrika*, 70(1):41–55.
  https://doi.org/10.1093/biomet/70.1.41
- Austin PC (2011). An introduction to propensity score methods for reducing
  the effects of confounding in observational studies.
  *Multivariate Behavioral Research*, 46(3):399–424.
  https://doi.org/10.1080/00273171.2011.568786