MORIE: Multi-domain Open Research and Inferential Estimation
A dual-language (Python + R) multi-domain scientific computing toolkit for observational inference, with sociolegal, signal-processing, cryptographic, spatial-statistics, statistical-physics, and psychometrics modules. Hosts the MRM (Multilevel Reconciliation Methodology) framework as a primary application for Canadian carceral, police, and oversight data analysis.
Quick start
Pick any one channel — all install the same morie v0.4.12:
# 1. One-line installer (Linux / macOS / WSL) — detects pip + R, installs both
curl -fsSL https://hadesllm.github.io/morie/install.sh | bash
# 2. PyPI (any platform with Python ≥3.10)
pip install morie # 60+ built-in datasets
pip install "morie[interactive]" # + Terminal IDE (TUI)
pip install "morie[carbon]" # + CodeCarbon emissions (Python ≤3.14 only)
# 3. Homebrew (macOS / Linuxbrew)
brew tap hadesllm/morie
brew install morie
# 4. Docker (zero local dependencies)
docker run --rm ghcr.io/hadesllm/morie:0.4.12 morie --help
# 5. R package (CRAN-compatible, served from r-universe)
install.packages("morie", repos = "https://hadesllm.r-universe.dev")
Note
Heads-up for Raspberry Pi OS / Debian Trixie users: the system
/usr/bin/python3 is Python 3.13.5, which segfaults on import for
several scientific wheels (a Debian-packaging bug, not a morie bug).
Work around it with uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.12
uv venv ~/.venvs/morie --python 3.12
uv pip install --python ~/.venvs/morie/bin/python morie
~/.venvs/morie/bin/morie --version
Run your first analysis in seconds:
# Launch the multi-pane Terminal IDE
morie tui
# Self-diagnostics — checks LLM providers, datasets, R, Docker
morie doctor
# List all 60+ built-in datasets
morie list-datasets
# List all 23 analysis modules
morie list-modules
# Run a single module against built-in data
morie run-module power-design --output-dir /tmp/morie-outputs
# Run the full pipeline (with enlighten progress bars)
morie pipeline --all -y
# Start free AI chat (no API key needed)
morie chat
From R:
library(morie)
# Load built-in dataset (DBI/RSQLite — no file paths needed)
cpads <- morie_load_dataset("cpads_2021")
# List all 60+ built-in datasets
morie_list_datasets()
# Browse dataset catalog
morie_dataset_catalog()
# Estimate average treatment effect
ate <- estimate_ate(cpads, "outcome", "treatment", c("age", "sex"))
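As a rough illustration of what a covariate-adjusted ATE estimate involves, the Python sketch below standardizes stratum-specific treated-versus-control differences over a discrete covariate (nonparametric g-computation). It is not MORIE's implementation of estimate_ate: the function names standardized_ate and naive_difference and the toy records are assumptions made for this example, and the argument order simply mirrors the R call above.

```python
from collections import defaultdict


def naive_difference(rows, outcome, treatment):
    """Unadjusted difference in mean outcome, treated minus control."""
    treated = [r[outcome] for r in rows if r[treatment] == 1]
    control = [r[outcome] for r in rows if r[treatment] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def standardized_ate(rows, outcome, treatment, covariates):
    """ATE by standardization: average the within-stratum effect over strata.

    Hypothetical helper for illustration; treatment must be coded 0/1 and
    the covariates must be discrete.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for row in rows:
        key = tuple(row[c] for c in covariates)
        strata[key][row[treatment]].append(row[outcome])

    n = len(rows)
    ate = 0.0
    for key, arms in strata.items():
        # Positivity: every stratum needs both treated and control units.
        if not arms[0] or not arms[1]:
            raise ValueError(f"positivity violated in stratum {key}")
        size = len(arms[0]) + len(arms[1])
        effect = sum(arms[1]) / len(arms[1]) - sum(arms[0]) / len(arms[0])
        ate += (size / n) * effect
    return ate


# Toy records (made up): the true effect is 2.0 within each sex stratum,
# but treatment is more common where outcomes are higher.
rows = (
    [{"sex": 0, "treatment": 1, "outcome": 3.0}]
    + [{"sex": 0, "treatment": 0, "outcome": 1.0}] * 3
    + [{"sex": 1, "treatment": 1, "outcome": 12.0}] * 3
    + [{"sex": 1, "treatment": 0, "outcome": 10.0}]
)

naive = naive_difference(rows, "outcome", "treatment")
adjusted = standardized_ate(rows, "outcome", "treatment", ["sex"])
print(f"naive difference: {naive}, standardized ATE: {adjusted}")
```

On these toy records the unadjusted contrast is inflated by confounding, while the standardized estimate recovers the within-stratum effect.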
What MORIE does
A unified Python + R interface across the following surfaces. See Statistical Methods for methodology details and API Reference for function reference.
- Causal estimators
ATE, ATT, ATC, GATE, CATE (T- / S-learner), LATE (2SLS / Wald), AIPW, IPW (Hájek), G-computation, propensity-score matching (1:1 NN, 5-strata subclass), Rosenbaum sensitivity bounds, E-value.
- Double machine learning
Partially linear regression (PLR), interactive regression model (IRM), partially linear IV (PLIV); cross-fitted with pluggable nuisance learners. Multi-SE comparison (pooled, cluster, multi-way) on the IRM-DML primary estimate. Propensity calibration (Platt / isotonic) on IPW / AIPW / SuperLearner-AIPW with Brier score.
- The MRM framework
Multilevel Reconciliation Methodology — a 10-estimator framework applied to OTIS / SIU / TPS data over a coordinated set of (treatment, outcome, covariates) designs. Per-row individual-level + aggregate (Poisson, NB GLM) modes. Mandela classifier (UN Mandela Rules 43 + 44) + provincial-vs-federal cross-comparison.
- Spatial statistics
Moran’s \(I\), Geary’s \(C\), Getis-Ord general \(G\), join count, LISA, Getis-Ord \(G_i^{*}\), local Geary, Ripley’s K / L, geostatistical kriging (ordinary, universal, IDW, co-kriging), variogram fitting, GWR (basic, GW-PCA, ST-GWR), bivariate Moran, Moran sweep heatmap, DBSCAN / HDBSCAN, Kulldorff space-time scan.
- Hawkes self-exciting point processes
Markovian Mohler-Bertozzi-Brantingham fit (exponential kernel + constant baseline) plus the non-stationary, non-Markovian Kwan-Chen-Dunsmuir (2024) family — Gamma, Weibull, Lomax kernels with sinusoidal baselines. Eight (kernel × baseline) combinations ranked by AIC and time-rescaling Kolmogorov-Smirnov goodness-of-fit.
- Statistical physics of crime
Short-Brantingham reaction-diffusion PDE, Brockmann-Hufnagel-Geisel Lévy-flight tail (Hill estimator), Bettencourt urban-scaling exponent (HC3-OLS), D’Orsogna-Perc Lotka-Volterra predator-prey, SDB Turing-pattern demo, Helbing-Szolnoki inspection-game phase diagram, criminal-role co-occurrence networks.
- Survey-weighted inference
Horvitz-Thompson totals, Hájek means, ratio estimators, calibration weights (raking / IPF), complex-survey GLM, subpopulation estimates, stratified / cluster / PPS sampling, bootstrap + jackknife variance, design-effect computations, effective-sample-size diagnostics.
- Psychometrics
Cronbach’s \(\alpha\), McDonald’s \(\omega_t\) / \(\omega_h\), KMO sampling adequacy, Bartlett’s sphericity, parallel analysis, composite reliability, AVE, item-response-theory fits (1PL / 2PL / 3PL / GRM / PCM), differential item functioning (Mantel-Haenszel, logistic, generalised), measurement invariance, network psychometrics, Bayesian psychometrics. 250+ functions.
- Signal processing + cryptography
Spectral analysis, biomedical-signal helpers, homomorphic deconvolution, classical and modern crypto primitives (ChaCha20-Poly1305, etc.), TurboQuant vector quantization with near-optimal distortion (Zandieh et al. 2026 ICLR).
- Datasets
60+ built-in datasets in a portable SQLite layer (Canadian carceral, police, and oversight + epidemiological reference data). Auto dataset-profiling for arbitrary tabular input (morie.dataset.profile_dataset).
- Function namespace morie.fn
36,000+ individual function files indexed by a registry, exposing short stable names for every estimator, every kernel, every weight matrix, every test. Use morie.fn.cheatsheet(name) for a per-function help card.
- Federal SIU + Doob T-539-20 replication
Mandela classifier (Rules 43 + 44) with χ² verification, Sprott / Doob (± Iftene) IEDM analyses, full replication of Doob’s CCRSO 2018 Tables 1–3 and the imprisonment-vs-crime decoupling Pettitt change-point test. See Sprott-Doob CRIMSL + Schulich Law SIU analyses, Doob T-539-20 Federal Court affidavit replication, SIU IAP — Federal Structured Intervention Unit Implementation Advisory Panel.
- Toronto Police Service surface
morie.tps_* modules: incident I/O, CSI, neighbourhood spatial / temporal analyses, Hawkes (basic + advanced), statistical physics, Hohl-style choropleths and proportional-symbol district maps. Companion paper at 10.5281/zenodo.20102198.
- LLM + assistant
Ollama (local, private) → vendored OllamaFreeAPI (no key) → Gemini free tier → local-keyword fallback. Zero cloud dependency at the default tier. Vendored TurboQuant KV-cache compression. Polyglot REPL bridges variables across Python ↔ R ↔ shell ↔ 12 other languages.
- Carbon-aware computing
Built-in pure-Python emissions tracker (morie.emissions) with 213-country IEA carbon-intensity data, per-module and pipeline-wide CO₂ accounting. CodeCarbon fallback on Python ≤ 3.14.
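Of the surfaces listed above, the global Moran's I statistic is compact enough to sketch directly. The Python below computes I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2 from first principles. It is a minimal illustration under stated assumptions, not MORIE's API: the names morans_i and chain_weights, the list-of-lists weight matrix, and the four-site example are all made up for this sketch.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    ss = sum(d * d for d in dev)
    return (n / s0) * (cross / ss)


def chain_weights(n):
    """Binary adjacency for n sites on a line (neighbours are one step apart)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


w = chain_weights(4)
trend = morans_i([1.0, 2.0, 3.0, 4.0], w)      # smooth gradient
checker = morans_i([1.0, -1.0, 1.0, -1.0], w)  # alternating pattern
print(f"gradient: {trend:.3f}, alternating: {checker:.3f}")
```

The smooth gradient yields a positive value (similar neighbours), and the alternating pattern yields a negative one (dissimilar neighbours), which is the sign convention to expect from any Moran's I implementation.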
Key design principles
- Lean terminal IDE.
Rich terminal output: progress bars, formatted ASCII tables, color-coded diagnostics. Run entire pipelines from a single morie command.
- Python + R parity.
Every statistical estimator is implemented in both languages with matching APIs. Python uses scikit-learn conventions (fit/predict). R uses S3 generics (summary(), plot(), predict()).
- Automated documentation.
Python API docs via Sphinx autodoc. R API docs via Roxygen2 → .. r:function:: (no manual writing). Run devtools::document() to regenerate.
- Data governance built-in.
Raw CPADS microdata lives in data/datasets/. Wrangled cache in data/cache/. Synthetic data (generate_synthetic_data()) is labeled synthetic in all outputs. morie verify (planned) will validate manifest output provenance.
- Statistically rigorous.
Target estimand is always an explicit parameter (ATE vs ATT vs CATE — never implicit). Overlap/positivity violations raise explicit warnings. Cross-fitting prevents data leakage. Convergence diagnostics are built into MCMC outputs.
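The last principle mentions two mechanisms, cross-fitting and overlap warnings, that a short sketch can make concrete. In the Python below, each unit's propensity score is estimated only from the folds that unit is not in, and a warning is raised when scores fall outside a trimming band. This is an illustrative sketch, not MORIE's code: the function cross_fit_propensity, its eps threshold, the interleaved fold assignment, and the stratum-share "model" are assumptions chosen to keep the example dependency-free.

```python
import warnings


def cross_fit_propensity(treatment, stratum, n_folds=2, eps=0.05):
    """Out-of-fold propensity scores with an explicit overlap warning.

    Hypothetical helper: the nuisance "model" is the treated share per
    stratum, fitted on the training folds only.
    """
    n = len(treatment)
    scores = [float("nan")] * n
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]

        # Fit on the training folds only, so no unit informs its own score.
        counts = {}
        for i in train:
            treated, total = counts.get(stratum[i], (0, 0))
            counts[stratum[i]] = (treated + treatment[i], total + 1)

        for i in held_out:
            treated, total = counts.get(stratum[i], (0, 0))
            if total:
                scores[i] = treated / total

    # Scores at or near 0 or 1 (or undefined) signal a positivity problem.
    flagged = [i for i, p in enumerate(scores) if not (eps <= p <= 1 - eps)]
    if flagged:
        warnings.warn(
            f"overlap/positivity: {len(flagged)} units have propensity "
            f"outside [{eps}, {1 - eps}]"
        )
    return scores


# Made-up data: stratum "a" is balanced, stratum "b" is always treated.
treatment = [1, 1, 0, 0, 1, 1, 1, 1]
stratum = ["a"] * 4 + ["b"] * 4

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scores = cross_fit_propensity(treatment, stratum)

print(scores)
print([str(w.message) for w in caught])
```

A real pipeline would swap the stratum-share model for a fitted learner, but the fold discipline and the explicit warning stay the same.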
Background
MORIE is a multi-domain scientific computing toolkit for observational inference. It sits between one-off research scripts and heavy enterprise analytics platforms, and is aimed at researchers who need:
A unified Python + R surface across the same estimators (no language-choice tax).
Causal estimators (ATE / ATT / ATC / GATE / CATE / LATE, AIPW, G-computation, DML–PLR, DML–IRM, propensity-score matching, E-value and Rosenbaum-bound sensitivity) with explicit estimands.
Survey-weighted inference (Horvitz-Thompson, Hájek, raking, cluster + stratified design) on top of the same DataFrame as the causal layer.
Spatial statistics (Moran’s \(I\), LISA, Getis-Ord \(G^{*}\), DBSCAN, Kulldorff space-time scan), Hawkes self-exciting point processes (Markovian and non-Markovian), and the statistical-physics-of-crime models (Short-Brantingham reaction-diffusion, Lévy-flight tail, Bettencourt urban scaling, Lotka-Volterra) — applied as first-class methods on the Toronto Police Service open-data feeds.
Reproducible pipelines that run unattended in CI / CD — outputs carry provenance manifests; synthetic data is labelled as such.
The MRM (Multilevel Reconciliation Methodology) framework as a primary application for Canadian carceral, police, and oversight data (Ontario OTIS, federal SIU, TPS).
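For readers new to the survey-weighted estimators named in the list above, the following Python sketch shows the Horvitz-Thompson total and the Hájek mean on a three-unit sample. It is a textbook illustration, not MORIE's interface: the function names, the sample values, and the inclusion probabilities are assumptions invented for this example.

```python
def horvitz_thompson_total(y, inclusion_prob):
    """Estimate a population total by weighting each unit by 1 / pi_i."""
    return sum(yi / pi for yi, pi in zip(y, inclusion_prob))


def hajek_mean(y, inclusion_prob):
    """Ratio estimator of the mean: HT total divided by the estimated size."""
    n_hat = sum(1.0 / pi for pi in inclusion_prob)
    return horvitz_thompson_total(y, inclusion_prob) / n_hat


# Made-up sample: the first unit was twice as likely to be sampled.
y = [10.0, 20.0, 30.0]
pi = [0.5, 0.25, 0.25]

total = horvitz_thompson_total(y, pi)
mean = hajek_mean(y, pi)
unweighted = sum(y) / len(y)
print(f"HT total: {total}, Hajek mean: {mean}, unweighted mean: {unweighted}")
```

Because the units with larger outcomes were less likely to be sampled, the Hájek mean sits above the unweighted sample mean.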
The package ships 60+ built-in datasets (Canadian carceral, police, and oversight + epidemiological reference data) in a portable SQLite layer.
MORIE is licensed under GPL-2.0-only (Linus copyleft, deliberately
chosen over GPL-3.0 for compatibility with the broader
Linux-kernel-style ecosystem). See LICENSE for the full text and
LICENSING.md for the rationale.
Documentation index
If you prefer a single linear walkthrough rather than the sidebar navigation, every page on this site is listed below — top to bottom:
From Zero — Start Here If You’ve Never Done This Before — From-zero tutorial track. Start here if you have never opened a Python or R console before.
Installation — Installation instructions for Python, R, macOS, Linux, Windows, plus LLM provider setup.
CLI Reference — Reference for every morie … subcommand.
Statistical Methods — Statistical-methods reference. Estimands, causal estimators, survey statistics, spatial methods, Hawkes processes, statistical physics of crime, OTIS / TPS / SIU pipelines, the MRM framework, key empirical findings.
API Reference — Python and R API reference (function signatures and docstrings).
Contributing — Development setup, test conventions, module-addition guide.