MORIE — Multi-domain Open Research and Inferential Estimation
=============================================================

.. image:: https://img.shields.io/badge/license-GPL--2.0-d97706.svg
   :alt: License: GPL-2.0

.. image:: https://img.shields.io/badge/python-3.10%2B-blue.svg
   :alt: Python 3.10+

.. image:: https://img.shields.io/badge/R-4.3%2B-276DC3.svg
   :alt: R 4.3+

.. image:: https://img.shields.io/pypi/v/morie.svg
   :target: https://pypi.org/project/morie/
   :alt: PyPI version

.. image:: https://img.shields.io/badge/r--universe-hadesllm-276DC3
   :target: https://hadesllm.r-universe.dev/morie
   :alt: r-universe

.. image:: https://img.shields.io/badge/DOI%20%C2%B7%20morie%20R-10.5281%2Fzenodo.20111233-0d9488?logo=zenodo&logoColor=white
   :target: https://doi.org/10.5281/zenodo.20111233
   :alt: DOI - morie R - 10.5281/zenodo.20111233

.. image:: https://img.shields.io/badge/DOI%20%C2%B7%20morie%20Python-10.5281%2Fzenodo.20096350-7c3aed?logo=zenodo&logoColor=white
   :target: https://doi.org/10.5281/zenodo.20096350
   :alt: DOI - morie Python - 10.5281/zenodo.20096350

.. image:: https://img.shields.io/badge/MRM_paper-10.5281%2Fzenodo.20096075-15803d?logo=zenodo&logoColor=white
   :target: https://doi.org/10.5281/zenodo.20096075
   :alt: MRM paper - 10.5281/zenodo.20096075

.. image:: https://img.shields.io/badge/Hawkes_paper-10.5281%2Fzenodo.20102198-be123c?logo=zenodo&logoColor=white
   :target: https://doi.org/10.5281/zenodo.20102198
   :alt: Hawkes paper - 10.5281/zenodo.20102198

A dual-language (Python + R) multi-domain scientific computing toolkit for
observational inference, with sociolegal, signal-processing, cryptographic,
spatial-statistics, statistical-physics, and psychometrics modules. Hosts
the MRM (Multilevel Reconciliation Methodology) framework as a primary application
for Canadian carceral, police, and oversight data analysis.

----

Quick start
-----------

Pick any one channel — all install the **same** ``morie`` v0.4.12:

.. code-block:: bash

   # 1. One-line installer (Linux / macOS / WSL) — detects pip + R, installs both
   curl -fsSL https://hadesllm.github.io/morie/install.sh | bash

   # 2. PyPI (any platform with Python ≥3.10)
   pip install morie                  # 60+ built-in datasets
   pip install "morie[interactive]"   # + Terminal IDE (TUI)
   pip install "morie[carbon]"        # + CodeCarbon emissions (Python ≤3.14 only)

   # 3. Homebrew (macOS / Linuxbrew)
   brew tap hadesllm/morie
   brew install morie

   # 4. Docker (zero local dependencies)
   docker run --rm ghcr.io/hadesllm/morie:0.4.12 morie --help

   # 5. R package (CRAN-compatible, served from r-universe)
   install.packages("morie", repos = "https://hadesllm.r-universe.dev")

.. note::

   **Heads-up for Raspberry Pi OS / Debian Trixie users:** the system
   ``/usr/bin/python3`` is python 3.13.5, which segfaults on import for
   several scientific wheels (a Debian-packaging bug, not a morie bug).
   Work around it with ``uv``:

   .. code-block:: bash

      curl -LsSf https://astral.sh/uv/install.sh | sh
      uv python install 3.12
      uv venv ~/.venvs/morie --python 3.12
      uv pip install --python ~/.venvs/morie/bin/python morie
      ~/.venvs/morie/bin/morie --version

Run your first analysis in seconds:

.. code-block:: bash

   # Launch the Terminal IDE (multi-pane IDE)
   morie tui

   # Self-diagnostics — checks LLM providers, datasets, R, Docker
   morie doctor

   # List all 60+ built-in datasets
   morie list-datasets

   # List all 23 analysis modules
   morie list-modules

   # Run a single module against built-in data
   morie run-module power-design --output-dir /tmp/morie-outputs

   # Run the full pipeline (with enlighten progress bars)
   morie pipeline --all -y

   # Start free AI chat (no API key needed)
   morie chat

From R:

.. code-block:: r

   library(morie)

   # Load built-in dataset (DBI/RSQLite — no file paths needed)
   cpads <- morie_load_dataset("cpads_2021")

   # List all 60+ built-in datasets
   morie_list_datasets()

   # Browse dataset catalog
   morie_dataset_catalog()

   # Estimate average treatment effect
   ate <- estimate_ate(cpads, "outcome", "treatment", c("age", "sex"))

----

What MORIE does
-----------------

A unified Python + R interface across the following surfaces. See
:doc:`methods/index` for methodology details and :doc:`api/index`
for function reference.

**Causal estimators**
  ATE, ATT, ATC, GATE, CATE (T- / S-learner), LATE (2SLS / Wald),
  AIPW, IPW (Hájek), G-computation, propensity-score matching
  (1:1 NN, 5-strata subclass), Rosenbaum sensitivity bounds, E-value.

**Double machine learning**
  Partially linear regression (PLR), interactive regression model
  (IRM), partially linear IV (PLIV); cross-fitted with pluggable
  nuisance learners. Multi-SE comparison (pooled, cluster, multi-way)
  on the IRM-DML primary estimate. Propensity calibration (Platt /
  isotonic) on IPW / AIPW / SuperLearner-AIPW with Brier score.

**The MRM framework**
  Multilevel Reconciliation Methodology — a 10-estimator framework applied to OTIS
  / SIU / TPS data over a coordinated set of (treatment, outcome,
  covariates) designs. Per-row individual-level + aggregate (Poisson,
  NB GLM) modes. Mandela classifier (UN Mandela Rules 43 + 44) +
  provincial-vs-federal cross-comparison.

**Spatial statistics**
  Moran's :math:`I`, Geary's :math:`C`, Getis-Ord general :math:`G`,
  join count, LISA, Getis-Ord :math:`G_i^{*}`, local Geary, Ripley's
  K / L, geostatistical kriging (ordinary, universal, IDW,
  co-kriging), variogram fitting, GWR (basic, GW-PCA, ST-GWR),
  bivariate Moran, Moran sweep heatmap, DBSCAN / HDBSCAN, Kulldorff
  space-time scan.

**Hawkes self-exciting point processes**
  Markovian Mohler-Bertozzi-Brantingham fit (exponential kernel +
  constant baseline) plus the non-stationary, non-Markovian
  Kwan-Chen-Dunsmuir (2024) family — Gamma, Weibull, Lomax kernels
  with sinusoidal baselines. Eight (kernel × baseline) combinations
  ranked by AIC and time-rescaling Kolmogorov-Smirnov goodness-of-fit.

**Statistical physics of crime**
  Short-Brantingham reaction-diffusion PDE, Brockmann-Hufnagel-Geisel
  Lévy-flight tail (Hill estimator), Bettencourt urban-scaling
  exponent (HC3-OLS), D'Orsogna-Perc Lotka-Volterra predator-prey,
  SDB Turing-pattern demo, Helbing-Szolnoki inspection-game phase
  diagram, criminal-role co-occurrence networks.

**Survey-weighted inference**
  Horvitz-Thompson totals, Hájek means, ratio estimators, calibration
  weights (raking / IPF), complex-survey GLM, subpopulation estimates,
  stratified / cluster / PPS sampling, bootstrap + jackknife variance,
  design-effect computations, effective-sample-size diagnostics.

**Psychometrics**
  Cronbach's :math:`\alpha`, McDonald's :math:`\omega_t` /
  :math:`\omega_h`, KMO sampling adequacy, Bartlett's sphericity,
  parallel analysis, composite reliability, AVE, item-response-theory
  fits (1PL / 2PL / 3PL / GRM / PCM), differential item functioning
  (Mantel-Haenszel, logistic, generalised), measurement invariance,
  network psychometrics, Bayesian psychometrics. 250+ functions.

**Signal processing + cryptography**
  Spectral analysis, biomedical-signal helpers, homomorphic
  deconvolution, classical and modern crypto primitives
  (ChaCha20-Poly1305, etc.), TurboQuant vector quantization with
  near-optimal distortion (Zandieh et al. 2026 ICLR).

**Datasets**
  60+ built-in datasets in a portable SQLite layer (Canadian
  carceral, police, and oversight + epidemiological reference data).
  Auto dataset-profiling for arbitrary tabular input
  (``morie.dataset.profile_dataset``).

**Function namespace ``morie.fn``**
  36,000+ individual function files indexed by a registry, exposing
  short stable names for every estimator, every kernel,
  every weight matrix, every test. Use ``morie.fn.cheatsheet(name)``
  for a per-function help card.

**Federal SIU + Doob T-539-20 replication**
  Mandela classifier (Rules 43 + 44) with χ² verification, Sprott /
  Doob (± Iftene) IEDM analyses, full replication of Doob's CCRSO 2018
  Tables 1--3 and the imprisonment-vs-crime decoupling Pettitt
  change-point test. See :doc:`methods/sprott_doob`,
  :doc:`methods/doob_trends`, :doc:`methods/siuiap`.

**Toronto Police Service surface**
  ``morie.tps_*`` modules: incident I/O, CSI, neighbourhood
  spatial / temporal analyses, Hawkes (basic + advanced),
  statistical physics, Hohl-style choropleths and proportional-symbol
  district maps. Companion paper at 10.5281/zenodo.20102198.

**LLM + assistant**
  Ollama (local, private) → vendored OllamaFreeAPI (no key) → Gemini
  free tier → local-keyword fallback. Zero cloud dependency at the
  default tier. Vendored TurboQuant KV-cache compression. Polyglot
  REPL bridges variables across Python ↔ R ↔ shell ↔ 12 other
  languages.

**Carbon-aware computing**
  Built-in pure-Python emissions tracker
  (``morie.emissions``) with 213-country IEA carbon-intensity
  data, per-module and pipeline-wide CO₂ accounting. CodeCarbon
  fallback on Python ≤ 3.14.

----

Key design principles
---------------------

*Lean terminal IDE.*
  Rich terminal output — progress bars, formatted ASCII tables, color-coded
  diagnostics. Run entire pipelines from a single ``morie`` command.

*Python + R parity.*
  Every statistical estimator is implemented in both languages with matching
  APIs. Python uses scikit-learn conventions (``fit`` / ``predict``). R uses
  S3 generics (``summary()``, ``plot()``, ``predict()``).

*Automated documentation.*
  Python API docs via Sphinx autodoc. R API docs via Roxygen2 → ``.. r:function::``
  (no manual writing). Run ``devtools::document()`` to regenerate.

*Data governance built-in.*
  Raw CPADS microdata lives in ``data/datasets/``. Wrangled cache in ``data/cache/``.
  Synthetic data (``generate_synthetic_data()``) is labeled synthetic in all
  outputs. ``morie verify`` (planned) will validate manifest output provenance.

*Statistically rigorous.*
  Target estimand is always an explicit parameter (ATE vs ATT vs CATE — never
  implicit). Overlap/positivity violations raise explicit warnings. Cross-fitting
  prevents data leakage. Convergence diagnostics are built into MCMC outputs.

----

Background
----------

MORIE is a multi-domain scientific computing toolkit for
observational inference. It sits between one-off research scripts
and heavy enterprise analytics platforms, and is aimed at
researchers who need:

- A unified Python + R surface across the same estimators (no
  language-choice tax).
- Causal estimators (ATE / ATT / ATC / GATE / CATE / LATE, AIPW,
  G-computation, DML--PLR, DML--IRM, propensity-score matching,
  E-value and Rosenbaum-bound sensitivity) with explicit estimands.
- Survey-weighted inference (Horvitz-Thompson, Hájek, raking,
  cluster + stratified design) on top of the same DataFrame as the
  causal layer.
- Spatial statistics (Moran's :math:`I`, LISA, Getis-Ord
  :math:`G^{*}`, DBSCAN, Kulldorff space-time scan), Hawkes
  self-exciting point processes (Markovian and non-Markovian), and
  the statistical-physics-of-crime models (Short-Brantingham
  reaction-diffusion, Lévy-flight tail, Bettencourt urban scaling,
  Lotka-Volterra) — applied as first-class methods on the
  Toronto Police Service open-data feeds.
- Reproducible pipelines that run unattended in CI / CD — outputs
  carry provenance manifests; synthetic data is labelled as such.
- The MRM (Multilevel Reconciliation Methodology) framework as a primary
  application for Canadian carceral, police, and oversight data
  (Ontario OTIS, federal SIU, TPS).

The package ships 60+ built-in datasets (Canadian carceral, police,
and oversight + epidemiological reference data) in a portable SQLite
layer.

MORIE is licensed under GPL-2.0-only (Linus copyleft, deliberately
chosen over GPL-3.0 for compatibility with the broader
Linux-kernel-style ecosystem). See ``LICENSE`` for the full text and
``LICENSING.md`` for the rationale.

----

Documentation index
-------------------

If you prefer a single linear walkthrough rather than the sidebar
navigation, every page on this site is listed below — top to bottom:

- :doc:`learn/index` — From-zero tutorial track. Start here if you
  have never opened a Python or R console before.
- :doc:`install` — Installation instructions for Python, R, macOS,
  Linux, Windows, plus LLM provider setup.
- :doc:`cli` — Reference for every ``morie …`` subcommand.
- :doc:`methods/index` — Statistical-methods reference. Estimands,
  causal estimators, survey statistics, spatial methods, Hawkes
  processes, statistical physics of crime, OTIS / TPS / SIU
  pipelines, the MRM framework, key empirical findings.
- :doc:`api/index` — Python and R API reference
  (function signatures and docstrings).
- :doc:`contributing` — Development setup, test conventions,
  module-addition guide.

.. toctree::
   :maxdepth: 1
   :caption: Navigation
   :hidden:

   learn/index
   install
   cli

.. toctree::
   :maxdepth: 2
   :caption: Documentation
   :hidden:

   architecture
   methods/index
   api/index

.. toctree::
   :maxdepth: 1
   :caption: Development
   :hidden:

   contributing
   acknowledgments