OTIS Linkage Constraints: Read Before Doing Any Individual-Level Analysis
Part of Statistical Methods — MORIE’s statistical-methods reference.
TL;DR: OTIS UniqueIndividual_ID is NOT a stable person identifier. It is randomly re-assigned per fiscal year and per dataset file. Any analysis claiming individual-level linkage across years, or across datasets within the same year, is artifactual.
What the official dictionary says (verified 2026-05-08)
From the Ontario MCSCS data dictionary v2.0 (resource d83fe893-9634-4794-a0c1-c17bf619a95a, last modified 2025-11-19):
UniqueIndividual_ID: A random number assigned to an individual (format: YYYY-XXXXX-AA), where YYYY reflects the year at the end of the fiscal year reporting period [calendar year for d-series], XXXXX is a sequence, and AA is a dataset acronym (RC, SG, DC). The unique ID is randomly re-assigned to different individuals each year. The unique ID may also be randomly assigned to different individuals for each data file of the same year.
_id (the CKAN row ID) “cannot be used to link datasets, as the same _id will likely represent different records across different datasets.”
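The documented YYYY-XXXXX-AA layout can be split mechanically, which is useful for sanity checks such as comparing the prefix with EndFiscalYear. The following is a minimal sketch, not part of morie: the helper names (OtisId, parse_otis_id, prefix_matches_fiscal_year) are hypothetical, and the regex assumes a digits-only sequence of any width and a two-letter uppercase acronym. For d-series files the prefix is a calendar year, so the comparison would need the calendar-year column instead.

```python
import re
from typing import NamedTuple

# Assumed shape of YYYY-XXXXX-AA: 4-digit year, digit sequence, 2-letter acronym.
# The sequence width is not stated in the dictionary, so any width is accepted.
_ID_PATTERN = re.compile(r"^(\d{4})-(\d+)-([A-Z]{2})$")


class OtisId(NamedTuple):
    year: int       # year at the end of the fiscal-year reporting period
    sequence: str   # kept as text so leading zeros survive
    dataset: str    # dataset acronym, e.g. RC, SG, DC


def parse_otis_id(raw: str) -> OtisId:
    """Split a UniqueIndividual_ID into its three documented parts."""
    match = _ID_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not in YYYY-XXXXX-AA form: {raw!r}")
    year, sequence, dataset = match.groups()
    return OtisId(int(year), sequence, dataset)


def prefix_matches_fiscal_year(raw: str, end_fiscal_year: int) -> bool:
    """True when the YYYY prefix equals the row's EndFiscalYear."""
    return parse_otis_id(raw).year == end_fiscal_year
```

Parsing the ID does not make it a linkage key: the parts describe how the number was issued, not who it belongs to.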
What this means in practice
✅ “Person X had N placements in fiscal year Y” — valid (within one dataset, one fiscal year).
🔴 “Same person was in segregation in 2023 and again in 2024” — not measurable: IDs don’t carry over.
🔴 “Same person appears in both a01 (RC) and b01 (Segregation) within FY 2024” — not measurable: IDs are also re-randomized between files.
✅ “% of person-years involving multiple regions” — valid (intra-year Goffmanian mobility).
✅ “Aggregate placements per region per year” — valid (no individual linkage required).
✅ “Distribution of placements per individual within a fiscal year” — valid.
Empirical confirmation
Run .venv-314/bin/python -c "from morie.otis_datasets import load_otis_dataset; ...":
a01 (76,934 rows, 65,467 unique IDs): 0 IDs span >1 fiscal year. YYYY-prefix == EndFiscalYear in 100% of rows.
b01 (82,001 rows, 33,136 unique IDs): 0 IDs span >1 fiscal year.
a01 ∩ b01 (any FY): 0 shared XXXXX-AA suffixes. The AA differs (RC vs SG), and even stripping AA the suffix space doesn’t overlap.
The “same suffix” workaround fails: e.g. 2023-00002-RC and 2024-00002-RC show different demographic profiles (age 50+ → 25-49 → 50+, biologically impossible for one person). This is the documented re-randomization at work.
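The two checks reported above (IDs spanning more than one fiscal year, and suffixes shared between files) can be sketched in plain Python. This is an illustration under assumptions, not the command actually run: the signature of load_otis_dataset is not shown here, so the functions take rows as plain dicts keyed by UniqueIndividual_ID and EndFiscalYear, and the function names and toy rows are invented.

```python
from collections import defaultdict


def ids_spanning_multiple_years(rows):
    """Return IDs that appear under more than one fiscal year in ONE dataset."""
    years_by_id = defaultdict(set)
    for row in rows:
        years_by_id[row["UniqueIndividual_ID"]].add(row["EndFiscalYear"])
    return {uid for uid, years in years_by_id.items() if len(years) > 1}


def shared_suffixes(rows_a, rows_b, strip_acronym=False):
    """Return XXXXX-AA (or bare XXXXX) suffixes present in both datasets."""
    def suffix(uid):
        _year, sequence, acronym = uid.split("-")
        return sequence if strip_acronym else f"{sequence}-{acronym}"

    in_a = {suffix(r["UniqueIndividual_ID"]) for r in rows_a}
    in_b = {suffix(r["UniqueIndividual_ID"]) for r in rows_b}
    return in_a & in_b


# Toy rows (invented values, not OTIS records).
toy_a01 = [
    {"UniqueIndividual_ID": "2023-00002-RC", "EndFiscalYear": 2023},
    {"UniqueIndividual_ID": "2023-00002-RC", "EndFiscalYear": 2023},
    {"UniqueIndividual_ID": "2024-00002-RC", "EndFiscalYear": 2024},
]
toy_b01 = [
    {"UniqueIndividual_ID": "2023-10001-SG", "EndFiscalYear": 2023},
    {"UniqueIndividual_ID": "2024-10001-SG", "EndFiscalYear": 2024},
]

spanning_a01 = ids_spanning_multiple_years(toy_a01)
shared_full = shared_suffixes(toy_a01, toy_b01)
shared_bare = shared_suffixes(toy_a01, toy_b01, strip_acronym=True)
```

On the toy rows all three results are empty sets, mirroring the zero counts listed above. Even a non-empty intersection on real data would not establish linkage, since the dictionary states that IDs are re-assigned per file.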
Which analyses are valid
Within-year individual-level (intra-FY Goffmanian)
morie.otis_churn.within_year_placement_count(b01): distribution of placements per (id × FY) cell. (50.3% of person-years have multiple placements; Gini = 0.432.)
morie.otis_churn.within_year_region_diversity(b01): distinct regions per (id × FY) cell. (3.8% of person-years span multiple regions.)
Alert co-occurrence within an FY (chi² + Cramér’s V): mortification cluster.
Disciplinary × medical-protective overlap within an FY.
Region × alert state-richness within an FY.
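The within-year counts above reduce to grouping rows by the (id × FY) cell and summarizing the resulting distribution. Here is a minimal sketch of that idea, which is not the morie.otis_churn implementation: it assumes each row is one placement, the function names are hypothetical, and the toy rows are invented.

```python
from collections import Counter


def placements_per_person_year(rows):
    """Count rows per (UniqueIndividual_ID, EndFiscalYear) cell."""
    return Counter((r["UniqueIndividual_ID"], r["EndFiscalYear"]) for r in rows)


def share_with_multiple(counts):
    """Fraction of person-years with more than one placement."""
    values = list(counts.values())
    return sum(1 for v in values if v > 1) / len(values)


def gini(values):
    """Gini coefficient of non-negative counts (0 means perfectly equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on ascending-sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy rows (invented): four person-years with 1, 1, 2 and 4 placements.
toy_rows = (
    [{"UniqueIndividual_ID": "2024-00001-SG", "EndFiscalYear": 2024}] * 1
    + [{"UniqueIndividual_ID": "2024-00002-SG", "EndFiscalYear": 2024}] * 1
    + [{"UniqueIndividual_ID": "2024-00003-SG", "EndFiscalYear": 2024}] * 2
    + [{"UniqueIndividual_ID": "2024-00004-SG", "EndFiscalYear": 2024}] * 4
)

toy_counts = placements_per_person_year(toy_rows)
toy_share = share_with_multiple(toy_counts)
toy_gini = gini(toy_counts.values())
```

Keying on the (id, FY) pair rather than the ID alone makes the within-year scope explicit, so the summary never reads as a cross-year statement.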
Aggregate / population-level (no individual linkage)
morie.otis_churn.repeat_placement_concentration(b09): population-wide placements-per-individual distribution from binned aggregate data; Goffmanian heavy tail via Gini + power-law.
morie.otis_churn.embedding_distribution(b02): total-days-in-segregation distribution; lognormal vs Pareto AIC.
Year-over-year aggregate trend tests (rate ratios, Pettitt change-point).
Demographic contingency tables (race × region, gender × age, etc.) within and across years (using counts, not linkage).
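A year-over-year rate ratio, one of the aggregate tests listed above, needs only counts and a denominator per year, so no individual linkage is involved. The sketch below is one common formulation and may differ from what morie uses: it assumes Poisson-distributed counts, a Wald interval on the log scale, and an exposure denominator (for example population or bed-days) that has to come from outside the ID column. The function name and inputs are hypothetical.

```python
import math


def rate_ratio(events_now, exposure_now, events_prev, exposure_prev, z=1.96):
    """Rate ratio (now vs previous year) with a log-scale Wald interval."""
    if min(events_now, events_prev) <= 0 or min(exposure_now, exposure_prev) <= 0:
        raise ValueError("counts and exposures must be positive")
    ratio = (events_now / exposure_now) / (events_prev / exposure_prev)
    # Standard error of log(ratio) under the Poisson assumption.
    se_log = math.sqrt(1 / events_now + 1 / events_prev)
    low = math.exp(math.log(ratio) - z * se_log)
    high = math.exp(math.log(ratio) + z * se_log)
    return ratio, low, high


# Invented counts for illustration only.
toy_ratio, toy_low, toy_high = rate_ratio(120, 1000, 100, 1000)
```

An interval that includes 1 is compatible with no change between the two years at the chosen confidence level.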
Causal estimands (intra-year)
morie.otis_causal.otis_irm_dml: IRM-DML with cluster_cols=["yr"] (treats year as the cluster, not individuals).
morie.otis_causal.otis_aipw, otis_psm, otis_ipw: all operate on the (treat, vm) pair within FY.
Which analyses are INVALID: never trust their output
If you see code that does any of these, treat the result as an artifact:
groupby("UniqueIndividual_ID")["EndFiscalYear"].diff(): every group has exactly one year, so .diff() returns NaN/0; “100% same fiscal year” is the artifact, not Goffman.
Any “time-to-readmission across fiscal years” claim from OTIS data.
Any “same person also showed up in dataset X” join across a01 and b01 (or any pair).
Stripping the YYYY prefix to use XXXXX-AA as a cross-year key: empirically broken (see above), and explicitly disclaimed by the dictionary.
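The groupby/diff artifact in the first item can be reproduced on a toy frame. The sketch below uses invented rows and only standard pandas calls. It shows why the gaps are always NaN or 0, and adds a guard (a suggestion, not an existing morie check) that flags when cross-year gaps cannot exist at all.

```python
import pandas as pd

# Toy frame (invented values). IDs are re-randomized each fiscal year,
# so every ID appears under exactly one EndFiscalYear.
toy = pd.DataFrame(
    {
        "UniqueIndividual_ID": [
            "2023-00002-SG",
            "2023-00002-SG",
            "2023-00007-SG",
            "2024-00002-SG",
            "2024-00002-SG",
        ],
        "EndFiscalYear": [2023, 2023, 2023, 2024, 2024],
    }
)

# The invalid pattern: year gaps between consecutive rows of the "same" ID.
gaps = toy.groupby("UniqueIndividual_ID")["EndFiscalYear"].diff()

# Each gap is NaN (first row of a group) or 0 (same year), never positive.
all_gaps_zero = bool(gaps.dropna().eq(0).all())

# Guard: if no ID covers more than one year, cross-year gaps are undefined.
years_per_id = toy.groupby("UniqueIndividual_ID")["EndFiscalYear"].nunique()
linkage_impossible = bool(years_per_id.max() == 1)
```

When linkage_impossible is True, a finding such as "all gaps fall in the same fiscal year" carries no information about readmission: it follows from how the IDs are issued.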
How we got here (2026-05-08)
A prior morie.otis_churn.time_to_readmission() reported “100% of gaps are within the SAME fiscal year — Goffman’s cyclical inmate dynamic.” That was the artifact, not the dynamic. The function has been renamed within_year_placement_count with corrected semantics. cross_region_churn was renamed within_year_region_diversity with corrected interpretation (the math was always intra-year; only the framing was wrong).
Sources
Local copy:
data/datasets/OTIS/OTIS_DATA_DICTIONARY.md