Survey Sampling¶
Part of Statistical Methods — MORIE’s statistical-methods reference.
MORIE provides a probability-sampling toolkit for epidemiological
surveys. Unless a function reference below names another module, the methods are implemented in morie.sampling.
Simple Random Sampling¶
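Without replacement (SRS WOR), every unit has the same inclusion probability \(\pi_i = n / N\). The Horvitz-Thompson estimator of the mean reduces to the unweighted sample mean \(\bar{y}\), and the finite-population correction \(\text{fpc} = 1 - n/N\) applies to its variance estimate: \(\hat{V}(\bar{y}) = (1 - n/N)\, s^2 / n\), where \(s^2\) is the sample variance.
With replacement (SRS WR), draws are independent, so no finite-population correction applies and \(\hat{V}(\bar{y}) = s^2 / n\).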
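As a minimal sketch of SRS without replacement, the snippet below draws a sample with the standard library and applies the finite-population correction to the variance of the sample mean. The function name `srs_wor_mean` is hypothetical and is not part of MORIE's API.

```python
import random
import statistics

def srs_wor_mean(population, n, seed=None):
    """Draw an SRS without replacement and estimate the population mean.

    Returns (estimate, estimated_variance). Hypothetical helper for
    illustration only; not MORIE's implementation.
    """
    N = len(population)
    if not 2 <= n <= N:
        raise ValueError("need 2 <= n <= N")
    rng = random.Random(seed)
    sample = rng.sample(population, n)   # equal-probability draw, no repeats
    y_bar = statistics.fmean(sample)     # HT mean reduces to the sample mean
    s2 = statistics.variance(sample)     # sample variance, n - 1 divisor
    fpc = 1 - n / N                      # finite-population correction
    return y_bar, fpc * s2 / n

population = list(range(1, 101))
mean_hat, var_hat = srs_wor_mean(population, n=20, seed=1)
```

When the sample is the whole population (`n == N`), the correction is zero and so is the estimated variance, which is the behaviour one would expect from a census.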
Stratified Random Sampling¶
Partition the population into \(H\) strata. Within stratum \(h\), draw \(n_h\) units by SRS:
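Proportional allocation: \(n_h \propto N_h\). For a fixed \(n\), this minimises the variance of the stratified mean when within-stratum variances are equal, in which case it coincides with Neyman allocation.
Optimal (Neyman) allocation: \(n_h \propto N_h S_h\). This minimises the variance of the stratified mean for a fixed total sample size \(n\), assuming equal per-unit sampling cost across strata, where \(S_h\) is the standard deviation of the outcome in stratum \(h\).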
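The two allocation rules can be sketched in a few lines. The helper `allocate` is a hypothetical name, not a MORIE function; it returns fractional allocations and leaves rounding (and any minimum per-stratum size) to the caller.

```python
def allocate(n, stratum_sizes, stratum_sds=None):
    """Split a total sample size n across strata.

    Proportional allocation (n_h proportional to N_h) if stratum_sds is None,
    otherwise Neyman allocation (n_h proportional to N_h * S_h).
    Hypothetical helper for illustration only.
    """
    if stratum_sds is None:
        weights = list(stratum_sizes)
    else:
        weights = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

sizes = [500, 300, 200]   # N_h, made-up stratum sizes
sds = [10, 20, 40]        # S_h, made-up stratum standard deviations

proportional = allocate(100, sizes)        # [50.0, 30.0, 20.0]
neyman = allocate(100, sizes, sds)         # shifts sample toward variable strata
```

In this made-up example the smallest stratum has the largest standard deviation, so Neyman allocation gives it the largest share of the sample even though proportional allocation gives it the smallest.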
Cluster Sampling¶
When a frame of individuals is unavailable, select \(m\) clusters (e.g. households, classrooms, census tracts) by SRS, then enumerate all elements, or a random sub-sample of elements, within the selected clusters.
Elements within a cluster tend to resemble one another (positive intra-cluster correlation, ICC), which typically inflates variance relative to an SRS of the same number of elements. The design effect (DEFF) measures this inflation:
\[ \text{DEFF} = 1 + (\bar{m} - 1)\, \rho_{\text{ICC}} \]
where \(\bar{m}\) is the mean cluster size and \(\rho_{\text{ICC}}\) is the intra-class correlation.
Python: morie.sampling.cluster_sample()
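A rough sketch of one-stage cluster sampling and of the usual design-effect approximation \(\text{DEFF} = 1 + (\bar{m} - 1)\rho_{\text{ICC}}\) follows. Both function names are hypothetical, and the signature of morie.sampling.cluster_sample() is not assumed here.

```python
import random

def one_stage_cluster_sample(clusters, m, seed=None):
    """Select m clusters by SRS without replacement and keep every element.

    clusters maps a cluster id to the list of its elements.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # SRS over cluster ids
    return {cid: clusters[cid] for cid in chosen}

def cluster_design_effect(mean_cluster_size, icc):
    """Approximate DEFF = 1 + (mean cluster size - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc

# Made-up frame: four classrooms with their pupils' ids.
classrooms = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9], "D": [10]}
picked = one_stage_cluster_sample(classrooms, m=2, seed=0)

deff = cluster_design_effect(mean_cluster_size=20, icc=0.05)
```

With a mean cluster size of 20 and an ICC of 0.05, the design effect is about 1.95, so a cluster sample of 1,000 elements carries roughly the information of an SRS of 1,000 / 1.95, or about 513, elements.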
Probability Proportional to Size (PPS)¶
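PPS sampling selects units with probability proportional to a size measure \(x_i\) (e.g. enrolment count):
\[ p_i = \frac{x_i}{\sum_{j=1}^{N} x_j} \]
where \(p_i\) is the probability of selecting unit \(i\) on a single draw.
PPS is typically more efficient than SRS when the outcome is strongly and positively correlated with the size measure.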
Python: morie.sampling.pps_sample()
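The sketch below illustrates PPS with replacement and the Hansen-Hurwitz estimator of a total, \(\hat{Y} = \frac{1}{n} \sum y_i / p_i\), which is the with-replacement counterpart of Horvitz-Thompson. This is an assumption made for illustration: whether pps_sample() draws with or without replacement is not stated here, and the function names below are hypothetical.

```python
import random

def pps_wr_sample(sizes, n, seed=None):
    """Draw n unit indices with replacement, each draw with probability
    x_i / sum(x). Returns (indices, per-draw probabilities).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    total = sum(sizes)
    probs = [x / total for x in sizes]
    idx = rng.choices(range(len(sizes)), weights=sizes, k=n)
    return idx, probs

def hansen_hurwitz_total(y, idx, probs):
    """Estimate the population total from a PPS with-replacement sample."""
    return sum(y[i] / probs[i] for i in idx) / len(idx)

# Made-up school enrolments, with an outcome exactly proportional to size.
enrolment = [10, 40, 50, 100, 300]
cases = [3 * x for x in enrolment]          # true total is 1500

idx, probs = pps_wr_sample(enrolment, n=3, seed=42)
total_hat = hansen_hurwitz_total(cases, idx, probs)
```

Because the outcome here is exactly proportional to the size measure, every ratio \(y_i / p_i\) equals the true total, so the estimate recovers it whichever units are drawn. Real outcomes are only approximately proportional to size, and the gain over SRS shrinks accordingly.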
Horvitz-Thompson and Hájek Estimators¶
For any probability sample with known inclusion probabilities \(\pi_i\):
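Horvitz-Thompson (unbiased for the population total):
\[ \hat{Y}_{\text{HT}} = \sum_{i \in s} \frac{y_i}{\pi_i} \]
Hájek (ratio estimator for the mean; slightly biased but usually more stable than the HT mean \(\hat{Y}_{\text{HT}} / N\)):
\[ \hat{\bar{y}}_{\text{H}} = \frac{\sum_{i \in s} y_i / \pi_i}{\sum_{i \in s} 1 / \pi_i} \]
where \(s\) is the set of sampled units.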
Python: morie.sampling.horvitz_thompson_total(),
morie.survey.hajek_mean()
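Both estimators are short enough to sketch directly from their definitions. The functions below carry a `_sketch` suffix to make clear that they are illustrations and not the MORIE functions named above, whose signatures are not assumed.

```python
def ht_total_sketch(y, pi):
    """Horvitz-Thompson total: sum of y_i / pi_i over the sample."""
    return sum(y_i / p_i for y_i, p_i in zip(y, pi))

def hajek_mean_sketch(y, pi):
    """Hajek mean: weighted mean with weights 1 / pi_i."""
    weights = [1 / p_i for p_i in pi]
    weighted_sum = sum(w * y_i for w, y_i in zip(weights, y))
    return weighted_sum / sum(weights)

# Made-up sample of three units with unequal inclusion probabilities.
y = [2, 4, 6]
pi = [0.5, 0.25, 0.25]

total_hat = ht_total_sketch(y, pi)     # 4 + 16 + 24 = 44.0
mean_hat = hajek_mean_sketch(y, pi)    # 44 / (2 + 4 + 4) = 4.4
```

The Hájek mean divides by the estimated population size \(\sum 1/\pi_i\) rather than the known \(N\), which is why it remains usable when \(N\) is unknown.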
Bootstrap and Jackknife Variance Estimation¶
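For complex statistics (medians, quantiles, non-linear estimators) where analytic variances are unavailable, resampling methods estimate the variance of an estimator \(\hat{\theta}\) empirically. The delete-1 jackknife is reliable only for smooth statistics; for medians and other quantiles, prefer the bootstrap.
Bootstrap (Efron 1979): draw \(B\) resamples with replacement, recompute the statistic on each to obtain \(\hat{\theta}^{*}_b\), and take the variance of the replicates:
\[ \hat{V}_{\text{boot}} = \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat{\theta}^{*}_b - \bar{\theta}^{*} \right)^2, \qquad \bar{\theta}^{*} = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^{*}_b \]
Delete-1 Jackknife: recompute the statistic with each unit left out in turn to obtain \(\hat{\theta}_{(-i)}\):
\[ \hat{V}_{\text{JK}} = \frac{n-1}{n} \sum_{i=1}^{n} \left( \hat{\theta}_{(-i)} - \bar{\theta}_{(\cdot)} \right)^2, \qquad \bar{\theta}_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{(-i)} \]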
Python: morie.sampling.bootstrap_sample(),
morie.sampling.jackknife_estimate()
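A minimal sketch of both variance estimators for an unweighted, independent sample is given below. It assumes simple resampling of individual units; for stratified or clustered designs the resampling would normally be done on primary sampling units within strata, which this sketch does not attempt. The function names are hypothetical, not MORIE's.

```python
import random
import statistics

def bootstrap_variance(data, stat, n_boot=1000, seed=None):
    """Variance of stat over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.variance(replicates)    # 1 / (B - 1) divisor

def jackknife_variance(data, stat):
    """Delete-1 jackknife variance: (n - 1) / n times the sum of squared
    deviations of the leave-one-out replicates from their mean."""
    n = len(data)
    replicates = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    mean_rep = statistics.fmean(replicates)
    return (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)

data = [1, 2, 3, 4, 5, 6, 7, 8]   # made-up observations
var_mean_jk = jackknife_variance(data, statistics.fmean)
var_median_boot = bootstrap_variance(data, statistics.median, seed=0)
```

For the sample mean, the jackknife variance equals the familiar \(s^2 / n\) exactly, which makes a convenient sanity check before applying either function to a less tractable statistic.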
Effective Sample Size¶
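Unequal survey weights reduce the information carried by a sample. The Kish effective sample size (ESS) is the size of an SRS that would give approximately the same precision:
\[ \text{ESS} = \frac{\left( \sum_{i=1}^{n} w_i \right)^2}{\sum_{i=1}^{n} w_i^2} \]
where \(w_i\) is the weight of unit \(i\).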
The design effect \(\text{DEFF} = n / \text{ESS}\) measures variance inflation relative to a simple random sample of the same size.
Python: morie.sampling.effective_sample_size(),
morie.sampling.design_effect()
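The Kish formula, \(\text{ESS} = (\sum w_i)^2 / \sum w_i^2\), and the resulting design effect can be sketched as follows. The names `kish_ess` and `weighting_deff` are hypothetical and do not reflect the signatures of the MORIE functions listed above.

```python
def kish_ess(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

def weighting_deff(weights):
    """Design effect due to unequal weighting: n / ESS."""
    return len(weights) / kish_ess(weights)

weights = [1, 1, 1, 3]            # made-up survey weights
ess = kish_ess(weights)           # 36 / 12 = 3.0
deff = weighting_deff(weights)    # 4 / 3
```

With equal weights the ESS equals \(n\) and the design effect is 1; the more the weights vary, the smaller the ESS becomes.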
Calibration / Raking¶
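Post-stratification and raking calibrate sample weights so that weighted marginal distributions match known population totals. Calibration chooses weights \(w_i\) that stay close to the base weights \(w_i^{0}\) while reproducing the known totals:
\[ \min_{w} \sum_{i \in s} d(w_i, w_i^{0}) \quad \text{subject to} \quad \sum_{i \in s} w_i \mathbf{x}_i = \mathbf{T}_x \]
where \(\mathbf{x}_i\) is the vector of auxiliary variables for unit \(i\), \(\mathbf{T}_x\) is the vector of known population totals, and \(d(\cdot)\) is a distance function (chi-squared distance gives linear calibration; multiplicative distance gives raking). MORIE uses iterative proportional fitting (IPF).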
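To make the IPF idea concrete, here is a generic two-margin raking sketch. It is not MORIE's implementation: the function name `rake`, its arguments, and the stopping rule are all assumptions for illustration. It also assumes that every category has at least one sampled unit and that the two sets of margin totals sum to the same population size.

```python
def rake(weights, row, col, row_totals, col_totals, max_iter=100, tol=1e-8):
    """Rake weights to two sets of known margin totals by IPF.

    weights: base weight per unit; row, col: category label per unit;
    row_totals, col_totals: dict mapping label to known population total.
    Hypothetical helper for illustration only.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for labels, targets in ((row, row_totals), (col, col_totals)):
            # Current weighted total in each category of this margin.
            current = {k: 0.0 for k in targets}
            for w_i, lab in zip(w, labels):
                current[lab] += w_i
            # Scale each unit so its category matches the target total.
            for i, lab in enumerate(labels):
                factor = targets[lab] / current[lab]
                max_change = max(max_change, abs(factor - 1))
                w[i] *= factor
        if max_change < tol:   # both margins already matched
            break
    return w

# Made-up sample of four units and made-up population margins.
sex = ["F", "F", "M", "M"]
age = ["young", "old", "young", "old"]
raked = rake([1, 2, 3, 4], sex, age,
             {"F": 60, "M": 40}, {"young": 50, "old": 50})
```

Each pass rescales the weights to match one margin and then the other; because fixing the second margin slightly disturbs the first, the passes repeat until the adjustment factors are all close to 1.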
References¶
Kish L (1965). Survey Sampling. Wiley.
Cochran WG (1977). Sampling Techniques (3rd ed.). Wiley.
Lumley T (2010). Complex Surveys: A Guide to Analysis Using R. Wiley.
Valliant R, Dever JA, Kreuter F (2013). Practical Tools for Designing and Weighting Survey Samples. Springer.