Mechanism: Temporal changes in Benford's Law conformity within facility-level EPA TRI self-reports predict subsequent enforcement actions. Readout: An increasing Mean Absolute Deviation (MAD) from Benford's Law predicts enforcement 6-18 months in advance with a C-statistic above 0.65, outperforming random audit selection.
Hypothesis
Temporal changes in Benford's Law conformity within facility-level EPA Toxics Release Inventory (TRI) self-reports predict subsequent ECHO enforcement actions 6-18 months in advance, with predictive accuracy superior to random audit selection.
What Exists vs. What's Missing
Benford's Law has been applied to TRI data: a 2024 study (PMC11215073) found that aggregate TRI data conform to the expected digit distributions, while de Marchi & Hamilton found that specific chemicals such as lead and nitric acid deviate, suggesting strategic misreporting for heavily regulated substances.
The gap: Every published study treats Benford's analysis as a static, retrospective data-quality screen. No study has linked digit-distribution anomalies to timestamped ECHO enforcement outcomes (inspections → violations → penalties) using survival or hazard models. EPA's ECHO database now provides explicit "pipeline views" showing temporal relationships between monitoring, violations, and enforcement; the data infrastructure exists but has not been used this way.
Testable Predictions
- Facility-level Mean Absolute Deviation (MAD) from Benford's expected first-digit frequencies, computed from annual TRI submissions, should increase in the 1-3 reporting years before an ECHO enforcement action at that facility (paired t-test or Wilcoxon signed-rank, p < 0.01)
- A Cox proportional hazards model using time-varying MAD scores as the predictor and time-to-enforcement as the outcome should achieve a concordance index (C-statistic) > 0.65, outperforming facility-size-only models
- The effect should be chemical-specific: MAD deviations should be largest for substances with the highest penalty-per-pound (lead, mercury, VOCs) where misreporting incentives are strongest, replicating de Marchi & Hamilton's finding at the temporal level
- Facilities whose MAD scores improve (return toward Benford conformity) after a warning letter should show lower rates of subsequent formal enforcement than facilities whose MAD scores remain elevated
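The MAD score that these predictions rely on can be sketched as follows. Benford's expected first-digit probability is P(d) = log10(1 + 1/d) for d = 1..9, and MAD is the average absolute gap between observed and expected digit frequencies. This is a minimal illustration, not a reference implementation: the function names `first_digit` and `benford_mad` are hypothetical, the release quantities are made up, and it assumes one list of positive release quantities per facility-year.

```python
import math
from collections import Counter

# Benford's expected first-digit probabilities: P(d) = log10(1 + 1/d) for d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the leading significant digit of a positive finite number, else None."""
    if x is None or x <= 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the leading significant digit first: 0.042 -> '4.2...e-02'.
    return int(f"{x:.16e}"[0])


def benford_mad(quantities):
    """Mean absolute deviation between observed and Benford first-digit frequencies."""
    digits = [d for d in map(first_digit, quantities) if d is not None]
    if not digits:
        return float("nan")  # nothing usable (e.g. all zero or missing releases)
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - BENFORD[d]) for d in range(1, 10)) / 9


# Made-up release quantities (pounds) for one hypothetical facility, by reporting year.
reports = {
    2019: [120.0, 1850.0, 23.5, 310.0, 14.0, 2600.0, 190.0, 47.0],
    2020: [500.0, 5000.0, 550.0, 50.0, 5100.0, 495.0, 520.0, 58.0],
}
mad_by_year = {year: benford_mad(q) for year, q in reports.items()}
for year, mad in sorted(mad_by_year.items()):
    print(year, round(mad, 4))
```

In this toy example the 2020 values cluster on a leading 5, so the 2020 MAD is higher than the 2019 MAD. With only a handful of chemicals per facility-year, MAD is noisy and tends to be inflated by small samples, so a real analysis would likely need to pool across chemicals or years before comparing scores over time.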
Falsification Criteria
If time-varying MAD scores from TRI submissions show no significant predictive association with subsequent ECHO enforcement actions (the 95% CI of the hazard ratio includes 1.0) across 2,000+ facilities over 10 years of matched TRI-ECHO data, the hypothesis is falsified.
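The confidence-interval check in this criterion can be sketched as below. A Cox model reports a coefficient (the log hazard ratio) and its standard error; the hazard ratio and its Wald 95% CI follow by exponentiating. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not results.

```python
import math


def hazard_ratio_ci(beta, se, z=1.959964):
    """Return (HR, lower, upper): the hazard ratio and its Wald CI.

    beta is the Cox coefficient (log hazard ratio), se its standard error,
    and z the normal quantile (1.959964 gives a two-sided 95% interval).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def ci_includes_one(lower, upper):
    """True when the interval contains HR = 1.0, i.e. no detectable association."""
    return lower <= 1.0 <= upper


# Hypothetical coefficients for the MAD covariate (illustration only).
for beta, se in [(0.20, 0.05), (0.05, 0.05)]:
    hr, lo, hi = hazard_ratio_ci(beta, se)
    verdict = "falsified" if ci_includes_one(lo, hi) else "not falsified"
    print(f"HR={hr:.3f} CI=({lo:.3f}, {hi:.3f}) -> {verdict}")
```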
Data Sources (All Free)
- EPA TRI: Facility-level annual release quantities by chemical (envirofacts.epa.gov) — 30+ years
- EPA ECHO: Timestamped inspections, violations, enforcement actions, penalties (echo.epa.gov)
- EPA ICIS-Air: Integrated Compliance Information System for Air, covering Clean Air Act compliance and enforcement records for stationary sources
- Methodological precedent: Tax forensics routinely uses MAD-from-Benford as a continuous risk metric in audit-selection classifiers
Why This Matters
This would convert Benford's Law from a backward-looking audit tool into a forward-looking enforcement-targeting model. If validated, EPA could automate TRI digit-distribution monitoring as a low-cost compliance early-warning system, identifying likely misreporters before physical inspection using data that facilities already submit annually.