TLDR
Eight pipeline stages contribute measured error rates that compound to a worst-case correct rate of 8.4%, meaning any conclusion that passes through every stage carries substantial cumulative uncertainty (PAPER TRAIL Project, 2026a). This is not a flaw -- it is the known-error-rate disclosure required of admissible expert testimony under the standard established in Daubert v. Merrell Dow Pharmaceuticals (1993).
The Eight Stages
Every conclusion in the synthesis engine passes through a series of processing stages, each with a measured error rate. These rates are not guesses -- they are derived from validation samples, calibration benchmarks, or species richness estimation (PAPER TRAIL Project, 2026a). The eight stages and their error rates:
OCR transcription (5.0%). Measured against manually verified samples. One in 20 characters is incorrect, which can corrupt entity names, dates, or amounts.
NER precision (13.0%). The named entity recognition system's false discovery rate (one minus precision) -- 13% of extracted entities are not real entities. These are artifacts of OCR errors, boilerplate text, or ambiguous strings.
NER recall (18.0%). The missed entity rate. Eighteen percent of real entities in the documents are not extracted. They exist in the corpus but are invisible to the pipeline.
Entity resolution (16.0%). The rate at which the probabilistic record linkage system -- called Splink -- incorrectly merges or fails to merge entity records. Measured against a clerical review sample (PAPER TRAIL Project, 2026b).
Wire parsing (5.0%). The rate of field extraction errors in wire transfer records. Amounts, dates, originators, or beneficiaries may be incorrectly parsed.
FedEx parsing (3.0%). The lowest error rate in the pipeline. FedEx records are more structured than wire transfers, producing fewer extraction errors.
Temporal alignment (74.9%). The proportion of PELT change-points without a matched contextual event. This is the largest single error source, but it measures a different kind of uncertainty: a breakpoint can be real without corresponding to any event in the 50-date calibration set (PAPER TRAIL Project, 2026c).
Corpus completeness (36.3%). The Chao1 species richness estimator projects 1,290,141 total entities from the 821,633 observed -- meaning 36.3% of the estimated total is missing from the corpus entirely (PAPER TRAIL Project, 2026d).
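The completeness figure is straightforward arithmetic over the two counts reported above; a minimal sketch:

```python
# Corpus-completeness arithmetic from the figures above:
# Chao1 projected total vs. entities actually observed.
observed = 821_633          # entities observed in the corpus
chao1_estimate = 1_290_141  # Chao1 projected total

missing = chao1_estimate - observed
missing_fraction = missing / chao1_estimate

print(f"missing entities: {missing:,}")             # 468,508
print(f"missing fraction: {missing_fraction:.1%}")  # 36.3%
```

The 36.3% rate is the missing mass as a share of the estimated total, not of the observed corpus.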
Compound Error Arithmetic
When stages are independent -- meaning an error in one stage does not cause or prevent errors in another -- the compound correct rate is the product of each stage's correct rate. Multiplying (1 - 0.05) x (1 - 0.13) x (1 - 0.18) x (1 - 0.16) x (1 - 0.05) x (1 - 0.03) x (1 - 0.7492) x (1 - 0.363) yields 0.084, or 8.4% (PAPER TRAIL Project, 2026a).
This means that a conclusion relying on every pipeline stage being correct simultaneously has an 8.4% probability of surviving all stages without any error contamination. The corresponding error ceiling -- the maximum proportion of a conclusion's confidence that could be attributable to pipeline error -- is 0.916.
In practice, not every conclusion depends on every stage. Wire transfer analysis does not depend on FedEx parsing. Entity resolution does not depend on temporal alignment. The compound error rate is a worst case that applies to chains spanning all eight stages.
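A minimal sketch of this arithmetic, including the stage-subset point (the stage names are illustrative labels, not identifiers from the project's codebase):

```python
# Compound correct rate under the independence assumption: the
# product of each stage's correct rate (1 - error rate). Rates
# are taken from the list above.
STAGE_ERROR_RATES = {
    "ocr": 0.05,
    "ner_precision": 0.13,
    "ner_recall": 0.18,
    "entity_resolution": 0.16,
    "wire_parsing": 0.05,
    "fedex_parsing": 0.03,
    "temporal_alignment": 0.7492,
    "corpus_completeness": 0.363,
}

def compound_correct_rate(stages):
    """Product of per-stage correct rates for the stages a chain uses."""
    rate = 1.0
    for name in stages:
        rate *= 1.0 - STAGE_ERROR_RATES[name]
    return rate

# Worst case: a chain spanning all eight stages.
full_chain = compound_correct_rate(STAGE_ERROR_RATES)
print(f"all eight stages: {full_chain:.3f}")        # 0.084
print(f"error ceiling:    {1.0 - full_chain:.3f}")  # 0.916

# A wire-transfer chain never touches FedEx parsing, so it
# compounds only the seven stages it actually uses.
wire_chain = compound_correct_rate(
    k for k in STAGE_ERROR_RATES if k != "fedex_parsing"
)
print(f"without FedEx parsing: {wire_chain:.3f}")
```

Dropping a low-error stage like FedEx parsing moves the compound rate only slightly; the two dominant terms are temporal alignment and corpus completeness.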
What This Means for Evidence Chains
The error propagation analysis evaluated 20 evidence chains (PAPER TRAIL Project, 2026a). Each chain has an original confidence score based on the strength and convergence of the evidence, an adjusted confidence that accounts for corpus completeness, and an error-adjusted confidence that further discounts for compound pipeline error.
The error-adjusted confidence values range from 0.175 to 0.206 across all 20 chains (PAPER TRAIL Project, 2026a). The error ceilings range from 0.648 to 0.666. Every chain is classified as "lead -> finding_if_no_error" -- meaning that if the pipeline stages were error-free, the evidence would be strong enough to cross the 0.75 finding threshold. Pipeline error is what keeps them as leads.
This classification is informative, not defeatist. It tells analysts exactly what would need to improve -- and by how much -- to elevate leads to findings. The Monte Carlo simulation confirms that corpus completeness is the most sensitive parameter (PAPER TRAIL Project, 2026e). Obtaining the estimated 468,000 missing entities through additional releases would do more than any technical refinement.
Why Disclose This
The Daubert standard requires expert testimony to disclose its known or potential error rate (Daubert v. Merrell Dow Pharmaceuticals, 1993). A computational analysis claiming conclusions without disclosing the error rates of its processing pipeline would fail this standard. The error disclosure matrix is not an appendix -- it is a core component of every conclusion the system produces (PAPER TRAIL Project, 2026a).
The 8.4% compound correct rate sounds alarming in isolation. In context, it reflects transparency that most analytical systems do not provide. A human analyst reviewing documents makes errors at every stage -- reading, interpreting, connecting, remembering -- but those error rates are rarely measured and never compounded. This system measures them, compounds them, and reports the result.
References
Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993).
PAPER TRAIL Project. (2026a). Error disclosure matrix and error propagation [Data set]. _exports/synthesis/error_disclosure_matrix.csv, error_propagation.csv
PAPER TRAIL Project. (2026b). Splink entity resolution calibration [Data set]. _exports/entity_resolution/
PAPER TRAIL Project. (2026c). PELT recalibrated change-points [Data set]. _exports/temporal/curated_changepoints_summary.csv
PAPER TRAIL Project. (2026d). Chao1 completeness estimates [Data set]. _exports/validation/chao1_summary.json
PAPER TRAIL Project. (2026e). Monte Carlo sensitivity results [Data set]. _exports/synthesis/mc_ach_robustness.csv
This investigation is part of the SubThesis accountability journalism network.