Header Absorption: How 190 Pages Created a Phantom City

TLDR

A 190-page Ghislaine Maxwell UBS account statement (EFTA01275697.pdf) generated 190 false "Krakow" entity mentions when automated text scanning mangled the repeated header "Resource Management Account" into "Resource Krakow &moult" on every page. Together with 6,876 false "Poland" mentions from blank KYC (Know Your Customer) form labels, these artifacts show how a repeated page element, such as a header or a form label, is multiplied by page count into a phantom entity that appears statistically significant (PAPER TRAIL Project, 2026a).

The Krakow That Was Not There

When the entity database was searched for "Krakow" in response to stakeholder analysis for Poland's investigation task force, the results looked promising: approximately 190 hits, concentrated in DOJ Data Set 10. For a country investigating Epstein's potential connections to Eastern Europe, 190 apparent mentions of the second-largest Polish city seemed like a lead worth pursuing (PAPER TRAIL Project, 2026a).

Visual inspection of the source PDF told a different story.

EFTA01275697.pdf is a 27.2 MB, 190-page document. It is a Ghislaine Maxwell UBS Financial Services Resource Management Account statement package, seized by SDNY (reference numbers SDNY_GM_00023306 through SDNY_GM_00023496). The account is at UBS Financial Services, 299 Park Avenue, 25th Floor, New York. The financial advisors are Scott Stackman and Lyle Casriel. The date is February 2014. The account value went from $0.00 on January 31 to $27,047.74 on February 28, with a single deposit from Chase (PAPER TRAIL Project, 2026a).

Every page of this statement carries the same header: "Resource Management Account." The automated text scanner processed each page independently, and on every one of the 190 pages it transformed the header into "Resource Krakow &moult." The companion text "RMA ResourceLine" was rendered as "*AA Resource" (PAPER TRAIL Project, 2026a).

One document. 190 pages. 190 false entity mentions. That is the entire Krakow presence in the corpus.

Verification

To confirm the false positive, the rest of the corpus was searched for Krakow references outside this single PDF. The only other matches were 4 House Oversight documents containing the surname "KRAKOWER, JUDITH R" — a person's last name appearing in property records, not a geographic reference to the Polish city (PAPER TRAIL Project, 2026a).

The result is definitive: there are zero verified geographic references to Krakow in the 2.1-million-document corpus. Every apparent reference traces back to either a scanning error in a UBS account header or a surname that happens to start with the same letters.
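The surname half of this check can be sketched with a whole-word match. This is a minimal illustration, not the project's actual search code: the function name `mentions_city_as_word` and the pattern are assumptions, and the two sample strings are the ones quoted above. Note what the sketch does not do: the scanning artifact still contains "Krakow" as a standalone word, so a word-boundary test removes "KRAKOWER" but cannot remove the header error. That still requires checking the source PDF.

```python
import re

# Hypothetical pattern: the city name as a whole word, ASCII or Polish spelling.
CITY_WORD = re.compile(r"\bkrak[oó]w\b", re.IGNORECASE)


def mentions_city_as_word(text: str) -> bool:
    """Return True if the text contains 'Krakow' as a standalone word."""
    return CITY_WORD.search(text) is not None


samples = {
    "surname": "KRAKOWER, JUDITH R",         # property-record surname
    "ocr_header": "Resource Krakow &moult",  # mangled UBS page header
}

for label, text in samples.items():
    # The surname is rejected; the mangled header still matches.
    print(label, mentions_city_as_word(text))
```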

The Poland Parallel

The Krakow false positive was not isolated. A parallel investigation into "Poland" references produced an even larger artifact (PAPER TRAIL Project, 2026b).

OBS-5 documented the discovery: Deutsche Bank KYC (Know Your Customer) account-opening forms in Data Set 10 contain a pre-printed field label that reads "SSN (U.S. Persons/ Non-U.S. Persons)." The scanning engine, processing blank forms where this field was empty, transformed the label into "UN (US Peon/ Stolle. Poland." The text "Persons/ Non" became "Peon/ Stolle. Poland" (PAPER TRAIL Project, 2026b).

This scanning error was replicated across every blank KYC form page in the dataset, producing 6,876 false "Poland" entity mentions across 690 documents in 4 data sets. A stakeholder researching Polish connections to Epstein would find thousands of hits — all of them phantom (PAPER TRAIL Project, 2026b).

The Amplification Mechanism

Header absorption is not a single scanning error. It is an amplification mechanism. The process works in three steps:

First, the scanning engine encounters a repeated element — a page header, footer, watermark, or form label. If the text is clear, it reads correctly. If the text is degraded, stylized, or in an unusual font, the engine produces a systematic misreading (PAPER TRAIL Project, 2026a).

Second, because the element is repeated on every page of a multi-page document, the misreading is replicated proportionally. A 190-page document produces 190 identical errors. A form template used across hundreds of pages produces hundreds of identical errors.

Third, automated name extraction treats each occurrence as an independent entity mention. The entity database records 190 mentions of "Krakow" from a single document, and statistical tools that count mention frequency flag it as a significant entity. In terms of the completeness estimator (Chao1), this is a high-frequency entity, the opposite of a singleton (an entity appearing only once), so it is treated with high confidence and carries high influence in downstream analysis (PAPER TRAIL Project, 2026c).
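The three steps can be reproduced in a few lines. The sketch below is illustrative only: `count_mentions` and the in-memory `corpus` layout are assumptions rather than the project's pipeline, and the page text is simply the mangled header quoted earlier, repeated 190 times. It shows the gap between counting mentions per page and counting the distinct documents that contain them.

```python
def count_mentions(pages_by_doc, entity):
    """Return (page-level mention count, number of distinct documents)."""
    mentions = 0
    docs = set()
    for doc_id, pages in pages_by_doc.items():
        for page_text in pages:
            hits = page_text.count(entity)
            if hits:
                mentions += hits
                docs.add(doc_id)
    return mentions, len(docs)


# Hypothetical stand-in for the scanned statement: one document,
# 190 pages, each carrying the same mangled header.
corpus = {
    "EFTA01275697.pdf": ["Resource Krakow &moult\n*AA Resource"] * 190,
}

mentions, docs = count_mentions(corpus, "Krakow")
print(mentions, docs)  # 190 mentions, 1 document
```

A frequency-ranked entity list sees only the first number. The second number is the one that signals a single source.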

The result is an entity that looks important by every statistical measure. High mention count. Concentrated in a specific data set. Associated with a known person (Maxwell). Connected to a geographic investigation (Poland). Every quantitative signal points toward significance. The only thing that reveals the artifact is looking at the source PDF.
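To make the completeness-estimator point concrete, the sketch below uses the standard bias-corrected Chao1 formula, S_obs + f1(f1 - 1) / (2(f2 + 1)), where f1 and f2 are the numbers of entities seen exactly once and exactly twice. It is an assumption that the project's Script 23 uses this form, and the entity counts are invented toy values. The point is narrow: a phantom entity with 190 mentions adds one to the observed richness and never touches the singleton or doubleton terms, so the estimator has no way to discount it.

```python
def chao1(mention_counts):
    """Bias-corrected Chao1 richness estimate from {entity: mention count}."""
    counts = [c for c in mention_counts.values() if c > 0]
    s_obs = len(counts)                      # entities observed at least once
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy, hypothetical counts; only the "Krakow" figure comes from the text.
real = {"entity_a": 1, "entity_b": 1, "entity_c": 2, "entity_d": 5}
with_phantom = dict(real, Krakow=190)

print(chao1(real), chao1(with_phantom))  # the phantom adds exactly one
```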

Mitigation

Reprocessing with a vision-language model (Script 08, Qwen2.5-VL-7B) can mitigate header absorption. A vision-language model processes the entire page as an image, understanding layout context. It can recognize that a header reading "Resource Management Account" is repeating boilerplate rather than meaningful content. It can distinguish between filled and blank form fields, avoiding the KYC label hallucination (PAPER TRAIL Project, 2026d).

But vision-language model reprocessing has not been applied to the entire corpus. It runs on a consumer GPU at approximately 0.5 documents per minute, so a full pass over 2.1 million documents would take about 4.2 million minutes, or roughly eight years of continuous running. The pragmatic mitigation is different: high-mention entities concentrated in contiguous document ranges should be verified against source PDFs before being treated as real (PAPER TRAIL Project, 2026d).
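One way to decide which entities to verify first is a cheap triage pass over the mention table. The sketch below is a hypothetical heuristic, not the project's method. It swaps in a different signal from the one named above: instead of testing for contiguous document ranges, it flags entities whose many mentions share almost no distinct surrounding text, which is the signature a repeated header or form label leaves behind. The function name, record layout, and both thresholds are assumptions chosen for illustration.

```python
from collections import defaultdict


def flag_suspects(mentions, min_mentions=50, max_distinct_context_ratio=0.05):
    """Flag high-mention entities whose contexts are nearly all identical.

    mentions: iterable of (entity, doc_id, context) tuples.
    """
    by_entity = defaultdict(list)
    for entity, doc_id, context in mentions:
        by_entity[entity].append((doc_id, context))

    suspects = []
    for entity, rows in by_entity.items():
        n = len(rows)
        if n < min_mentions:
            continue  # low-frequency entities are not the amplification case
        distinct_contexts = len({ctx for _, ctx in rows})
        if distinct_contexts / n <= max_distinct_context_ratio:
            suspects.append({
                "entity": entity,
                "mentions": n,
                "documents": len({doc for doc, _ in rows}),
                "distinct_contexts": distinct_contexts,
            })
    return suspects


# Hypothetical mention table: the phantom header plus a varied, real-looking entity.
mentions = [("Krakow", "EFTA01275697.pdf", "Resource Krakow &moult")] * 190
mentions += [("New York", f"doc_{i}", f"context {i} in New York") for i in range(60)]

print(flag_suspects(mentions))
```

A flag from this pass is only a prompt to open the source PDF. Legitimate boilerplate, such as a bank's address printed on every page, would also be flagged, so the output is a review queue rather than a verdict.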

This is the lesson of header absorption. The more pages a document has, the more damage a single scanning error can do. Statistical significance, in a corpus processed by imperfect automated scanning, is not the same as evidentiary significance. The numbers say "Krakow is important." The PDF says "Krakow was never there."

References

PAPER TRAIL Project. (2026a). Observations — OBS-6: Krakow false positive, EFTA01275697.pdf [Data]. OBSERVATIONS.md

PAPER TRAIL Project. (2026b). Observations — OBS-5: Poland false positive, blank KYC forms [Data]. OBSERVATIONS.md

PAPER TRAIL Project. (2026c). Chao1 validation — completeness estimate (Script 23) [Data]. _exports/validation/chao1_summary.json

PAPER TRAIL Project. (2026d). VLM reprocessing (Script 08) [Software]. app/scripts/08_vlm_reprocess.py

U.S. Department of Justice. (2026). Epstein Records Library — Data Set 10 [Government release]. 01_DOJ_DataSets/DataSet_10/VOL00010/IMAGES/0001/