Data Set 11: The Misidentified Email Trove | Epstein Revealed

TLDR

Media outlets and community indexers publicly described Data Set 11 as "financial ledgers and USVI flight manifests." It is actually 331,655 PDFs of seized email correspondence, plus 4 .m4v video files, packaged in Relativity load file format (a standardized format used by legal teams to organize large document collections for review). Sampled messages date to April and May 2017. The misidentification went unnoticed until the files were systematically inspected, which shows why direct analysis of government releases matters more than secondhand descriptions.

What Everyone Said It Was

When Data Set 11 was published on January 30, 2026, descriptions proliferated quickly. Multiple media outlets characterized it as "financial ledgers, USVI flight manifests, property seizure records." Community indexing efforts, including EpsteinExposed.com, described it as containing "approximately 180,000 images and 2,000 videos." These descriptions were repeated, cited, and amplified across social media and research communities.

They were wrong.

What It Actually Is

Direct inspection of the files revealed that Data Set 11 contains 331,655 PDFs of seized email correspondence plus 4 .m4v video files (PAPER TRAIL Project, 2026a). The files are structured as a Relativity load file, specifically VOL00011.DAT/OPT, with sequential page numbering (a legal page-numbering system in which every page receives a unique number in order) in the EFTA_R1_00936xxx+ range. DAT and OPT are the load files that legal document review software reads: the DAT file typically carries delimited per-document metadata, and the OPT file typically maps page identifiers to image files. This is the same litigation review format used in Data Set 9. The emails were processed through legal review software, rendered to PDF, and sequentially numbered before release.
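To make the load file structure concrete, the following is a minimal sketch of reading an OPT cross-reference file. It assumes the common Opticon layout of seven comma-separated fields (page identifier, volume, file path, document-break flag, folder break, box break, page count); the actual layout of VOL00011.OPT is not documented here, so the field order, the `OptRecord` type, the function names, and the sample paths are all illustrative assumptions rather than the project's own tooling.

```python
import csv
from collections import Counter
from typing import Iterable, List, NamedTuple


class OptRecord(NamedTuple):
    page_id: str      # sequential page identifier for this entry
    volume: str       # volume label, e.g. VOL00011
    path: str         # relative path to the rendered file
    doc_break: bool   # True when this entry starts a new document
    page_count: int   # declared page count (0 when the field is blank)


def parse_opt(lines: Iterable[str]) -> List[OptRecord]:
    """Parse Opticon-style lines (assumed 7-field layout) into records."""
    records = []
    for row in csv.reader(lines):
        if not row:
            continue
        row = row + [""] * (7 - len(row))  # pad short rows
        page_id, volume, path, brk, _folder, _box, pages = row[:7]
        records.append(OptRecord(
            page_id=page_id.strip(),
            volume=volume.strip(),
            path=path.strip(),
            doc_break=brk.strip().upper() == "Y",
            page_count=int(pages) if pages.strip().isdigit() else 0,
        ))
    return records


def summarize(records: List[OptRecord]) -> dict:
    """Count documents, entries, declared pages, and file extensions."""
    extensions = Counter(
        r.path.rsplit(".", 1)[-1].lower() for r in records if "." in r.path
    )
    return {
        "documents": sum(1 for r in records if r.doc_break),
        "entries": len(records),
        "declared_pages": sum(r.page_count for r in records),
        "extensions": dict(extensions),
    }


# Hypothetical sample lines; the directory layout is invented for illustration.
SAMPLE = [
    "EFTA02212883,VOL00011,IMAGES\\0001\\EFTA02212883.pdf,Y,,,2",
    "EFTA02212885,VOL00011,IMAGES\\0001\\EFTA02212885.pdf,Y,,,4",
]

print(summarize(parse_opt(SAMPLE)))
```

A summary like this answers the basic question of what a data set contains (how many documents, and of what file type) without opening a single PDF, which is why the load file is a useful first stop when characterizing a release.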

The data set occupies approximately 28 GB on disk, making it substantially smaller than DS9 (94.5 GB) or DS10 (81.1 GB) but containing a significant volume of email correspondence.
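File counts and on-disk size of this kind can be checked independently of any published description. The sketch below is one way to do it, not the project's actual audit script: it walks a directory, tallies files by extension, sums their sizes, and flags any `.pdf` that does not begin with the standard `%PDF-` signature. The `inventory` function and the demo files are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # standard signature at the start of a PDF file


def inventory(root: str) -> dict:
    """Tally files under root by extension and flag PDFs with a bad signature."""
    counts = Counter()
    suspect_pdfs = []
    total_bytes = 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        ext = path.suffix.lower() or "(none)"
        counts[ext] += 1
        total_bytes += path.stat().st_size
        if ext == ".pdf":
            with path.open("rb") as fh:
                if fh.read(len(PDF_MAGIC)) != PDF_MAGIC:
                    suspect_pdfs.append(path.name)
    return {
        "counts": dict(counts),
        "total_bytes": total_bytes,
        "suspect_pdfs": suspect_pdfs,
    }


# Demo on a throwaway directory containing placeholder files only.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.pdf").write_bytes(b"%PDF-1.7 placeholder")
    (root / "b.pdf").write_bytes(b"not really a pdf")
    (root / "c.m4v").write_bytes(b"\x00\x00\x00\x18ftyp")
    demo = inventory(tmp)

print(demo)
```

Checking the signature as well as the extension guards against the same failure mode the article describes: trusting a label instead of the contents.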

The Emails Inside

Sample inspection revealed the content and time period of the correspondence (PAPER TRAIL Project, 2026a). EFTA02212883 is a two-page Lesley Groff email chain from May 30, 2017. In it, "jeffrey E." (Epstein) requests "cookies or small cakes. tea" for a noon appointment with "Maxim and his mom." The email footer contains jeevacation@gmail.com, Epstein's vacation email address, providing a searchable anchor across the entire email corpus.
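As a rough illustration of using a footer address as a search anchor, the sketch below scans already-extracted text for the address. It assumes text extraction has been done elsewhere and is available as a mapping from document ID to text; the `find_anchor` function, the document IDs, and the sample text are hypothetical placeholders, not material from the corpus.

```python
import re

ANCHOR = "jeevacation@gmail.com"  # footer address named in the article


def find_anchor(texts: dict, anchor: str = ANCHOR) -> list:
    """Return sorted IDs of documents whose text contains the anchor address.

    All whitespace is stripped before matching so that an address broken
    across lines by PDF text extraction is still found.
    """
    needle = anchor.lower()
    hits = []
    for doc_id, text in texts.items():
        flattened = re.sub(r"\s+", "", text).lower()
        if needle in flattened:
            hits.append(doc_id)
    return sorted(hits)


# Placeholder documents for illustration only.
sample_texts = {
    "DOC-A": "placeholder body\nJeevacation@\ngmail.com",
    "DOC-B": "placeholder body with no footer",
}

print(find_anchor(sample_texts))
```

Stripping whitespace trades a small risk of false positives for tolerance of line-wrapped footers; a stricter variant could match only on whole tokens.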

EFTA02212885 is a four-page travel coordination email from April 24, 2017 (U.S. Department of Justice [DOJ], 2026). It documents an American Express Centurion Travel booking for a Russian Federation citizen flying STT (St. Thomas) to JFK on American Airlines Flight 936. The email contains Russian-language text and a request to Lesley Groff for a "new ticket back to Moscow."

These are operational emails. They document the daily logistics of the Epstein network: travel booking, catering, scheduling, coordination. The date range, at least April through May 2017, places the correspondence firmly in the period after Epstein's 2008 conviction and non-prosecution agreement, when he was a registered sex offender. In other words, these emails document the continued operation of a convicted sex offender's support infrastructure.

How the Misidentification Happened

The most likely explanation is that initial descriptions were based on DOJ's own metadata or summary descriptions rather than direct file inspection (PAPER TRAIL Project, 2026a). When DOJ publishes data sets, the accompanying descriptions are brief. Media outlets and community indexers work under deadline pressure to characterize releases quickly. If DOJ's internal description was ambiguous or inaccurate, that error propagated through every downstream description.

The confusion may also have arisen from conflation with DS10, which does contain images and videos. DS10 and DS11 were released simultaneously, and their descriptions may have been swapped or merged in the rapid-fire reporting that followed the January 30 publication.

Whatever the cause, the result was that for weeks, the research community operated under a false assumption about what one-sixth of the January release contained. Researchers looking for flight manifests in DS11 would not find them. Researchers ignoring DS11 because they were not interested in financial ledgers were missing 331,655 emails.

NER Now Complete

When this project first processed DS11, Named Entity Recognition (NER) coverage was at 0% (PAPER TRAIL Project, 2026b). NER is the automated process of identifying and categorizing names of people, organizations, and locations in text, and the data set had not been run through entity extraction because its contents were misunderstood. Once the true content was identified, DS11 was prioritized for NER processing alongside DS9. Both are now complete, producing the full 863,000-document email corpus that feeds into the entity resolution, co-occurrence graph, and cross-domain synthesis pipelines.
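For readers unfamiliar with co-occurrence graphs, the sketch below shows the basic idea under simple assumptions: each document contributes one count to every pair of distinct entities it mentions, and the resulting pair counts become edge weights. The project's actual pipeline is not shown in this article, so the `cooccurrence` function and the placeholder entities are purely illustrative.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(doc_entities: dict) -> Counter:
    """Count how many documents mention each unordered pair of entities.

    doc_entities maps a document ID to the list of entity strings that
    NER extracted from it. Repeated mentions within one document count once.
    """
    pairs = Counter()
    for entities in doc_entities.values():
        unique = sorted(set(entities))  # sort so each pair has one canonical order
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


# Placeholder entities for illustration only.
sample = {
    "doc-1": ["Person A", "Org X", "Person A"],
    "doc-2": ["Person A", "Org X", "Place Y"],
}

for pair, weight in cooccurrence(sample).most_common():
    print(pair, weight)
```

Counting each pair once per document keeps a single long email chain from dominating the graph; weighting by mention frequency instead is an equally valid design choice with different trade-offs.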

The NER results from DS11 include the expected email entities: person names in salutations and signatures, organization names in email footers, and location references in travel coordination. But they also include entities that would not appear in the "financial ledgers" DS11 was thought to contain: the names of travel agents, hotel staff, personal assistants, and service providers who constitute the operational layer of the network.

The Lesson

OBS-7, the observation that resolved DS11's true contents, is a methodological finding as much as a data finding (PAPER TRAIL Project, 2026a). It demonstrates that government data releases cannot be characterized by their descriptions. They must be characterized by their contents. The 331,655 emails in DS11 were always there. The error was not in the data; it was in the assumption that someone else had already looked.

In a corpus of 2.1 million documents, assumptions compound. If DS11 had remained unexamined under the false "financial ledgers" label, more than one-third of the email corpus (331,655 of 863,000 documents, roughly 38%) would have been invisible to analysis. The investigative cost of that gap, in missed entities, missed connections, and missed operational patterns, would have been unquantifiable because no one would have known what they were not finding.

References

PAPER TRAIL Project. (2026a). Observation log: OBS-7. [Data analysis: OBSERVATIONS.md].

PAPER TRAIL Project. (2026b). NER processing status. [Script: app/scripts/04_extract_entities.py].

PAPER TRAIL Project. (2026c). Corpus audit. [Export: _exports/audit/corpus_inventory.csv].

U.S. Department of Justice. (2026). Epstein files library: Data Sets 9-12. Published January 30, 2026. justice.gov/epstein.
