Relativity Load Files: The Emails Were Already Reviewed

TLDR

Both Data Set 9 (531,256 files) and Data Set 11 (331,655 files) are structured as Relativity load files — the standardized production format used in litigation document review — with DAT/OPT metadata and Bates numbers (unique sequential identifiers assigned during document processing) prefixed EFTA_R1_. Relativity is the litigation review platform used by law firms and government agencies to process discovery documents. This means the 863,000 emails were already reviewed, tagged, and organized through a professional document review pipeline before DOJ released them to the public — raising the question of what was excluded during that review (PAPER TRAIL Project, 2026).


What Relativity Is

Relativity is the dominant platform for electronic document review in U.S. litigation. Developed by Relativity (formerly kCura), the software is used by law firms, government agencies, and corporate legal departments to ingest, process, search, review, and produce documents in discovery (the legal process where parties exchange relevant documents). When a court orders one party to produce emails, financial records, or other electronic documents, the producing party typically processes them through Relativity before turning them over.

The platform's core function is to convert raw electronic data — email server exports, PST files, file shares — into a structured, searchable database. Each document receives a Bates number (a unique sequential identifier), is converted to a standard image format (usually TIFF or PDF), and gets a metadata record containing fields like date, author, recipient, file type, and custodian (the person whose files were collected). The output is a "load file": a package of images plus metadata that can be ingested by the receiving party's own Relativity instance.
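The image half of a load file is typically indexed by an Opticon-style OPT cross-reference. A minimal sketch of parsing one OPT line, assuming the standard Opticon field layout (Bates number, volume label, relative image path, document-break flag, folder break, box break, page count); the sample record is illustrative, not taken from DS9/DS11:

```python
# Parse one line of an Opticon-style OPT image cross-reference file.
# Field layout here is the common Opticon convention, assumed rather
# than confirmed for these data sets.

def parse_opt_line(line: str) -> dict:
    fields = line.rstrip("\n").split(",")
    return {
        "bates": fields[0],
        "volume": fields[1],
        "image_path": fields[2],
        "doc_break": fields[3] == "Y",  # "Y" marks a document's first page
        "page_count": int(fields[6]) if len(fields) > 6 and fields[6] else None,
    }

# Hypothetical record in the DS11 naming style:
sample = "EFTA_R1_00936296,VOL00011,IMAGES\\0001\\EFTA_R1_00936296.pdf,Y,,,2"
rec = parse_opt_line(sample)
print(rec["bates"], rec["doc_break"], rec["page_count"])
```

The document-break flag is what lets a receiving Relativity instance reassemble individual page images into multi-page documents.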

Data Sets 9 and 11 are Relativity load files.

The Evidence

The structural indicators are unambiguous. DS11 carries the volume identifier VOL00011, matching Relativity's standard volume naming convention. Documents carry Bates numbers with the prefix EFTA_R1_ followed by eight-digit sequential numbers — EFTA_R1_00938289, EFTA_R1_00936296, and so on. Each email has been printed to a separate PDF file, creating the one-document-per-file structure characteristic of Relativity image productions. The underlying DAT and OPT files contain the metadata and image pointer tables that Relativity uses to link images to their metadata records (PAPER TRAIL Project, 2026).

DS9 follows the same pattern with 531,256 PDFs under VOL00009. The Bates prefix is the same: EFTA_R1_. The file structure is the same: one PDF file per email.

What This Means

The Relativity format tells us something important about the provenance of these emails. They were not dumped raw from an email server. They were processed through a professional document review pipeline — ingested, deduplicated, Bates-stamped, and converted to production format. Someone (or more likely, a team of document reviewers) went through these emails before they became public.

In standard litigation practice, the Relativity review workflow involves several stages. First, data is ingested and processed (deduplication, email threading, metadata extraction). Second, a review team — attorneys or trained reviewers — examines documents for responsiveness, privilege (legal protections such as attorney-client privilege that shield certain communications from disclosure), and confidentiality. Third, non-responsive or privileged documents are withheld. Fourth, the remaining documents are produced in load file format with Bates numbers.

The EFTA_R1_ prefix and the sequential Bates numbering indicate that these emails went through this full workflow. The numbers are not arbitrary — they are assigned during the production phase after review decisions have already been made. Every document with a Bates number was affirmatively designated for production. Documents without Bates numbers were withheld.

The Gap Question

This raises the obvious question: how many emails were withheld? The Bates numbers in the DS11 samples include EFTA_R1_00936296 and EFTA_R1_00938289. If these numbers are part of a single continuous sequence, the gap between them — approximately 2,000 numbers — may represent either documents assigned to other data sets or documents withheld during review.

The total Bates range across DS9 and DS11 is unknown without processing all 863,000 files to extract the minimum and maximum Bates numbers. But a statistical estimation method called the German Tank Problem — originally used by Allied forces in WWII to estimate German tank production from serial numbers, and the same method Script 18 uses to detect document gaps in sequential identifier series — could be applied to the Bates numbers to estimate the total production size and identify potential gaps where documents may have been removed (PAPER TRAIL Project, 2026).

If, for example, the Bates numbers span from EFTA_R1_00000001 to EFTA_R1_01200000, but only 863,000 documents were produced, that would imply approximately 337,000 documents were reviewed and withheld. The actual numbers require computation, but the method is straightforward: the span of the Bates range (highest number minus lowest, plus one) minus the count of produced documents yields the number withheld, and dividing that by the span yields the withholding rate.
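The arithmetic, worked through with the hypothetical span from the example above (not actual DS9/DS11 values):

```python
# Hypothetical Bates range from the example; actual values would come
# from scanning the produced filenames.
lowest, highest = 1, 1_200_000
produced = 863_000

span = highest - lowest + 1   # total Bates numbers in the range
withheld = span - produced    # documents reviewed but not produced
rate = withheld / span        # withholding rate

print(withheld, f"{rate:.1%}")  # 337000 28.1%
```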

The Misidentification

The Relativity format is also relevant to the misidentification of DS11. Media reports described Data Set 11 as "financial ledgers, USVI flight manifests, property seizure records." Community indexing efforts cataloged it as "approximately 180,000 images and 2,000 videos." Neither description matched the actual contents (PAPER TRAIL Project, 2026).

The confusion likely arose because Relativity load files look unusual to people unfamiliar with the format. A directory containing 331,655 PDFs, each one or two pages long, with cryptic Bates numbers for filenames, does not immediately communicate "email." The DAT and OPT files that explain the structure are small metadata files that many viewers would overlook. Without opening sample PDFs and reading the content, the data set appears to be an undifferentiated mass of document images.

The resolution came from manual inspection: opening EFTA02212883 and EFTA02212885, reading the email content, and recognizing the Relativity production format from the Bates stamps and file structure. The lesson is that metadata about a data set — whether from DOJ descriptions or community indexing — is not a substitute for actually looking at the documents.

Implications for Analysis

The Relativity format provides analytical advantages that raw email exports would not. The Bates numbering creates a reliable unique identifier for every document. The sequential numbering enables gap analysis. The metadata files, if fully parsed, may contain fields like email date, sender, recipient, and subject line — structured data that would dramatically accelerate the email network analysis described in the project's email methodology research (PAPER TRAIL Project, 2026).
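Relativity's default DAT encoding uses ASCII 020 (\x14) as the field delimiter and ASCII 254 (þ) as the text qualifier, which Python's csv module can read directly. A minimal sketch, with field names (BEGBATES, FROM, SUBJECT) that are common production fields but assumed rather than confirmed for DS9/DS11:

```python
import csv
import io

def read_dat(stream):
    """Iterate records from a Concordance-style DAT file:
    \x14 field delimiter, \xfe (thorn) text qualifier —
    Relativity's default load-file convention."""
    reader = csv.DictReader(stream, delimiter="\x14", quotechar="\xfe")
    yield from reader

# Illustrative two-line DAT content; values are invented.
dat_text = (
    "\xfeBEGBATES\xfe\x14\xfeFROM\xfe\x14\xfeSUBJECT\xfe\n"
    "\xfeEFTA_R1_00936296\xfe\x14\xfealice@example.com\xfe\x14\xfeRe: schedule\xfe\n"
)
for rec in read_dat(io.StringIO(dat_text)):
    print(rec["BEGBATES"], rec["FROM"])
```

If the actual DAT files carry sender and recipient fields, this one loop is the entry point to corpus-wide sender-recipient mapping.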

The disadvantage is that the production format strips away technical email metadata (SMTP headers, routing information, IP addresses) that would be present in raw email exports. Relativity productions typically contain the rendered content of the email — the body text, attachments, and basic header fields — but not the full technical envelope. For communication network analysis, this means thread reconstruction must rely on content similarity rather than message-ID threading.
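One simple content-based substitute for Message-ID threading — an assumption for illustration, not the project's documented method — is grouping emails by subject line after stripping reply and forward prefixes:

```python
import re
from collections import defaultdict

# Strips chained "Re:" / "Fw:" / "Fwd:" prefixes, case-insensitively.
PREFIX = re.compile(r"^\s*(re|fw|fwd)\s*:\s*", re.IGNORECASE)

def normalize_subject(subject: str) -> str:
    prev = None
    while prev != subject:          # handle chains like "Fwd: Re: ..."
        prev, subject = subject, PREFIX.sub("", subject)
    return subject.strip().lower()

def thread_by_subject(emails):
    """emails: iterable of (bates, subject) pairs.
    Returns {normalized_subject: [bates, ...]}."""
    threads = defaultdict(list)
    for bates, subject in emails:
        threads[normalize_subject(subject)].append(bates)
    return dict(threads)
```

Subject grouping over-merges distinct conversations that reuse a subject, so a fuller pipeline would refine each group with date ordering and body-text similarity.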

The 863,000 emails have completed entity extraction processing. The next analytical frontier is parsing the Relativity metadata files to extract the structured header fields that would enable systematic sender-recipient mapping across the entire corpus.


References

PAPER TRAIL Project. (2026). Email network analysis methodology [Research].

PAPER TRAIL Project. (2026). Institutional forensics: German Tank Problem estimator [Script 18].

PAPER TRAIL Project. (2026). Observations (OBS-7: Data Set 11 content resolution) [Data set].

U.S. Department of Justice. (2026). Epstein files: Data Sets 9 and 11. DOJ Epstein Library.