TLDR
PostgreSQL 16 serves as the single master database for the entire Epstein corpus analysis. Six core tables hold 2.1 million documents, 2.38 million entities, 29.5 million relationship pairs, 224 wire transfers, 2,894 FedEx shipments, and 229,000 classified bank documents. PostgreSQL's built-in full-text search, backed by an index on the document text, enables sub-second queries across the full corpus (PAPER TRAIL Project, 2026a).
One Database, No Warehouse
The architectural decision was deliberate: one PostgreSQL instance, no data warehouse, no secondary analytics database, no external search engine. The database name is epstein_files. It runs on the local machine. Every script in the pipeline reads from it and writes to it (PAPER TRAIL Project, 2026a).
This simplicity has costs. There is no read replica for analytical queries to run against while ingestion continues. There is no column-oriented store optimized for aggregation. There is no graph database for the 29.5 million relationship pairs. But the simplicity also has a benefit that outweighs all of those costs: there is exactly one source of truth, and every query hits it.
The Six Core Tables
The documents table holds 2,100,266 records (PAPER TRAIL Project, 2026b). Each row represents a single file in the corpus, with fields for the extracted text, the file's location on disk, which of the 17 sources it came from, a flexible metadata column (holding attributes such as document prioritization scores and processing status), and several timestamps. The text column carries a full-text search index, which underpins the corpus search agent's search capability.
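To make the row shape concrete, here is a minimal Python mirror of one documents row. The field names (doc_id, file_path, source, metadata, ingested_at) and the priority_score key are hypothetical; the section describes these columns but does not name them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class DocumentRow:
    """Hypothetical in-memory mirror of one documents-table row.

    Field names are illustrative assumptions, not the real column names.
    """
    doc_id: int
    text: str                 # extracted text (the column carrying the search index)
    file_path: str            # file location on disk
    source: str               # which of the corpus's 17 sources it came from
    metadata: Dict[str, Any] = field(default_factory=dict)  # flexible attributes
    ingested_at: Optional[datetime] = None                  # one of several timestamps

    def priority(self, default: float = 0.0) -> float:
        # Read an optional prioritization score from the flexible metadata column;
        # "priority_score" is a hypothetical key name.
        return float(self.metadata.get("priority_score", default))
```

A flexible column like this is commonly a jsonb column in PostgreSQL, so new attributes can be added without a schema migration; whether this schema actually uses jsonb is an assumption.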
The entities table holds 2,383,751 records across three types: organizations (1,342,000, about 56%), persons (911,000, about 38%), and locations (129,000, about 5%) (PAPER TRAIL Project, 2026c). Each entity carries a cluster assignment from Splink, a probabilistic record-linkage tool that estimates which records refer to the same real-world person, organization, or place; resolution collapsed the raw records into 519,000 clusters (PAPER TRAIL Project, 2026d). The 1.47-to-1 ratio of organizations to persons reflects the corporate density of the Deutsche Bank financial records, where every wire transfer generates multiple organizational references.
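Splink scores candidate record pairs with a match probability and then groups records whose pairwise links clear a threshold. The sketch below illustrates only that grouping step, using a union-find structure to take connected components. It does not call Splink's API, and the 0.9 threshold and the (id_a, id_b, probability) input shape are assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def cluster_matches(
    entity_ids: Iterable[Hashable],
    scored_pairs: Iterable[Tuple[Hashable, Hashable, float]],
    threshold: float = 0.9,
) -> List[List[Hashable]]:
    """Group records into clusters from pairwise match probabilities.

    Threshold-then-connected-components clustering; a sketch, not Splink.
    """
    ids = list(entity_ids)
    parent: Dict[Hashable, Hashable] = {e: e for e in ids}

    def find(x: Hashable) -> Hashable:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, prob in scored_pairs:
        if prob >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[Hashable, List[Hashable]] = {}
    for e in ids:
        clusters.setdefault(find(e), []).append(e)
    return list(clusters.values())


print(cluster_matches([1, 2, 3, 4, 5], [(1, 2, 0.97), (2, 3, 0.95), (4, 5, 0.40)]))
# [[1, 2, 3], [4], [5]]
```

Because clustering follows transitive links, records 1 and 3 land together even though they were never scored as a pair. The same transitivity means a single spurious high-probability link can merge two unrelated clusters.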
The entity_relationships table holds 29.5 million unique entity pairs representing co-occurrence within documents (PAPER TRAIL Project, 2026e). If two entities appear together in at least one document, that pair gets a row in this table. This is the foundation for network topology analysis: community detection (which groups related entities), broker identification (which finds entities that bridge gaps between otherwise disconnected groups, gaps that network scientists call structural holes), and the co-occurrence queries that power the corpus search agent's cooccur subcommand. Co-occurrence queries against the table are capped at 5,000 documents per query to prevent memory exhaustion on high-frequency entity pairs.
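A document that mentions n distinct entities contributes n(n - 1)/2 pairs, so entity-dense pages generate many rows, which helps explain why the pair table is an order of magnitude larger than the entity table. This sketch builds unique, order-normalized pairs together with the documents each pair shares, then applies a per-query document cap like the 5,000-document limit. The function names and input shape are hypothetical; in the database the same logic would be expressed in SQL.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def build_cooccurrence(
    doc_entities: Dict[int, Iterable[Hashable]],
) -> Dict[Tuple[Hashable, Hashable], Set[int]]:
    """Map each unordered entity pair to the set of documents they share."""
    pairs: Dict[Tuple[Hashable, Hashable], Set[int]] = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        # sorted(set(...)) drops repeated mentions and fixes pair order,
        # so (A, B) and (B, A) become the same key.
        for a, b in combinations(sorted(set(entities)), 2):
            pairs[(a, b)].add(doc_id)
    return pairs


def shared_documents(pairs, a, b, max_docs: int = 5000) -> List[int]:
    """Return at most max_docs documents shared by a and b, in doc-id order."""
    key = (a, b) if a <= b else (b, a)
    return sorted(pairs.get(key, ()))[:max_docs]
```

Sorting the two ids before storing a pair is what makes each pair unique; without it, every co-occurrence would be recorded twice.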
The wire_transfers table holds 224 parsed transactions totaling $24.1 million (PAPER TRAIL Project, 2026f). After multiple vision-language model recovery passes, 58.9% have dates, 92.9% have originators, and 94.6% have beneficiaries identified. Each row links back to its source document and carries structured fields for amount, date, originator, beneficiary, and memo text.
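The coverage figures are simple field-presence ratios. Working backward, 58.9%, 92.9%, and 94.6% of 224 correspond to 132, 208, and 212 transfers, the only whole-number counts that round to those percentages; the source reports only the percentages, so these counts are inferred. A minimal sketch of the calculation, with hypothetical field names:

```python
from typing import Dict, List, Sequence


def field_coverage(
    rows: List[dict],
    fields: Sequence[str] = ("date", "originator", "beneficiary"),
) -> Dict[str, float]:
    """Percentage of rows (1 decimal place) where each field is present and non-empty."""
    total = len(rows)
    return {
        f: round(100 * sum(1 for r in rows if r.get(f)) / total, 1) if total else 0.0
        for f in fields
    }
```

Treating empty strings as missing matters here: a parser that writes "" for an unreadable field would otherwise inflate coverage.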
The fedex_shipments table holds 2,894 shipments from 2000 to 2005 across two accounts (Epstein's personal account and NYSG LLC) (PAPER TRAIL Project, 2026g). Fields include sender, recipient, origin address, destination address, weight, cost, and tracking reference. The records stop in October 2005, during the Palm Beach police investigation that preceded Epstein's 2006 arrest; that cutoff marks the boundary of the seized shipping records.
The bank_documents table holds 229,000 classified records from Deutsche Bank releases (PAPER TRAIL Project, 2026f). The classification script categorized each page by document type: wire confirmations, account statements, KYC (Know Your Customer) forms, correspondence, and others. This classification layer sits between the raw documents table and the parsed wire_transfers table, enabling targeted analysis of specific document categories.
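The section does not describe how the classification script decides a page's type. Purely to illustrate the layering (raw page text in, one category label out), here is a keyword-rule classifier. The rules, patterns, and label names are hypothetical, and the real script could just as well be model-based.

```python
import re

# Hypothetical (label, pattern) rules, checked in order; not the project's rules.
RULES = [
    ("wire_confirmation", re.compile(r"\b(wire transfer|beneficiary bank|swift)\b", re.I)),
    ("account_statement", re.compile(r"\b(statement period|opening balance|closing balance)\b", re.I)),
    ("kyc_form", re.compile(r"\b(know your customer|kyc|source of wealth)\b", re.I)),
    ("correspondence", re.compile(r"\b(dear|sincerely|regards)\b", re.I)),
]


def classify_page(text: str) -> str:
    """Return the label of the first matching rule, or 'other'."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "other"
```

Rule order sets precedence, so a page that mentions both a wire transfer and a greeting is labeled a wire confirmation.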
Analytical Tables
Beyond the six core tables, the database holds outputs from every analytical script (PAPER TRAIL Project, 2026a):
The temporal_changepoints table stores 889 breakpoints detected across document activity patterns using PELT (Pruned Exact Linear Time), an algorithm that finds sudden shifts in time-series data. Each row records the entity, the change-point date, and the magnitude of the shift. These are anchored against 50 verified calibration dates from primary government and court sources (PAPER TRAIL Project, 2026h).
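The section does not specify the cost function or penalty used. This minimal PELT sketch assumes a mean-shift model (a segment's cost is its sum of squared deviations from the segment mean) and a caller-chosen penalty per change point. The pruning step discards candidate split points that can no longer be optimal, which keeps the result exact while usually running far faster than checking every earlier split point.

```python
from typing import List, Sequence


def pelt(signal: Sequence[float], penalty: float) -> List[int]:
    """PELT change-point detection for shifts in the mean (a minimal sketch).

    Segment cost is the sum of squared deviations from the segment mean.
    Returns the indices where each new segment starts (0 is never included).
    """
    n = len(signal)
    # Prefix sums make every segment cost an O(1) computation.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(a: int, b: int) -> float:
        # Cost of signal[a:b]: sum((x - mean)^2) = sum(x^2) - (sum x)^2 / length.
        total = s1[b] - s1[a]
        return s2[b] - s2[a] - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of signal[:t]
    best[0] = -penalty       # so the first segment pays no penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scores)
        # Pruning: a split point whose unpenalized cost already exceeds best[t]
        # can never be optimal for any later t, so it is dropped.
        candidates = [s for value, s in scores if value - penalty <= best[t]] + [t]

    # Walk back through the stored segment starts.
    change_points = []
    t = n
    while t > 0:
        t = last[t]
        if t > 0:
            change_points.append(t)
    return sorted(change_points)
```

Larger penalties yield fewer breakpoints. Production code would more likely use a maintained library such as ruptures, whose Pelt class implements the same algorithm with several cost models.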
The synthesis_events table holds 232,083 events from the cross-domain synthesis engine, unifying records from wire transfers, FedEx shipments, bank documents, and institutional forensics into a single event registry. The companion entity_event_bridge table holds 143,791 rows linking entities to events, enabling cross-domain profile generation (PAPER TRAIL Project, 2026i).
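A minimal sketch of the registry-plus-bridge pattern: records from each domain become rows in one event list, and every entity a record mentions gets an (entity, event_id) row in a bridge list, so an entity's cross-domain profile is a single lookup. The input shape, the Event fields, and the sequential event ids are hypothetical; the synthesis engine's actual schema is not described here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    event_id: int
    domain: str                  # e.g. "wire", "fedex", "bank_doc" (hypothetical labels)
    event_date: Optional[date]   # some records, such as undated wires, have no date
    source_row: int              # key of the originating row (hypothetical)


def unify(records_by_domain: Dict[str, List[dict]]) -> Tuple[List[Event], List[Tuple[str, int]]]:
    """Flatten per-domain records into one event list plus an entity-event bridge.

    Each record is a dict with 'id', optional 'date', and optional 'entities'.
    """
    events: List[Event] = []
    bridge: List[Tuple[str, int]] = []
    for domain, records in records_by_domain.items():
        for rec in records:
            event = Event(len(events) + 1, domain, rec.get("date"), rec["id"])
            events.append(event)
            for entity in rec.get("entities", []):
                bridge.append((entity, event.event_id))
    return events, bridge


def profile(entity: str, events: List[Event], bridge: List[Tuple[str, int]]) -> List[Event]:
    """All events linked to one entity, across domains, in date order (undated last)."""
    linked = {event_id for name, event_id in bridge if name == entity}
    return sorted(
        (e for e in events if e.event_id in linked),
        key=lambda e: (e.event_date is None, e.event_date or date.min),
    )
```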
The cross_domain_contradictions table stores 10 flagged cross-domain conflicts: 7 temporal misalignments, 3 coverage gaps, and 0 true contradictions. The ach_matrices table stores 3 Analysis of Competing Hypotheses (ACH) matrices, a structured method for evaluating how well evidence supports competing explanations. The evidence_chains table stores 38 formalized evidence chain nodes (PAPER TRAIL Project, 2026i).
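In Richards Heuer's formulation of ACH, each cell rates a piece of evidence as consistent, inconsistent, or neutral toward a hypothesis, and hypotheses are ranked by how much evidence contradicts them rather than by how much supports them. The section does not say how the project scores its 3 matrices; this sketch assumes that disconfirmation-focused scoring, with optional per-evidence weights.

```python
from typing import Dict, List, Optional, Tuple


def ach_inconsistency_scores(
    matrix: Dict[str, Dict[str, str]],
    weights: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, float]]:
    """Rank hypotheses by weighted inconsistency (lower = better supported).

    matrix maps evidence_id -> {hypothesis: "C" | "I" | "N"}.
    weights optionally maps evidence_id -> credibility weight (default 1.0).
    """
    weights = weights or {}
    scores: Dict[str, float] = {}
    for evidence, ratings in matrix.items():
        w = weights.get(evidence, 1.0)
        for hypothesis, rating in ratings.items():
            scores.setdefault(hypothesis, 0.0)
            if rating == "I":  # only disconfirming evidence counts against a hypothesis
                scores[hypothesis] += w
    return sorted(scores.items(), key=lambda kv: kv[1])
```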
Full-Text Search Indexes and Performance
The full-text search index on text columns is the single most important performance feature in the schema. It transforms full-text search from a sequential scan (minutes on 2.1 million rows) to an index lookup (milliseconds) (The PostgreSQL Global Development Group, 2024).
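The speed-up comes from inverting the data: instead of reading every document and testing it, an index maps each token to the documents that contain it, and a multi-term query becomes a set intersection. The section does not name the index type; for tsvector columns the PostgreSQL documentation recommends GIN, which is an inverted index of this kind. This is a toy version for intuition only, with hypothetical function names: real text search configurations also stem words and drop stop words, which this tokenizer does not.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: token -> set of ids of documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """AND query answered by intersecting posting sets; no document is read."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))  # copy so the index is not mutated
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


def scan(docs: Dict[int, str], query: str) -> Set[int]:
    """Sequential-scan equivalent: tokenize and test every document."""
    tokens = set(tokenize(query))
    return {d for d, text in docs.items() if tokens <= set(tokenize(text))}
```

Both functions return the same documents; the difference is that search touches only the posting sets for the query's tokens, while scan must read all 2.1 million rows.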
The institutional forensics module (Script 18) runs seven linguistic marker category searches against the full document text — phrases like "normal for this client" and "sent to a friend for tuition." Without these indexes, each search would require scanning every row. With them, each resolves in under a second (PAPER TRAIL Project, 2026j).
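Multi-word markers such as "normal for this client" need word order, not just word presence. PostgreSQL handles this with phrase queries (phraseto_tsquery and the <-> "followed by" operator), which rely on the word positions stored in a tsvector. Whether Script 18 uses phrase queries or another matching strategy is not stated. This sketch shows the positional idea in plain Python, again without stemming or stop-word handling; the function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs: Dict[int, str]):
    """token -> {doc_id: [positions of the token in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token][doc_id].append(pos)
    return index


def phrase_search(index, phrase: str) -> Set[int]:
    """Documents where the phrase's tokens occur at consecutive positions."""
    tokens = tokenize(phrase)
    if not tokens or any(t not in index for t in tokens):
        return set()
    hits: Set[int] = set()
    for doc_id, starts in index[tokens[0]].items():
        for start in starts:
            # Token k of the phrase must sit exactly k positions after the first.
            if all(start + k in index[t].get(doc_id, ()) for k, t in enumerate(tokens[1:], 1)):
                hits.add(doc_id)
                break
    return hits
```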
The corpus search agent (Script 25) provides eight subcommands — entity lookup, full-text search, document retrieval, wire search, FedEx search, co-occurrence analysis, temporal timeline, and schema introspection — all resolving to PostgreSQL queries that benefit from the search indexing (PAPER TRAIL Project, 2026k).
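A command-line tool with several subcommands like this is commonly built with argparse subparsers. This is a sketch of that structure only: apart from cooccur, which the section names, the subcommand names, the positional query argument, and the --limit option are hypothetical, and the SQL each subcommand would run is omitted.

```python
import argparse

# Hypothetical subcommand names mirroring the eight capabilities listed above.
SUBCOMMANDS = {
    "entity": "entity lookup",
    "search": "full-text search",
    "doc": "document retrieval",
    "wire": "wire transfer search",
    "fedex": "FedEx shipment search",
    "cooccur": "co-occurrence analysis",
    "timeline": "temporal timeline",
    "schema": "schema introspection",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="corpus_search")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in SUBCOMMANDS.items():
        p = sub.add_parser(name, help=help_text)
        if name != "schema":  # introspection needs no search term
            p.add_argument("query", help="search term or identifier")
        p.add_argument("--limit", type=int, default=50, help="maximum rows to return")
    return parser
```

A dispatcher would then map args.command to one parameterized PostgreSQL query per subcommand.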
Tuning for 64 GB
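The database is tuned for its specific hardware. shared_buffers, the database's own cache, is set to 16 GB (25% of total RAM). effective_cache_size is set to 48 GB (75% of total RAM); this tells the query planner how much memory, counting both shared_buffers and the operating system's file cache, it can expect to be caching data, and it reserves nothing by itself. Under that accounting, about 16 GB remains outside the cache budget for the operating system, Python scripts, and GPU operations (PAPER TRAIL Project, 2026a).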
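The two settings correspond to PostgreSQL's shared_buffers and effective_cache_size parameters. This helper derives both from total RAM. The 25% shared_buffers fraction matches the PostgreSQL documentation's suggested starting point for a dedicated server, while the 75% effective_cache_size fraction is a common rule of thumb rather than a documented default. The function name and return shape are hypothetical.

```python
from typing import Dict, Union


def memory_plan(
    total_gb: int, shared_frac: float = 0.25, cache_frac: float = 0.75
) -> Dict[str, Union[str, int]]:
    """Derive PostgreSQL memory settings from total RAM.

    shared_buffers is memory PostgreSQL actually allocates.
    effective_cache_size only informs the planner's cost estimates: it counts
    shared_buffers plus the OS file cache and allocates nothing itself.
    """
    shared = int(total_gb * shared_frac)
    cache = int(total_gb * cache_frac)
    return {
        "shared_buffers": f"{shared}GB",
        "effective_cache_size": f"{cache}GB",
        "outside_cache_gb": total_gb - cache,  # left for OS, scripts, GPU work
    }
```

memory_plan(64) yields shared_buffers = 16GB and effective_cache_size = 48GB. Changing shared_buffers requires a server restart, whereas effective_cache_size can be changed with a configuration reload.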
The tuning reflects a deliberate trade-off. With 64 GB of RAM, allocating more to the database cache would improve repeated query performance but starve the vision-language model inference pipeline when both run simultaneously. The 16/48 split keeps both workloads functional, though neither is optimal in isolation.
What the Schema Encodes
A database schema is a theory about the world. This schema encodes the theory that Epstein's network can be understood through six lenses: documents (what was written), entities (who and what is mentioned), relationships (who appears with whom), financial flows (where money moved), physical logistics (where packages went), and institutional records (what banks documented).
Every analytical finding in the PAPER TRAIL series traces back to a query against these tables. Every number is reproducible. The schema is the contract between the raw documents and the claims we make about them.
References
PAPER TRAIL Project. (2026a). Project configuration and database specification [Data]. CLAUDE.md
PAPER TRAIL Project. (2026b). Metadata ingestion — 2,100,266 documents (Script 01) [Data]. Database: epstein_files
PAPER TRAIL Project. (2026c). Named entity recognition — entity type distribution (Script 04) [Data]. Database: epstein_files
PAPER TRAIL Project. (2026d). Entity resolution — 519,000 clusters (Script 19) [Software]. app/scripts/19_entity_resolution.py
PAPER TRAIL Project. (2026e). Co-occurrence graph — 29.5 million relationship pairs (Script 05) [Data]. Database: epstein_files
PAPER TRAIL Project. (2026f). Bank record classification and wire transfer parsing — 224 wires, $24.1 million (Script 16) [Data]. Database: epstein_files
PAPER TRAIL Project. (2026g). FedEx shipment parsing — 2,894 records (Script 11) [Data]. Database: epstein_files
PAPER TRAIL Project. (2026h). PELT change-point detection — 889 breakpoints (Script 20) [Data]. _exports/temporal/
PAPER TRAIL Project. (2026i). Cross-domain synthesis — events, contradictions, ACH matrices, evidence chains (Script 25b) [Data]. _exports/synthesis/
PAPER TRAIL Project. (2026j). Institutional forensics — willful blindness module (Script 18) [Software]. app/scripts/18_institutional_analysis.py
PAPER TRAIL Project. (2026k). Corpus search agent (Script 25) [Software]. app/scripts/25_corpus_search.py
The PostgreSQL Global Development Group. (2024). PostgreSQL 16 documentation: Full text search. https://www.postgresql.org/docs/16/textsearch.html