Corpus & Data
What 2.1 Million Documents Look Like
TLDR The Jeffrey Epstein document corpus contains 2,100,266 files across 12 DOJ data sets and 6 source directories, totaling approximately 331 GB. It spans...
4 investigations
TLDR A pipeline of 27+ Python scripts transforms 2.1 million raw government documents into a searchable PostgreSQL database with 2.38 million extracted...
TLDR The DOJ released approximately 3.5 million pages while acknowledging that more than 6 million pages were identified as potentially responsive — a 42% gap...
TLDR The Chao1 estimator — a statistical method originally designed to estimate the total number of species in an ecosystem, applied here to estimate total...
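For readers unfamiliar with the Chao1 estimator mentioned above, the following is a minimal sketch of its classic form, not the analysis code used for this corpus. It assumes the input is a list of how many times each distinct item was observed; the function name `chao1` and the sample counts are hypothetical. The estimate adds a correction to the observed number of distinct items, based on how many were seen exactly once (`f1`) and exactly twice (`f2`).

```python
def chao1(abundances):
    """Classic Chao1 estimate of the total number of distinct items.

    abundances: iterable of observation counts, one per distinct item seen.
    """
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)                      # distinct items actually observed
    f1 = sum(1 for n in counts if n == 1)    # singletons: seen exactly once
    f2 = sum(1 for n in counts if n == 2)    # doubletons: seen exactly twice
    if f2 > 0:
        # Standard form: S_obs + f1^2 / (2 * f2)
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when there are no doubletons
    return s_obs + f1 * (f1 - 1) / 2


# Illustrative counts only (not taken from the corpus):
# 8 distinct items observed, 4 singletons, 2 doubletons.
estimate = chao1([1, 1, 1, 1, 2, 2, 5, 9])
print(estimate)
```

Many singletons relative to doubletons push the estimate well above the observed count, which is the intuition behind using it to gauge how much remains unseen.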