Redacted vs. Unredacted: What Flight Log Quality Costs
11 investigations
TLDR Vision-language model processing of unredacted flight logs extracted 4,286 flights with 392 unique passenger names at zero errors. The same model on...
TLDR A software tool that uses statistics to decide which database records refer to the same person (called Splink) merged eight or more scanning variants of...
TLDR Nearly one in four entities in the corpus (197,945 out of 821,633) appears in only one document. An entity that appears in only one document — called a...
TLDR A pipeline of 27+ Python scripts transforms 2.1 million raw government documents into a searchable PostgreSQL database with 2.38 million extracted...
TLDR A FedEx third-party shipment lists the recipient as "DR BRUCE LOSKOWTZ" in West Palm Beach. Character substitution analysis suggests...
TLDR Two observations were retracted after visual inspection revealed that scanning software (OCR, or optical character recognition — software that reads text...
TLDR OCR engines (software that converts images of text into searchable characters) produce phantom entities from blank form labels and repeated document...
TLDR Custom-built parsing scripts extracted 2,894 FedEx shipments from DOJ-released invoices, spanning July 2000 through October 2005 (PAPER TRAIL Project,...
TLDR Of 148 FedEx shipments marked "Third Party," 94% were still billed to Epstein's personal account -- meaning the designation identifies who initiated...
TLDR Traditional text extraction failed completely on Epstein's handwritten flight logs. A vision-language model -- a type of AI that interprets entire page...
TLDR Six parallel research agents independently verified corpus findings against public registries, FAA records, and court filings (PAPER TRAIL Project,...