4,286 Flights: How a Vision Language Model Read Handwritten Flight Logs


TLDR

Traditional text extraction failed completely on Epstein's handwritten flight logs. A vision language model (a type of AI that interprets entire page images rather than individual characters), running on a consumer-grade GPU, extracted 4,286 flights with 392 unique passenger names from the unredacted logs with zero errors, plus 1,119 flights from the redacted logs at a 2.3% error rate (PAPER TRAIL Project, 2026a). The result is a structured dataset that feeds directly into network and temporal analysis.


When Text Extraction Fails

Some documents resist automation. Handwritten flight logs are among the most challenging: pilots filled them in by hand in cursive, often with uneven ink quality, on pre-printed forms that have since been photocopied multiple times. When we ran standard text-extraction engines against the flight log corpus in Data Set 5, the output was effectively unreadable (U.S. Department of Justice [DOJ], 2025). Character recognition rates on cursive handwriting were so low that the extracted text bore little resemblance to the original entries.

This is not an unusual problem. Handwritten text recognition has been one of the hardest challenges in document processing for decades. But for this project, the flight logs are among the most analytically valuable documents in the entire corpus. They contain dates, routes, aircraft identifiers, and most critically, passenger names -- the raw material for building networks of who traveled with whom.

The VLM Approach

A vision language model (VLM) represents a different approach to document understanding. Rather than attempting character-by-character recognition, a VLM processes the entire page as an image and uses a language model to interpret what it sees. Because it sees the whole page, the model can draw on layout, context, and visual patterns in ways that character-level text extraction cannot.

We used Qwen2.5-VL-7B, a 7-billion-parameter vision language model, quantized to 4 bits and running on a single NVIDIA RTX 4070 with 8 GB of video memory (PAPER TRAIL Project, 2026a). This is consumer-grade hardware, the kind of GPU found in a gaming computer. Processing ran at approximately 0.5 documents per minute: modest by industrial standards, but entirely adequate for a corpus of a few hundred pages.
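To put the reported throughput in perspective, the back-of-the-envelope arithmetic fits in a few lines of Python. This sketch assumes that one "document" in the 0.5-per-minute figure corresponds to one page, and it borrows the 116 unredacted and 100 redacted page counts reported in the results; the assumption and the constant names are illustrative, not taken from the project's scripts.

```python
# Back-of-the-envelope runtime at the reported throughput.
# Assumption: one "document" in the reported rate equals one page.
DOCS_PER_MINUTE = 0.5
UNREDACTED_PAGES = 116
REDACTED_PAGES = 100

def runtime_hours(pages: int, docs_per_minute: float = DOCS_PER_MINUTE) -> float:
    """Wall-clock hours to process `pages` at a constant rate."""
    return pages / docs_per_minute / 60

total_pages = UNREDACTED_PAGES + REDACTED_PAGES
print(f"{total_pages} pages -> {runtime_hours(total_pages):.1f} hours")  # 216 pages -> 7.2 hours
```

At a constant rate, the full 216 pages work out to roughly seven hours of unattended processing, which is a single overnight run on one GPU.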

The Results

The unredacted flight logs (116 pages) yielded 4,286 individual flights with 392 unique passenger names and zero extraction errors (PAPER TRAIL Project, 2026a). Zero. Every name the model extracted was a valid passenger entry. This is a remarkable result for handwritten document processing, and it suggests that VLM technology has crossed a threshold where it can reliably handle documents that defeated previous generations of text extraction.

The redacted flight logs -- 100 pages where portions had been blacked out before release -- yielded 1,119 flights with 26 errors, a 2.3% error rate (PAPER TRAIL Project, 2026a). The errors came from a specific and predictable source: initial-only entries. Where the original log contained only "JE" or "GM" or "DI," the model could extract the initials but could not resolve them to full names. This is not a model failure -- it is a data limitation. You cannot read what was never written.
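The error rate itself is simple arithmetic (26 / 1,119 ≈ 2.3%), and initial-only entries can be flagged mechanically before any attempt at resolution. The Python sketch below is a hypothetical heuristic, not the project's validation code: it treats a name field made up of two or three capital letters, each optionally followed by a period or space, as initials-only. The sample entries are placeholders.

```python
import re

# Hypothetical heuristic: two or three capital letters, each optionally
# followed by a period and/or a space, e.g. "JE", "G.M.", "D I".
INITIALS_ONLY = re.compile(r"(?:[A-Z]\.?\s?){2,3}")

def is_initials_only(name: str) -> bool:
    """True if the whole (trimmed) name field looks like bare initials."""
    return INITIALS_ONLY.fullmatch(name.strip()) is not None

def error_rate(errors: int, total: int) -> float:
    """Share of extracted flights that contained an error."""
    return errors / total

entries = ["JE", "G.M.", "D I", "Jane Doe"]  # placeholder name fields
flagged = [e for e in entries if is_initials_only(e)]
print(flagged, f"{error_rate(26, 1119):.1%}")  # ['JE', 'G.M.', 'D I'] 2.3%
```

In this sketch, flagged entries would be routed to a separate alias-resolution step rather than counted as resolved names.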

The Disambiguation Problem

Flight log analysis of the Epstein corpus faces a particular challenge: tail number N212JE was assigned to two different aircraft. Before August 2017, it designated a Gulfstream G-IV (serial number 1085). After January 2018, it designated a Gulfstream G550 (serial number 5173) (Federal Aviation Administration, n.d.; PAPER TRAIL Project, 2026c). Any flight log entry referencing N212JE must be dated before it can be attributed to the correct aircraft. The VLM extraction captures dates alongside passenger names, making this disambiguation possible in the structured output.
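Because the attribution depends only on the entry date, it can be expressed as a small lookup. This Python sketch encodes the boundaries stated above. The exact day-level cutoffs (31 July 2017 and 1 February 2018) are assumptions made to turn "before August 2017" and "after January 2018" into concrete dates, and any entry that falls between them is returned as unresolved rather than guessed.

```python
from datetime import date
from typing import Optional

# Assumed day-level cutoffs for the boundaries stated in the text.
G_IV_LAST_DAY = date(2017, 7, 31)   # "before August 2017"
G550_FIRST_DAY = date(2018, 2, 1)   # "after January 2018"

def attribute_n212je(flight_date: date) -> Optional[str]:
    """Map a dated N212JE log entry to an airframe, or None if unresolved."""
    if flight_date <= G_IV_LAST_DAY:
        return "Gulfstream G-IV (s/n 1085)"
    if flight_date >= G550_FIRST_DAY:
        return "Gulfstream G550 (s/n 5173)"
    # Dates between the two registrations cannot be attributed by date alone.
    return None

for d in (date(2005, 3, 1), date(2017, 10, 15), date(2019, 6, 1)):
    print(d, attribute_n212je(d))
```

Returning `None` for the gap keeps ambiguous entries visible for manual review instead of silently assigning them to either airframe.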

From Names to Networks

The 392 unique passenger names extracted from the unredacted logs feed directly into two downstream analyses. First, they enter network analysis (PAPER TRAIL Project, 2026d). The raw flight data is structured as a two-sided network, with flights on one side and passengers on the other, and then converted into a direct passenger-to-passenger network: two passengers who appear on the same flight are connected, and passengers who repeatedly fly together form strong connections. A grouping algorithm then finds clusters of closely connected entities, in this case passengers who frequently appear on the same flights.
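The conversion from the two-sided network to a passenger-to-passenger network is a bipartite projection, and a minimal version needs only the Python standard library. This sketch assumes each flight is represented as a set of passenger names and weights each passenger pair by the number of flights they share; the flight IDs and single-letter passengers are placeholders.

```python
from collections import Counter
from itertools import combinations

def project_passengers(flights: dict[str, set[str]]) -> Counter:
    """Project a flight-passenger bipartite network onto passengers.

    Each key of the result is a sorted (passenger, passenger) pair; its value
    is the number of flights on which both appear (the edge weight).
    """
    weights: Counter = Counter()
    for passengers in flights.values():
        for a, b in combinations(sorted(passengers), 2):
            weights[(a, b)] += 1
    return weights

flights = {
    "F1": {"A", "B", "C"},
    "F2": {"A", "B"},
    "F3": {"B", "D"},
}
edges = project_passengers(flights)
for pair, w in edges.most_common():
    print(pair, w)
```

The clustering step would then run on these weighted edges. The text does not name the grouping algorithm; a community detection method such as Louvain (available in NetworkX as `louvain_communities`) is one common choice, mentioned here only as an example.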

Second, the names integrate with the entity resolution pipeline, where a statistical matching method links flight log names against the 2.38 million entities extracted from the broader corpus (PAPER TRAIL Project, 2026e). A name appearing in both flight logs and financial documents creates a cross-domain link.
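The text does not specify which statistical matching method the entity resolution pipeline uses. As a deliberately simple stand-in, the Python sketch below normalizes names and scores them with `difflib.SequenceMatcher` from the standard library, keeping candidates above an arbitrary 0.85 threshold. It illustrates the shape of the matching step only; a linker running against 2.38 million entities would also need blocking to avoid comparing every pair. All names shown are placeholders.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, turn periods into spaces, and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())

def best_matches(
    query: str, candidates: list[str], threshold: float = 0.85
) -> list[tuple[str, float]]:
    """Return candidates whose similarity to `query` meets the threshold, best first."""
    q = normalize(query)
    scored = [(c, SequenceMatcher(None, q, normalize(c)).ratio()) for c in candidates]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)

# Placeholder names, not entries from the corpus.
matches = best_matches("Jane  Doe", ["JANE DOE", "Jane M. Doe", "John Smith"])
print(matches)
```

Here the exact match scores 1.0, the middle-initial variant scores just under 0.9, and the unrelated name falls below the threshold and is dropped.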

For redacted logs, where only initials survive, the methodology shifts to a probability-based alias resolution approach with evidence weights: 40% for name similarity, 30% for temporal co-occurrence, 20% for route patterns, and 10% for co-traveler frequency (PAPER TRAIL Project, 2026b). This approach cannot achieve the certainty of full-name matching, but it can generate ranked candidate lists for human review.
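The weighting scheme can be sketched as a weighted sum over the four evidence signals. This Python version assumes each component has already been normalized to the range 0 to 1; how the project computes and normalizes each signal, and how it turns the combined score into a probability, is not described here. The candidate labels and scores are hypothetical.

```python
from dataclasses import dataclass

# Evidence weights as stated in the text; they sum to 1.0.
WEIGHTS = {"name": 0.40, "temporal": 0.30, "route": 0.20, "cotraveler": 0.10}

@dataclass
class Evidence:
    """Hypothetical per-candidate evidence, each component scaled to [0, 1]."""
    candidate: str
    name: float        # name similarity to the initials
    temporal: float    # temporal co-occurrence
    route: float       # route pattern overlap
    cotraveler: float  # co-traveler frequency

def score(e: Evidence) -> float:
    """Weighted sum of the four evidence components."""
    return sum(weight * getattr(e, field) for field, weight in WEIGHTS.items())

def rank(evidence: list[Evidence]) -> list[tuple[str, float]]:
    """Candidates ordered from strongest to weakest combined evidence."""
    return sorted(((e.candidate, score(e)) for e in evidence), key=lambda t: t[1], reverse=True)

candidates = [
    Evidence("Candidate A", name=1.0, temporal=0.5, route=0.5, cotraveler=0.0),
    Evidence("Candidate B", name=0.5, temporal=1.0, route=1.0, cotraveler=1.0),
]
for label, s in rank(candidates):
    print(f"{label}: {s:.2f}")  # Candidate B: 0.80, then Candidate A: 0.65
```

Candidate B outranks A despite a weaker name match, which shows how the temporal, route, and co-traveler signals can reorder a list that name similarity alone would produce. The ranked output is a starting point for human review, not a final attribution.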

The Completeness Question

Despite processing 100% of available flight log pages, a statistical richness estimator (Chao1; Chao, 1984), a method borrowed from ecology that estimates the total number of species from partial observations, reports only 23.2% completeness for flight log entities (PAPER TRAIL Project, 2026f). The reason is an 88% singleton ratio: the vast majority of passenger names appear on only one flight. The estimator reads a high share of singletons as a sign that many individuals were never observed at all, and on that basis it puts the true population of unique passengers at roughly four times what has been observed.
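For reference, the classic Chao1 estimator (Chao, 1984) adds an estimate of unseen entities to the observed count: S_chao1 = S_obs + f1^2 / (2 f2), where f1 and f2 are the numbers of entities observed exactly once and exactly twice. Completeness is then S_obs / S_chao1. The Python sketch below implements that formula and falls back to the common bias-corrected form, S_obs + f1(f1 - 1) / (2(f2 + 1)), when there are no doubletons. Which variant the project's estimate uses is an assumption, and the abundance counts are toy values, not the flight log data.

```python
def chao1(abundances: list[int]) -> float:
    """Chao1 richness estimate from per-entity observation counts."""
    s_obs = sum(1 for a in abundances if a > 0)
    f1 = sum(1 for a in abundances if a == 1)  # singletons: seen exactly once
    f2 = sum(1 for a in abundances if a == 2)  # doubletons: seen exactly twice
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, which stays finite when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def completeness(abundances: list[int]) -> float:
    """Share of the estimated total richness that has actually been observed."""
    s_obs = sum(1 for a in abundances if a > 0)
    if s_obs == 0:
        raise ValueError("no observed entities")
    return s_obs / chao1(abundances)

# Toy data: 8 singletons, 2 doubletons, and two frequent flyers.
counts = [1] * 8 + [2] * 2 + [5, 10]
print(chao1(counts), round(completeness(counts), 3))  # 28.0 0.429
```

The sketch makes the mechanism visible: the unseen-entity term grows with the square of the singleton count, so a corpus dominated by one-time names produces a low completeness figure even when every page has been read.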

This does not mean the extraction failed. It means the underlying population is highly diverse: many people flew on these aircraft, most of them infrequently. The completeness estimate suggests that even a perfect extraction of the surviving logs captures less than a quarter of the estimated passenger population.

What This Proves

The flight log extraction demonstrates two things. First, a VLM running on consumer-grade hardware can achieve professional-grade results on documents that defeat traditional text extraction. Second, the structured output from VLM processing (dates, names, routes, aircraft) creates analytical possibilities that were simply unavailable from the raw handwritten pages. The 4,286 flights are not just a dataset. They are the foundation for network analysis, temporal pattern detection, and cross-domain synthesis.


References

Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics, 11(4), 265-270.

Federal Aviation Administration. (n.d.). Aircraft registry inquiry. https://registry.faa.gov/aircraftinquiry

PAPER TRAIL Project. (2026a). VLM flight log extraction results [Data set]. Script 16f output.

PAPER TRAIL Project. (2026b). Flight log reconstruction methodology [Data set]. research/FLIGHT_LOG_RECONSTRUCTION.md

PAPER TRAIL Project. (2026c). Entity ownership research [Data set]. research/ENTITY_OWNERSHIP.md

PAPER TRAIL Project. (2026d). Network topology analysis [Data set]. Script 21, _exports/network/

PAPER TRAIL Project. (2026e). Entity resolution pipeline [Data set]. Script 19, _exports/entity_resolution/

PAPER TRAIL Project. (2026f). Chao1 completeness estimates [Data set]. _exports/validation/chao1_summary.json

U.S. Department of Justice. (2025). Epstein document release, Data Set 5: Flight logs [Government records].