TLDR
Qwen2.5-VL-7B (a vision-language model — an AI system that reads images the way a human would) running on a single NVIDIA RTX 4070 with 8 GB of video memory extracted 4,286 flights from handwritten flight logs with zero errors, recovered 47 wire dates from degraded bank documents, and reprocessed 70 Regulation E forms to recover 36 recipient names associated with $1.47 million in wire transfers. The entire 2.1-million-document pipeline runs on one consumer PC.
The Hardware
The machine is not impressive by data center standards. An Intel i9-13950HX with 24 cores and 32 threads. An NVIDIA GeForce RTX 4070 with 8 GB of video memory (VRAM — the dedicated memory on a graphics card that AI models use for processing). 64 GB of system RAM. A PostgreSQL 16 database with tuned memory settings. Total cost: that of a consumer gaming laptop (PAPER TRAIL Project, 2026a).
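The project's exact PostgreSQL settings are not documented here. As an illustration only, these are the memory-related parameters typically tuned on a 64 GB workstation — the values below are assumptions, not the project's configuration:

```ini
# postgresql.conf — illustrative values for a 64 GB machine,
# not the project's actual settings
shared_buffers = 16GB           # PostgreSQL's own page cache
work_mem = 256MB                # per-sort / per-hash memory budget
maintenance_work_mem = 2GB      # index builds, VACUUM
effective_cache_size = 48GB     # planner's estimate of total OS cache
```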
This is the machine that processed 2.1 million documents. Every script in the pipeline — from OCR (optical character recognition — converting images of text to searchable text) to automated name detection to entity resolution to cross-domain synthesis — runs on this single PC. There is no cloud infrastructure, no GPU cluster, no distributed computing framework. One machine, one GPU, one database (PAPER TRAIL Project, 2026a).
What Vision-Language Models Do
A vision-language model, or VLM, processes images and text together. Unlike text-only OCR, which converts pixels to characters and then discards the visual information, a VLM sees the page the way a human would. It understands layout — that a handwritten name in a log book is different from a printed header. It can read degraded text that confuses traditional OCR. It can distinguish between a filled form field and an empty one (PAPER TRAIL Project, 2026b).
Qwen2.5-VL-7B is a 7-billion-parameter model from Alibaba's Qwen team (Qwen Team, 2024). "7 billion parameters" refers to the number of adjustable values the model learned during training — more parameters generally means better understanding, but also more memory required. At 4-bit quantization (a compression technique that reduces the model's memory footprint by storing each parameter with fewer bits of precision), it compresses to approximately 4 GB, leaving headroom on the 8 GB GPU for image processing. The model runs through Ollama, a local inference server that manages model loading and request handling (PAPER TRAIL Project, 2026a).
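The quantization arithmetic is simple, and the local server exposes a plain HTTP interface. A minimal sketch follows — Ollama's `/api/generate` endpoint does accept base64-encoded images alongside the prompt, but the model tag and prompt text here are illustrative assumptions, not the project's exact configuration:

```python
import base64
import json

def quantized_weight_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight footprint: params * bits, converted to GB."""
    return n_params * bits_per_param / 8 / 1e9

# 7B parameters at 4 bits each ≈ 3.5 GB of weights, leaving the rest
# of the 8 GB card for the KV cache and image embeddings.
print(quantized_weight_gb(7e9, 4))  # → 3.5

def build_vlm_request(image_bytes: bytes, prompt: str) -> str:
    """Build a JSON body for Ollama's /api/generate endpoint, which
    accepts base64-encoded images next to the text prompt. The model
    tag is a placeholder."""
    return json.dumps({
        "model": "qwen2.5-vl:7b",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    })
```

POSTing that body to the local server at `/api/generate` returns the model's transcription as text, which downstream scripts parse into database rows.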
4,286 Flights, Zero Errors
The most dramatic result came from the flight logs. Handwritten passenger manifests are among the most difficult documents for text-only OCR — variable handwriting, inconsistent formatting, abbreviated names, and physical degradation from decades of handling (PAPER TRAIL Project, 2026c).
Script 16f fed 116 pages of unredacted flight logs through the VLM. The model extracted 4,286 individual flights with 392 unique passenger names. The error rate was zero — every extracted name was verifiable against the source pages (PAPER TRAIL Project, 2026c).
The redacted logs were harder. From 100 pages of partially redacted flight logs, the model recovered 1,119 flights with 26 errors, a 2.3% error rate. The errors were not hallucinations (cases where the model invents information) — they were caused by initial-only passenger entries (e.g., "G.M." instead of "Ghislaine Maxwell") that the model could not expand without context (PAPER TRAIL Project, 2026c). The distinction matters: the model did not invent names. It failed to resolve abbreviations.
Recovering the Wire Transfers
The wire transfer recovery happened across multiple scripts. Script 16g processed 66 bank documents covering 215 pages, targeting wire transfers with missing fields. The model recovered 47 dates, 44 originator names, and 2 beneficiary names, updating 53 wire transfer records in the database (PAPER TRAIL Project, 2026d).
Before this recovery, many wires were incomplete — amounts without dates, transactions without named parties. After Script 16g, date coverage rose from roughly 38% to 58.9% of all wires. Originator coverage reached 92.9%. Beneficiary coverage hit 94.6% (PAPER TRAIL Project, 2026d).
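"Coverage" here is simply the share of wire records with a non-empty value in a given field. A minimal sketch of that bookkeeping, with a hypothetical record layout:

```python
def field_coverage(records: list[dict], field: str) -> float:
    """Fraction of records whose `field` is present and non-empty."""
    filled = sum(1 for r in records if r.get(field))
    return filled / len(records) if records else 0.0

# Toy data: 2 of 4 wires carry a date before recovery.
wires = [
    {"amount": 50000, "date": "1997-03-14"},
    {"amount": 12000, "date": None},
    {"amount": 8000, "date": ""},
    {"amount": 230000, "date": "2001-06-02"},
]
print(field_coverage(wires, "date"))  # → 0.5
```

In the actual pipeline the same question is a `COUNT(*)` over non-NULL columns in PostgreSQL; the Python version just makes the definition explicit.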
Script 16c tackled a specific document type: Regulation E forms (standardized consumer banking disclosures). Text-only OCR had failed on 70 of these forms. The VLM reprocessed all 70, recovering 36 recipient names associated with $1.47 million in wire transfers (PAPER TRAIL Project, 2026e).
Zero Hallucinations Verified
Across 18 names specifically verified against source documents, the VLM produced zero hallucinations — zero cases where it invented a name that did not appear on the page (PAPER TRAIL Project, 2026f). This is a small verification sample, but it is the right one: names are the highest-stakes field in this corpus. A hallucinated name could create a false connection between an innocent person and a criminal network. The VLM's perfect score on verified names is the minimum acceptable standard.
The Crash Problem
The system is not without limitations. Under sustained load, the local inference server crashes approximately every 10 to 20 documents due to video memory pressure (PAPER TRAIL Project, 2026a). The 8 GB GPU is running at capacity, and sustained inference causes memory fragmentation that eventually forces a restart.
The engineering solution is pragmatic: automatic crash detection and restart, with a 30-second recovery window. The observation extraction script (running a text-only 7-billion-parameter model) processes at approximately 0.5 documents per minute with auto-restart handling. It is slow. It crashes. It recovers. It keeps going (PAPER TRAIL Project, 2026a).
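The recovery loop can be sketched in a few lines. This is a hedged reconstruction, not the project's script: the restart command and the exception a crashed server raises are placeholders.

```python
import subprocess
import time

def run_with_restart(docs, extract, restart_cmd, recovery_s=30):
    """Process documents sequentially; if the inference server dies
    mid-request, restart it, wait out the recovery window, and retry
    the same document once before giving up."""
    for doc in docs:
        for attempt in range(2):
            try:
                yield extract(doc)
                break
            except ConnectionError:
                if attempt == 1:
                    raise
                # Server likely crashed under VRAM pressure: bounce it
                # and give the model time to reload into the GPU.
                subprocess.run(restart_cmd, check=False)
                time.sleep(recovery_s)
```

In practice `extract` would be an HTTP call to the local inference server and `restart_cmd` something like a service-manager restart invocation; the structure — detect, restart, wait, retry — is what keeps an overnight run alive.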
This is the tradeoff of consumer hardware. A data center GPU with 40 or 80 GB of video memory would eliminate the crash problem and process documents 10 to 50 times faster. But the point of this project is not speed — it is accessibility. The analytical techniques used here are not locked behind enterprise infrastructure. Anyone with a modern gaming GPU can run them.
What This Means
The entire analytical pipeline described in this series — 2.38 million entities extracted, 29.5 million relationships mapped, 519,000 entity clusters resolved, 889 temporal change-points detected, 125,620 communities identified, 224 wire transfers parsed, 2,894 FedEx shipments recovered — runs on hardware that costs less than a month of cloud GPU rental (PAPER TRAIL Project, 2026a).
This is not a limitation. It is a design principle. If forensic-scale document analysis requires cloud infrastructure, it is available only to organizations with cloud budgets. If it runs on a consumer PC, it is available to anyone. Independent researchers, journalists, congressional staff, international prosecutors — anyone with a laptop and the willingness to let scripts run overnight.
The VLM on a consumer GPU is not a compromise. It is a proof of concept that the barrier to serious document analysis is no longer hardware. It is methodology.
References
Qwen Team. (2024). Qwen2.5-VL: A frontier multimodal model. Alibaba Group. https://github.com/QwenLM/Qwen2.5-VL
PAPER TRAIL Project. (2026a). Hardware specifications and infrastructure [Project documentation]. CLAUDE.md, Available Machines section.
PAPER TRAIL Project. (2026b). VLM reprocessing script [Computer software]. app/scripts/08_vlm_reprocess.py
PAPER TRAIL Project. (2026c). Flight log VLM extraction (Script 16f) [Data set]. 116 unredacted pages, 4,286 flights, 0 errors; 100 redacted pages, 1,119 flights, 26 errors (2.3%).
PAPER TRAIL Project. (2026d). VLM wire NULL field recovery (Script 16g) [Data set]. 66 documents, 215 pages, 53 wires updated.
PAPER TRAIL Project. (2026e). VLM Regulation E re-OCR (Script 16c) [Data set]. 70 forms, 36 recipients, $1.47M recovered.
PAPER TRAIL Project. (2026f). VLM hallucination verification [Data set]. 0 hallucinations across 18 verified names.