Markdown to PowerPoint: Version-Controlled Presentations

TLDR

All PAPER TRAIL episodes are authored as numbered markdown files (one per slide), converted to PowerPoint via Python scripts, and tracked through version control. This pipeline enables trackable file-by-file changes, text-to-speech compatibility checking (ensuring the script reads well when spoken aloud), batch regeneration when data updates, and parallel production of jurisdiction-specific decks for France, New York, UK, and USVI (PAPER TRAIL Project, 2026).

The Authoring Problem

Producing a 14-episode documentary series with approximately 450 slides creates a version control problem that PowerPoint cannot solve. Numbers in the database change whenever a new wire transfer is parsed, a PELT change-point count is corrected (PELT is an algorithm that detects sudden shifts in patterns over time), or an NER (Named Entity Recognition) completion percentage is updated. Each time, every slide referencing that number must be found and updated. In PowerPoint, this is a manual search-and-replace operation across binary files that cannot be meaningfully compared line by line (PAPER TRAIL Project, 2026).

The solution is to author in plain text and compile to presentation format. Every PAPER TRAIL episode exists as a directory of numbered markdown files: slide_01.md through slide_38.md (or however many slides the episode requires). Each markdown file contains the text content for one slide, including speaker notes, data references, and formatting hints.
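The one-file-per-slide convention can be illustrated with a short loader. This is a minimal sketch, not the project's actual parser: the in-file layout (a `# ` heading as the title, `- ` lines as bullets, and everything after a `Notes:` line as speaker notes) is an assumption, and the names `Slide`, `parse_slide`, and `load_episode` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path

# Matches the numbered file convention, e.g. slide_01.md
SLIDE_NAME = re.compile(r"slide_(\d+)\.md$")


@dataclass
class Slide:
    number: int
    title: str = ""
    bullets: list = field(default_factory=list)
    notes: str = ""


def parse_slide(number, text):
    """Parse one slide file under the assumed layout described above."""
    slide = Slide(number=number)
    notes_lines = []
    in_notes = False
    for line in text.splitlines():
        stripped = line.strip()
        if in_notes:
            notes_lines.append(stripped)
        elif stripped.lower() == "notes:":
            in_notes = True
        elif stripped.startswith("# ") and not slide.title:
            slide.title = stripped[2:].strip()
        elif stripped.startswith("- "):
            slide.bullets.append(stripped[2:].strip())
    slide.notes = " ".join(part for part in notes_lines if part)
    return slide


def load_episode(directory):
    """Load every slide_NN.md in a directory, ordered by slide number."""
    slides = []
    for path in Path(directory).iterdir():
        match = SLIDE_NAME.search(path.name)
        if match:
            text = path.read_text(encoding="utf-8")
            slides.append(parse_slide(int(match.group(1)), text))
    return sorted(slides, key=lambda s: s.number)
```

Sorting on the parsed integer rather than the file name keeps the order correct even if an episode mixes `slide_9.md` and `slide_10.md` style names.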

The Conversion Pipeline

Python scripts convert the markdown directories into .pptx files. Episode-specific scripts handle episodes with custom layouts. A master script enables batch regeneration of all PowerPoint files when slide content is updated — one command to rebuild the entire series (PAPER TRAIL Project, 2026).
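The conversion and batch steps described above might look roughly like the following. This is a sketch under stated assumptions rather than the project's scripts: it assumes the third-party python-pptx library, the default "Title and Content" layout, slides already parsed into `(title, bullets, notes)` tuples, and directories named like `ep06_slides`. The function names and the `EP06.pptx` output naming are hypothetical.

```python
from pathlib import Path


def build_deck(slides, output_path):
    """Write (title, bullets, notes) tuples to a .pptx file."""
    # Imported lazily so the batch logic below can run without the library.
    from pptx import Presentation  # third-party package: python-pptx

    prs = Presentation()
    layout = prs.slide_layouts[1]  # "Title and Content" in the default template
    for title, bullets, notes in slides:
        slide = prs.slides.add_slide(layout)
        slide.shapes.title.text = title
        body = slide.placeholders[1].text_frame
        body.text = bullets[0] if bullets else ""
        for bullet in bullets[1:]:
            body.add_paragraph().text = bullet
        slide.notes_slide.notes_text_frame.text = notes
    prs.save(str(output_path))


def regenerate_all(root, load_slides, build=build_deck):
    """Rebuild one deck per epNN_slides directory under root."""
    root = Path(root)
    outputs = []
    for directory in sorted(root.glob("ep[0-9][0-9]_slides")):
        if not directory.is_dir():
            continue
        # Hypothetical output name; the real files also carry the episode title.
        output = root / f"{directory.name[:4].upper()}.pptx"
        build(load_slides(directory), output)
        outputs.append(output)
    return outputs
```

Passing the loader and builder in as arguments keeps the batch loop independent of any one slide format, which would also let an episode with a custom layout supply its own builder.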

The conversion pipeline has produced .pptx files for EP06 through EP14. Earlier episodes (EP01-EP05) were authored before the pipeline was fully standardized. Jurisdiction-specific decks for France, New York, UK, and USVI were produced with a separate script. Each deck has a dedicated slide directory containing the investigation-relevant subset of the full series content.

Why Markdown

Markdown is plain text with minimal formatting syntax. A heading is a line starting with #. Bold text is wrapped in double asterisks. Lists start with dashes. This simplicity creates three production advantages:

Version-control-compatible tracking. Every edit to every slide is trackable through standard comparison tools. When EP12 "The Method" needed to correct its PELT change-point count from 966 to 889 and update its Leiden community count (communities are groups of entities that cluster together in a network) to 125,620, the comparison showed exactly which lines changed. In a binary .pptx file, version control would record only that the file had changed, not which numbers changed inside it (PAPER TRAIL Project, 2026).

Collaborative editing. Multiple contributors can work on different slides simultaneously without merge conflicts. The numbered file convention — each slide is its own file — means editing slide 17 never touches the file containing slide 23.

Searchability. Finding every reference to a specific number, entity, or claim across all 450 slides is a text search, not a manual review of 14 PowerPoint files. When stale data needed correction across the staged episodes (EP10-EP14), text search identified every instance in seconds.
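A search of this kind can be sketched in a few lines of standard-library Python. The function name `find_references` and the directory layout are assumptions. The guards in the pattern are one possible way to keep a search for "966" from also matching "1966" or "1,966".

```python
import re
from pathlib import Path


def find_references(root, term):
    """Return (relative path, line number, line) for each standalone match."""
    # Guards stop "966" from matching inside "1966", "1,966" or "966.5".
    pattern = re.compile(
        rf"(?<!\w)(?<!\d[.,]){re.escape(term)}(?!\w|[.,]\d)"
    )
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("slide_*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if pattern.search(line):
                rel = path.relative_to(root).as_posix()
                hits.append((rel, number, line.strip()))
    return hits
```

Calling `find_references("communications", "966")` would then list every slide file and line that still carries the old figure, assuming the slide directories live under that folder.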

Text-to-Speech Compatibility

Text-to-speech (TTS) compatibility checking ensures that all slides are readable by automated narration systems. This is not an accessibility afterthought — it is a production requirement for a series designed to be distributed as narrated video (PAPER TRAIL Project, 2026).

TTS compatibility checking catches problems that visual review misses. The term "Chao1" (a statistical estimator named after Anne Chao) could be read by speech engines as "chow-one" or "chay-oh-one." Rendering it as "Chao-one" in speaker notes ensures consistent pronunciation across all episodes. Similar corrections handle statistical notation, legal citations, and foreign names.
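One way to automate this kind of check is a pronunciation table applied to speaker notes. The sketch below is illustrative only: the table holds just the "Chao1" entry named above, and the function names `tts_issues` and `tts_safe` are hypothetical.

```python
import re

# Written form -> spoken form. Only the Chao1 entry comes from the text above;
# a real table would also cover notation, citations, and names.
PRONUNCIATIONS = {"Chao1": "Chao-one"}


def tts_issues(notes, table=PRONUNCIATIONS):
    """List written forms in the notes that still need a spoken rendering."""
    return [
        written for written in table
        if re.search(rf"\b{re.escape(written)}\b", notes)
    ]


def tts_safe(notes, table=PRONUNCIATIONS):
    """Return the notes with each written form replaced by its spoken form."""
    for written, spoken in table.items():
        notes = re.sub(rf"\b{re.escape(written)}\b", lambda m: spoken, notes)
    return notes
```

An episode would pass verification under this scheme when `tts_issues` returns an empty list for every slide's notes.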

All five staged episodes (EP10-EP14) have passed TTS compatibility verification, meaning they are ready for automated narration without manual intervention on pronunciation (PAPER TRAIL Project, 2026).

Stale Data Correction

The markdown pipeline's greatest test came when the analysis outputs changed after episodes were initially authored. The PELT algorithm originally reported 966 change-points; database verification corrected this to 889. The Leiden community detection algorithm produced community counts that needed updating. NER completion percentages changed as DS9 and DS11 email processing finished (PAPER TRAIL Project, 2026).

In a traditional production workflow, these corrections would require opening each PowerPoint file, searching for the old numbers, replacing them, and verifying that formatting survived the edit. In the markdown pipeline, the corrections were text replacements across plain files, followed by batch regeneration of all .pptx outputs. The version control comparison showed exactly what changed. The regeneration script ensured consistency.
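The correction step can be sketched as a guarded replacement that previews its own diff. This is an assumption about how such a script could work, not the project's code: the function name, the `context` keyword filter (which limits edits to lines that mention, for example, "PELT"), and the dry-run default are all hypothetical.

```python
import difflib
import re
from pathlib import Path


def correct_stale_value(root, old, new, context, dry_run=True):
    """Replace `old` with `new` on lines that mention `context`.

    Returns {path: unified diff lines} for every slide file that would change.
    Files are rewritten only when dry_run is False.
    """
    # Digit guards keep "966" from matching inside "1966" or "9660".
    pattern = re.compile(rf"(?<!\d){re.escape(old)}(?!\d)")
    diffs = {}
    for path in sorted(Path(root).rglob("slide_*.md")):
        before = path.read_text(encoding="utf-8")
        after = "".join(
            pattern.sub(lambda m: new, line)
            if context.lower() in line.lower() else line
            for line in before.splitlines(keepends=True)
        )
        if after != before:
            diffs[path] = list(difflib.unified_diff(
                before.splitlines(), after.splitlines(),
                fromfile=f"a/{path.name}", tofile=f"b/{path.name}",
                lineterm="",
            ))
            if not dry_run:
                path.write_text(after, encoding="utf-8")
    return diffs
```

Running it once with the default `dry_run=True` shows which lines would change. A second run with `dry_run=False` writes the files, after which the decks can be regenerated.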

Jurisdiction Decks

The same markdown-to-PowerPoint pipeline that produces the main series also produces jurisdiction-specific presentation decks. These are subsets of the full series tailored for specific investigative audiences (PAPER TRAIL Project, 2026):

  • France deck: Wire transfers to French entities ($1.575 million), PIASA/ARTCURIAL/AUP, Jean-Luc Brunel network
  • New York deck: Deutsche Bank accounts, 575 Lexington convergence, 9 East 71st Street, Southern District of New York proceedings
  • UK deck: Maxwell UBS accounts, Butterfly Trust UK passport, Andrew/Mandelson arrests
  • USVI deck: 6100 Red Hook Quarter entities, Southern Trust EDC exemption, Great St. James

Each jurisdiction deck has its own slide directory and .pptx output, maintained through the same version-controlled pipeline. When the main series content updates, the jurisdiction decks can be regenerated to reflect the changes.
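Deciding which decks to regenerate could be done with a timestamp comparison in the style of a build tool. The source does not say how the pipeline decides this, so the sketch below is purely an assumption, and `needs_rebuild` is a hypothetical helper.

```python
from pathlib import Path


def needs_rebuild(slide_dir, output_path):
    """True if the deck is missing or older than any of its slide sources."""
    output_path = Path(output_path)
    if not output_path.exists():
        return True
    built_at = output_path.stat().st_mtime
    return any(
        source.stat().st_mtime > built_at
        for source in Path(slide_dir).glob("slide_*.md")
    )
```

A batch script could call this for each jurisdiction directory and rebuild only the decks whose slide files changed since the last run.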

The Production Stack

The full production stack for PAPER TRAIL runs from markdown files (authoring) through Python scripts (conversion) to PowerPoint (distribution), with SVG graphics (Scalable Vector Graphics, images that stay sharp at any size) and references files (verification) as companion artifacts. Every layer is plain text except the final .pptx output, and that output is regenerable from the plain text sources at any time (PAPER TRAIL Project, 2026).

This stack runs on a single machine. The same PC that processes 2.1 million documents, runs NER extraction, and performs entity resolution (matching different references to the same real-world person or organization) also produces the presentations that communicate the results. The pipeline that analyzes the evidence and the pipeline that presents the evidence share the same hardware, the same database, and the same version control.

References

PAPER TRAIL Project. (2026). Slide directories [Data]. communications/ep02_slides/ through ep14_slides/.

PAPER TRAIL Project. (2026). Conversion scripts [Script]. communications/generate_all_pptx.py, ep06_to_pptx.py, ep07_to_pptx.py.

PAPER TRAIL Project. (2026). Jurisdiction decks: France, New York, UK, USVI [Data]. communications/jurisdiction_france_slides/, jurisdiction_ny_slides/, etc.

PAPER TRAIL Project. (2026). TTS compatibility: "Chao1" to "Chao-one" correction [Data]. MEMORY.md.

PAPER TRAIL Project. (2026). Stale data correction: PELT 966 to 889, Leiden 125,620 [Data]. MEMORY.md.

PAPER TRAIL Project. (2026). PowerPoint outputs: EP06 through EP14 [Data]. communications/EP06_2894_Packages.pptx through EP14_What_Remains.pptx.

PAPER TRAIL Project. (2026). Jurisdiction PowerPoint files: France, NY, UK, USVI [Data]. communications/.