Data Quality
The Singleton Crisis: 197,945 Entities That Appear Once
3 investigations
TLDR Nearly one in four entities in the corpus (197,945 out of 821,633) appears in only one document. An entity that appears in only one document — called a...
TLDR A cleanup script (Script 19b) dissolved 8 entity groups containing 3,576 records that had been incorrectly merged because unreadable text that the...
TLDR A 190-page Ghislaine Maxwell UBS account statement (EFTA01275697.pdf) generated 190 false "Krakow" entity mentions when automated text scanning...