Technical Deep Dives
The Many Rodgers: How Entity Resolution Handles Scanning Chaos
TLDR Splink, a software tool that uses statistics to decide which database records refer to the same person, merged eight or more scanning variants of...
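Splink scores record pairs with a trained probabilistic model. The sketch below swaps in a much simpler technique: a fixed character-similarity threshold (Python's `difflib`) with transitive merging through union-find. It only shows how several scanning variants of one surname can collapse into a single group. The variant strings, the 0.8 threshold, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a simple stand-in, not Splink's comparator."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_variants(names: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Group strings whose pairwise similarity meets `threshold`, merging transitively.

    Assumes the input strings are distinct (hypothetical helper for illustration).
    """
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link every pair that clears the threshold.
    for a, b in combinations(names, 2):
        if similarity(a, b) >= threshold:
            union(a, b)

    # Collect members under their root, preserving input order.
    clusters: dict[str, list[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical OCR-style variants, not taken from the corpus.
    variants = ["Rodgers", "R0dgers", "Rodqers", "Rodgcrs", "Rogers", "Smith"]
    for group in cluster_variants(variants):
        print(group)
    # prints ['Rodgers', 'R0dgers', 'Rodqers', 'Rodgcrs', 'Rogers'] then ['Smith']
```

"R0dgers" and "Rodqers" score only about 0.71 against each other, below the threshold. They still end up in the same group because each links to "Rodgers". The same transitivity also lets a single bad link chain unrelated records into one group.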
4 investigations
TLDR Splink, a software tool that uses statistics to decide which database records refer to the same person, merged eight or more scanning variants of...
TLDR Nearly one in four entities in the corpus (197,945 of 821,633, about 24%) appears in only one document. An entity that appears in only one document, called a...
TLDR A cleanup script (Script 19b) dissolved 8 entity groups containing 3,576 records that had been incorrectly merged because unreadable text that the...
TLDR Probabilistic entity resolution using a statistical matching method reduced 2.38 million raw Named Entity Recognition (NER) entries to 519,000 unified clusters (about 4.6 entries per cluster on average)...
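Splink is built on the Fellegi-Sunter model of record linkage. Assume that model is the "statistical matching method" mentioned above. Its core scoring step can then be sketched as follows. Each observed comparison outcome contributes log2(m/u) to a match weight. That weight is added to the prior log-odds and converted back into a probability. The class and function names and every m, u, and prior value are hypothetical. In Splink these parameters are estimated from the data rather than set by hand.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ComparisonOutcome:
    """One observed comparison result, e.g. 'surnames similar' (hypothetical)."""
    m: float  # P(this outcome | records refer to the same entity)
    u: float  # P(this outcome | records refer to different entities)


def match_weight(outcomes: list[ComparisonOutcome], prior_match_prob: float) -> float:
    """Prior log2-odds plus the sum of log2(m/u) over the observed outcomes.

    Treats comparisons as conditionally independent, as Fellegi-Sunter does.
    """
    if not 0 < prior_match_prob < 1:
        raise ValueError("prior_match_prob must be strictly between 0 and 1")
    prior_log_odds = math.log2(prior_match_prob / (1 - prior_match_prob))
    return prior_log_odds + sum(math.log2(o.m / o.u) for o in outcomes)


def match_probability(weight: float) -> float:
    """Convert a log2-odds match weight back to a probability."""
    odds = 2.0 ** weight
    return odds / (1 + odds)


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    surname_similar = ComparisonOutcome(m=0.90, u=0.02)
    initial_agrees = ComparisonOutcome(m=0.95, u=0.10)
    w = match_weight([surname_similar, initial_agrees], prior_match_prob=0.001)
    print(f"weight={w:.2f} bits, probability={match_probability(w):.3f}")
```

Working in log-odds turns the product of likelihood ratios into a sum. That makes each field's contribution easy to inspect. It also shows why several agreeing fields are needed to overcome a small prior.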