Math and Computer Science Colloquium: Speculative Bibliography: Probabilistic Texts, Page Maps, and Propagation Networks

4:30 PM - 5:30 PM

Jepson Hall, 109
The Department of Math and Computer Science welcomes Ryan Cordell, Assistant Professor of English, and David Smith, Assistant Professor of Computer & Information Science, Northeastern University, to give a lecture.

The era of mass digitization seems to provide a mountain of source material for digital scholarship, but its foundations are constantly shifting. Selective archiving and digitization obscure data provenance, metadata fails to capture the presence of texts of mutable genres and uncertain authorship embedded within the archive, and automatic optical character recognition (OCR) transcripts of newspapers have word error rates above 40%. Beyond these issues, even the identity of any given transcription might change due to improved image processing or upgraded OCR. The condition of the mass-digitized text is thus closer to the manuscript sources of an edition than to a scholarly publication.

In this talk, Drs. Cordell and Smith will discuss several aspects of their work on "speculative bibliography" in the Viral Texts Project as applied to multi-authored, generically hybrid nineteenth-century newspapers. After briefly summarizing the work of the Viral Texts Project, they will discuss an archaeology for tracing the provenance of digitized historical newspapers. Next, they will discuss methods that exploit the redundancy and cluster structure in these archives to improve OCR accuracy using multi-input attention models and unsupervised learning. They will then demonstrate what page layout analysis can tell us about the evolution of particular newspapers, as well as broader patterns in publishing. They will conclude with a model for inferring frequent communication paths among newspapers.

This event is free and open to the public. Light refreshments will be served starting at 4:00 PM in the lounge outside Jepson 212.

This talk is cosponsored with the Digital Humanities Faculty Learning Community.

