March 22, 2015
Professor James Allan, co-Director of the UMass Amherst Center for Intelligent Information Retrieval, is collaborating with Northeastern University’s NULab for Texts, Maps, and Networks on a half-million-dollar grant from the Andrew W. Mellon Foundation to develop the Proteus toolset for information retrieval and visualization.
Allan is working with Northeastern Professors David Smith (adjunct UMass Amherst CS faculty), Ryan Cordell, Elizabeth Maddock Dillon, and Benjamin Schmidt to build software tools that help digital humanities researchers explore large, unstructured collections of historical documents: two million out-of-copyright books, newspapers, and other materials.
The initial work in many research projects goes toward forming a corpus of relevant documents. Although scholars today have access to an unprecedented amount of source material from mass digitization projects by Google, the Internet Archive, the Library of Congress, and others, a single subject heading or search-engine query in these archives is unlikely to capture all of the materials relevant to a long-term scholarly research effort.
Users of the Proteus system will be able to interactively and incrementally build up collections by analyzing networks of text reuse among books, passages, authors, and journals; provide feedback on terms, phrases, named entities, and metadata; and explore these growing collections during search, while browsing, and with Bookworm, the interactive full-text visualization tool.