The National Archives and Records Administration, the official recordkeeper of the United States, provides online access to more than 110 million digital records, a number that continues to grow exponentially.

“The National Archives is expected to host as many as 13 billion records within the next few years,” said Sylvester Johnson, director of the Virginia Tech Center for Humanities. “So if you want to tackle a specific topic — to find the proverbial needle in the haystack — your ability to search those records in a meaningful way is becoming harder and harder. The needle you’re seeking hasn’t increased in size, yet the haystack is millions of times larger than it used to be.”

The Andrew W. Mellon Foundation has awarded Virginia Tech a planning grant to work with the National Archives and a number of universities nationwide to explore how artificial intelligence can be used to search digital records.

“The Mellon Foundation is making it possible for us to understand how we can enable researchers to continue studying archives when particular sources are buried among billions of other digital records,” said Bill Ingram, assistant dean for archives and technology services in University Libraries at Virginia Tech, who will serve as principal investigator for the grant. “Citizens should be able to easily access and find governmental information they need to be informed about government actions and policy. An informed citizenry is essential to maintaining democracy. We are excited that Virginia Tech can pursue such an important opportunity.”

The planning grant will fund a two-day workshop in which Virginia Tech librarians, archivists, and humanities faculty researchers will join subject-matter experts in digital libraries, machine learning, archives, information retrieval, document analysis, natural language processing, computational linguistics, and deep learning. Together, the scholars will explore how artificial intelligence and machine learning can help ensure public and scholarly access to the large-scale collection of government records at the National Archives.

“The University Libraries at Virginia Tech has spent years developing new ways for technology to support scholarly communication,” said Ingram. “As digital technology advances, this has become even more urgent. And collaborating with the National Archives will provide citizens new methods of search and discovery within the large collection of governmental documents housed in the archives.”

Representing the National Archives on the project leadership team will be Pamela Wright, who, as chief innovation officer, is responsible for formulating and implementing the agency’s strategic direction for providing online public access to its holdings.

“The National Archives is eager to collaborate with Virginia Tech and scholars across the country to explore ways that emerging technologies could improve digital access to our records,” said Wright.

Johnson believes the timing is right for testing the promise of those technologies in ensuring access to the nation’s holdings.

“We need a more sophisticated ability to search those records, as humans literally cannot look at billions of records,” said Johnson, who is leading another Center for Humanities project recently funded by the Mellon Foundation. “We envision this process will allow us to understand at a technical level and more broadly how artificial intelligence can help solve this problem.”

A key outcome of the planning workshop will be the design of a subsequent pilot project aimed at enhancing access to National Archives collections, including the creation of new tools, techniques, and practices.

“Understanding the business of our government is foundational to democracy,” Johnson said. “Our ultimate aim is to ensure that scholars and members of the public alike can use these records. We want our university to have an impact beyond the boundaries of our campus, and this workshop is one important way we can accomplish that.”
