Andrea Dietrich, professor of civil and environmental engineering in the Charles E. Via Jr. Department of Civil & Environmental Engineering (CEE), faced a daunting challenge. Her AeroTrak particle counter malfunctioned and jeopardized her National Science Foundation-funded research. Her data wasn’t recoverable from the instrument by normal means.

“Our research investigates aerosols produced from humidifiers. Our 1,080 records were collected in winter under conditions of low humidity in an empty room where we had worked on campus for the past three years,” said Dietrich. “The room was reassigned to office space in 2019, and we would not be able to use it again. And the weather conditions would not happen again. Retrieving the data was essential.

“So as not to lose the data, cell phone images were taken of each record on the screen of the AeroTrak,” said Dietrich. “A colleague suggested that an optical character reader (OCR) would be a better route to recover the data than retyping the data off individual images, both for time and accuracy. I reached out to the University Libraries because I thought they could help.”

Dietrich was referred to Nathan Hall, director of digital imaging and preservation in the University Libraries. Hall quickly pulled together a team of library data experts to help.

Library faculty Chreston Miller, data and informatics consultant for engineering, and Jonathan Petters, data management consultant and curation services coordinator, joined Hall in investigating the best way to extract data from cell phone images.

Miller, a computer scientist who earned his bachelor’s, master’s, and Ph.D. from Virginia Tech, is the library’s expert in image processing, OCR, and data mining.

“There were thousands of lines of data,” said Miller. “We needed to pull out data points that met their specific criteria. Andrea and her graduate student Wenchuo Yao thought they would have to do it by hand. But I was able to convert the images to black and white, run the images through Adobe Pro’s OCR engine, and then use the pdfminer Python library to extract the text.”
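For readers curious what such a workflow can look like, here is a minimal Python sketch of the steps Miller describes. It is not the team's actual code. It assumes the Pillow imaging library for the black-and-white conversion and the `extract_text` function from pdfminer.six for reading a PDF that has already been through OCR. The function names, the threshold value, and the rule of keeping only lines that contain numbers are hypothetical, since the article does not describe the AeroTrak screen layout or the selection criteria.

```python
import re

# Matches integers and decimals such as 1200 or 0.3
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def to_black_and_white(src_path, dst_path, threshold=128):
    """Save a 1-bit black-and-white copy of a photo (assumes Pillow is installed).

    The threshold of 128 is an illustrative guess, not a value from the article.
    """
    from PIL import Image  # imported lazily so the parser below needs no extras

    gray = Image.open(src_path).convert("L")
    bw = gray.point(lambda p: 255 if p >= threshold else 0, mode="1")
    bw.save(dst_path)


def extract_pdf_text(pdf_path):
    """Return the text layer of an already-OCR'd PDF (assumes pdfminer.six)."""
    from pdfminer.high_level import extract_text

    return extract_text(pdf_path)


def parse_numeric_lines(text):
    """Keep only lines that contain numbers and return them as lists of floats.

    This is a hypothetical stand-in for the project's real selection criteria.
    """
    rows = []
    for line in text.splitlines():
        values = [float(tok) for tok in NUMBER.findall(line)]
        if values:
            rows.append(values)
    return rows
```

In use, each phone photo would go through `to_black_and_white`, the cleaned images would be run through an OCR tool to produce a PDF, and the result of `extract_pdf_text` would be passed to `parse_numeric_lines` before being written to a spreadsheet.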

Through this harrowing experience, Yao, a Ph.D. student in civil and environmental engineering in the Water INTERface Interdisciplinary Graduate Education Program, saw firsthand how a multidisciplinary team works together to solve a difficult problem.

“The data retrieval was an interdisciplinary success accomplished by data scientists at the University Libraries and environmental engineers,” said Yao. “Having all the data retrieved and accessible in an Excel file was both exciting and a relief because it saved me days of retyping. I was joyous to have my data, and so was my advisor.”

“This was an interesting and challenging problem,” said Miller. “I love my job because I use my background in computer science to help people solve problems and overcome challenges with data.”

The University Libraries’ data services team has expertise across a range of data disciplines.

“When research faculty come to us with specific challenges, we are able to refer them to the data expert with the background to best assist them,” said Miller. “We are here to help researchers with their data and support them in their success.”
