Data privacy: Subterranean footstep blues with computational modeling and data analytics
A team of researchers from Virginia Tech and Penn State wanted to use sensors on an underground fiber optic network on the latter’s campus to monitor for limestone caverns, damaged buildings, broken water mains, and earthquake tremors.
The network, called FORESEE (short for fiber optics for environmental sensing array), has been monitoring underground vibrations since spring 2019. The work was the result of a collaboration between Tieyuan Zhu, who led the experimental efforts at Penn State, and Eileen Martin of the College of Science at Virginia Tech, who led data analytics efforts.
Then researchers found something else in the data. At first, they weren’t sure what it was. They saw strong signals, moving more slowly than vehicles. They came from a part of campus where scooters and bikes were prohibited.
On a slow July morning on the Penn State campus, the researchers noted that there was little noise. One of them went to a bus stop and wrote down the time people disembarked from buses and began walking. He also marked his own beginning location and time and started walking.
When the team compared the sensor data with the notes from the researcher at the bus stop and with the pace at which the marks in the data moved, it added up to one thing: the underground sensors were recording people’s footsteps.
These signals were a problem for researchers because they could interfere with the ability to monitor the subsurface. There was a larger long-term concern as well. Researchers hoped to release a mass of data from the 5-kilometer underground fiber optic grid in the coming years, and larger experiments were being planned in other cities, where footstep detection could be a concern for residents.
“The privacy wasn’t a concern until we realized what’s possible,” said Weichen Li, a graduate of the computational modeling and data analytics (CMDA) program in the College of Science’s Academy of Integrated Science and now a graduate student in data analytics engineering at Chicago’s Northwestern University.
“Past studies have shown that people with different body shapes tend to walk differently and that people have unique walking patterns. With some observations, it might be possible to track a specific person,” Li said.
Moreover, it would be possible to see if someone was coming or going from their residence or place of work. Though people voluntarily disclose their whereabouts every day with location tracking software activated on their cell phones, that is a choice a person can toggle on or off. It’s a greater privacy concern if people’s identities and locations are tracked and revealed without their awareness.
Eileen Martin, an assistant professor of mathematics at Virginia Tech, specializes in developing algorithms to work with data extracted from communications networks. She had been working with Junzhu Shen, a graduate student in geosciences at Penn State, and his advisor, Tieyuan Zhu, an assistant professor in Penn State’s College of Earth and Mineral Sciences.
Martin, a Luther and Alice Hamlett Junior Faculty Fellow, knew the detection of human footsteps in the data was a big problem, partly due to the amount of data being gathered. The 2,200 sensor locations pulled in more than 500 data points per second per location. That added up to logging a terabyte of data every three days, making it impossible to manually comb through the data to remove the footstep signals.
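The figures above can be checked with a quick back-of-the-envelope calculation. The sketch below is illustrative only: the sensor count and sampling rate come from the article, while the 4 bytes per data point is an assumption (the article does not say how each sample is stored), and the function name `bytes_logged` is made up for this example.

```python
# Rough check of the data volume quoted in the article.
SENSOR_LOCATIONS = 2_200      # from the article
SAMPLES_PER_SECOND = 500      # per location; the article says "more than 500"
BYTES_PER_SAMPLE = 4          # ASSUMPTION: 32-bit values, not stated in the article
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_logged(days: float) -> int:
    """Raw bytes recorded over `days`, ignoring file headers and compression."""
    samples = SENSOR_LOCATIONS * SAMPLES_PER_SECOND * SECONDS_PER_DAY * days
    return int(samples * BYTES_PER_SAMPLE)


terabytes_in_three_days = bytes_logged(3) / 1e12
print(f"{terabytes_in_three_days:.2f} TB in three days")
```

Under that storage assumption the total lands a little above one terabyte for three days, which is consistent with the figure Martin's team reported and far beyond what anyone could inspect by hand.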
“I don’t think people would be comfortable having their footsteps or their car driving over their driveway recorded as they come and go,” Martin said. “Even if it would be hard to figure out who it is for sure, it’s still kind of creepy.”
Virginia Tech’s CMDA program – a combination of computer science, mathematics, and statistics – has a tradition for its fourth-year students: In their capstone course, teams of CMDA students tackle real-world problems submitted by colleagues, corporations, or government agencies.
When it was time for CMDA capstone projects in fall 2019, Martin brought the footsteps-detection problem to Mark Embree, director of the CMDA program, and he agreed it would be a good capstone exploration. The team of then-seniors Li, Srikanth Jakkampudi (now working at Expedition Technology), and Ayush Dev (now working at Octo Consulting) set out to use deep learning to remove the footstep data.
Fred Faltin, an associate professor of practice in the Department of Statistics who orchestrates the CMDA capstone projects, said the footsteps privacy conundrum involved “an incredibly hard problem at its roots.” And like all CMDA capstone projects, this was not a theoretical problem for which faculty knew the solution.
“We give [the students] a problem and we say, ‘Nobody knows how best to deal with this,’ and turn them loose, let them be creative, and let them go out and try to fix the world, and great things can happen,” Faltin said.
That’s when the CMDA students’ rigorous cross-disciplinary data science skills came into play.
Martin and Jakkampudi got funding from the Luther and Alice Hamlett Undergraduate Research Program for Jakkampudi to continue the project through spring 2020 to improve the accuracy of the model and better understand its limitations. Jakkampudi was lead author, with Martin, Shen, Li, Dev, and Zhu, of a journal paper published late last year in a special section on smart-city geophysics in The Leading Edge.
Though the fiber optic array is now used on a college campus, Martin said West Coast cities one day could use the sensor networks for early detection of earthquakes with fiber-optic sensors buried under public streets and neighborhoods. But, of course, not until the footsteps data are removed or de-identified in some way.
The student team showed it was possible. They were able to see the data-represented footsteps when two people crossed paths and continued in different directions. At one point, Shen, Zhu, and Martin detected a different sound pattern in the data and determined it was a concert by the alternative band Grouplove.
“You can actually play back the vibrations and hear the bassline of the music,” Martin said.
Jakkampudi, a 2020 CMDA graduate who now works on neural network projects for a government contractor, said the challenge came with making the fiber optic array data public.
“If you identify where the footsteps are, it’s not enough to just take them out,” he said. “You need to take them out in a way that it’s not obvious. Otherwise, instead of looking for the presence of footsteps, a bad actor could look for the absence of footsteps.”
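Jakkampudi's point can be illustrated with a toy example. The sketch below is not the team's deep learning method. It assumes a single synthetic sensor trace and an already known mask of footstep samples (both hypothetical), and compares blanking those samples to zero with refilling them with random noise matched to the background's mean and spread. The function name `mask_with_background_noise` is invented for this illustration.

```python
import numpy as np


def mask_with_background_noise(trace, footstep_mask, seed=0):
    """Replace flagged samples with noise matched to the unflagged background."""
    trace = np.asarray(trace, dtype=float)
    footstep_mask = np.asarray(footstep_mask, dtype=bool)
    background = trace[~footstep_mask]
    rng = np.random.default_rng(seed)
    cleaned = trace.copy()
    # Draw replacement samples with the background's mean and standard deviation.
    cleaned[footstep_mask] = rng.normal(
        background.mean(), background.std(), footstep_mask.sum()
    )
    return cleaned


# Synthetic data: background noise plus a strong, hypothetical "footstep" burst.
rng = np.random.default_rng(42)
trace = rng.normal(0.0, 1.0, 5_000)
footstep_mask = np.zeros(trace.size, dtype=bool)
footstep_mask[2_000:2_200] = True
trace[footstep_mask] += 8.0 * np.sin(np.linspace(0, 40 * np.pi, 200))

zeroed = np.where(footstep_mask, 0.0, trace)      # naive removal leaves a flat gap
cleaned = mask_with_background_noise(trace, footstep_mask)

print("std of zeroed gap:  ", zeroed[footstep_mask].std())
print("std of refilled gap:", cleaned[footstep_mask].std())
print("std of background:  ", trace[~footstep_mask].std())
```

The zeroed gap has no variation at all, so it stands out as clearly as the footsteps did, while the refilled gap has roughly the same spread as its surroundings. Real recordings would likely need more than matching a mean and a standard deviation, since background vibrations have structure in time and frequency that simple random noise does not reproduce.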
Analyses of a wider variety of signals by Shen, Zhu, and Martin were also recently published in Solid Earth.
Currently, undergraduate researchers in Martin’s lab are using these FORESEE data to develop algorithms for automatically detecting the effect of other noises on underground imaging, and to develop software leading up to a similar experiment to monitor permafrost thaw in Alaska later this year.