Critical infrastructure in the United States is increasingly interdependent and interconnected.
A natural gas pipeline, for example, might supply fuel to residential customers as well as a power plant. That power plant, in turn, might provide electricity for the grid, which powers a water treatment facility.
In the wake of a disaster, damage to that pipeline might impact residential households, utility operations, and commercial businesses. The effects of those outages on vital industries ranging from energy to medical supplies can ripple across the entire country.
As emergency managers work to prepare communities for natural or human-made disasters, understanding how critical infrastructure interconnects is key for maintaining the availability of vital goods and services.
But cataloguing all that critical infrastructure is difficult and time-consuming. For instance, there are more than 50,000 privately owned water utilities operating in the United States. Each utility has its own interconnected infrastructure consisting of pipelines, pumping stations, towers and tanks. And much of that infrastructure is nondescript, located underground or unnoticed by the average citizen.
Now, researchers at Idaho National Laboratory are using machine learning to teach computers to recognize critical infrastructure from satellite imagery. The three-year project is supported by INL's Laboratory Directed Research and Development funding program.
"The goal is to build a machine learning model that can look at a piece of satellite imagery and say, 'Oh, that's a wastewater treatment plant,' or 'Oh, that's a power plant,'" said Shiloh Elliott, a data scientist at INL.
"It could help a FEMA controller direct resources in a natural disaster, such as protecting a water treatment plant during a wildfire," Elliott continued.
Or it could help investigators discern the impacts of an infrastructure shutdown following a cyberattack.
How to train a model
To train the machine learning model to recognize a certain type of infrastructure from a satellite image, the researchers must give the model known examples.
"Machine learning models take a tremendous amount of data to train and run," Elliott said. "We have a bunch of images that we know are certain types of facilities - airports and water treatment plants, for example. We tell the program, 'OK we're going to train you now,' and we feed those images into the computer. If you give a computer known images of a water treatment plant, it eventually learns to identify the characteristics of a water treatment plant."
The model breaks each image down into regions that are assigned a number based on their attributes. That numerical representation is then compared with other data from known images of facilities or features such as water tanks.
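The region-by-region comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the INL model: the article does not say which attributes are measured or how the comparison is made, so the sketch assumes mean brightness per region and a nearest-neighbor match. The names region_features and classify, and the tiny 4x4 "images," are hypothetical.

```python
from math import dist

def region_features(image, grid=2):
    """Split a square grayscale image (list of rows) into grid x grid regions
    and describe each region by one number: its mean pixel value (an assumption)."""
    step = len(image) // grid
    features = []
    for r in range(grid):
        for c in range(grid):
            pixels = [image[y][x]
                      for y in range(r * step, (r + 1) * step)
                      for x in range(c * step, (c + 1) * step)]
            features.append(sum(pixels) / len(pixels))
    return features

def classify(image, known_examples):
    """Return the label of the known example whose region numbers are closest."""
    target = region_features(image)
    label, _ = min(known_examples,
                   key=lambda ex: dist(target, region_features(ex[1])))
    return label

# Hypothetical toy data: bright pixels (9) stand in for a structure.
water_tank = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
runway = [[0, 0, 0, 0], [0, 0, 0, 0], [9, 9, 9, 9], [9, 9, 9, 9]]
known = [("water tank", water_tank), ("airport", runway)]

# A new, slightly noisy image with a bright top-left corner.
query = [[8, 9, 1, 0], [9, 8, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
prediction = classify(query, known)
```

Here the query's numerical representation sits closest to the water tank example, so that label is returned. A production model would learn far richer attributes than average brightness.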
Elliott and her colleagues use two data sets to inform the model. One set comes from the All Hazards Analysis - a proprietary tool developed at INL for the Department of Homeland Security that helps emergency managers anticipate the effects of critical infrastructure dependencies and respond quickly after a disaster. The other set comes from the Intelligence Advanced Research Projects Activity (I-ARPA), a research effort within the Office of the Director of National Intelligence that works to solve challenges for the U.S. intelligence community.
"With I-ARPA's data, we can train our model and test on the All Hazards Analysis data set and vice versa," Elliott said.
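Training on one data set and testing on the other, then swapping roles, is a common way to check that a model generalizes beyond the images it has seen. The sketch below illustrates the idea with a simple nearest-centroid classifier. The classifier choice, the function names and the two-number feature vectors are all assumptions made for illustration; they are not drawn from the INL project or either data set.

```python
from math import dist

def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns label -> mean vector."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {label: [sum(col) / len(col) for col in zip(*vectors)]
            for label, vectors in grouped.items()}

def accuracy(centroids, samples):
    """Fraction of samples whose nearest centroid carries the correct label."""
    correct = sum(
        min(centroids, key=lambda lab: dist(centroids[lab], features)) == label
        for features, label in samples)
    return correct / len(samples)

def cross_dataset_scores(set_a, set_b):
    """Train on one set and test on the other, in both directions."""
    return {"train A, test B": accuracy(train_centroids(set_a), set_b),
            "train B, test A": accuracy(train_centroids(set_b), set_a)}

# Hypothetical feature vectors standing in for two independent image sets.
set_a = [([1.0, 0.0], "airport"), ([0.9, 0.1], "airport"),
         ([0.0, 1.0], "water treatment"), ([0.1, 0.9], "water treatment")]
set_b = [([0.8, 0.2], "airport"), ([0.2, 0.8], "water treatment")]

scores = cross_dataset_scores(set_a, set_b)
```

If the score drops sharply in one direction, that suggests the model has latched onto quirks of one source rather than features of the facilities themselves.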
Looking inside the 'black box'
One quirk of most machine learning technologies is the "black box." Once a computer model identifies an image, there's typically no way for the operator to know how the model made that decision.
"If the model doesn't show its work - if you can't show that it's a water treatment plant - people won't trust the model," Elliott said.
To document how the model identifies infrastructure, the INL team is collaborating with the University of Washington to incorporate Local Interpretable Model-agnostic Explanations (LIME) into the modeling software.
"LIME explains the black box," Elliott said. "We're hoping that any models that come out of this research have that trust factor."
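Explanation tools of this kind work by perturbing the input and watching how the model's output changes. The sketch below is a deliberately simplified occlusion test, not the LIME algorithm itself: LIME samples many random perturbations and fits a weighted linear surrogate model, while this version hides one region at a time and records the drop in score. The toy_score function is a hypothetical stand-in for a black-box model, and its weights are invented for illustration.

```python
def region_importance(features, score_fn, baseline=0.0):
    """Return, for each region, how much the model's score drops when that
    region is replaced by a baseline value (a simple occlusion test)."""
    full = score_fn(features)
    drops = []
    for i in range(len(features)):
        masked = list(features)
        masked[i] = baseline
        drops.append(full - score_fn(masked))
    return drops

def toy_score(features):
    """Hypothetical black-box 'water treatment plant' score that leans
    heavily on the first region (weights are made up)."""
    weights = [0.7, 0.1, 0.1, 0.1]
    return sum(w * f for w, f in zip(weights, features))

importance = region_importance([1.0, 1.0, 1.0, 1.0], toy_score)
most_influential = max(range(len(importance)), key=importance.__getitem__)
```

In this toy case the first region accounts for most of the score, which is the kind of evidence an operator could inspect before trusting a classification.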
All Hazards Analysis
As the satellite imagery recognition model develops, it may one day be integrated with the lab's existing All Hazards Analysis technology.
With All Hazards Analysis, managers can map and model the effects of natural and human-made incidents before a disaster strikes, enabling effective mitigation planning, or respond more effectively in the wake of one.
But emergency managers need the best information possible to make their decisions.
The ability to recognize infrastructure from satellite images is one potential source of that information. The image recognition technology also has important research and development implications for other industries.
"We've already developed a model that's capable of saying a certain facility exists," Elliott said. "The next step is identifying specific features of a plant. It's a complicated problem, but we are making strides."