U.S. Department of Energy Research News

Algorithms -- A new perspective on data

We live in the age of information. Analysts are among those inundated with data. But with the aid of powerful computing techniques, analysts can make sense of volumes of data that come in many forms--text, numbers, images, video, audio.

Statisticians at Pacific Northwest National Laboratory are marrying computational power with statistical techniques to sift through all these forms of data together. Their work is being applied in a variety of areas, such as analyzing handwriting and identifying bioagents.

Whether clients come in with existing data or PNNL gathers the data, statisticians help uncover hidden information through exploratory analysis, grouping like kinds of information and extracting key features. Using systematic sampling and experimental design techniques, they ensure data are reliable and will support confident decisions.

"We take varying types of information, whether it's text, video or audio, and turn it into mathematical representations. Once we have a mathematical representation, we can apply our statistical techniques of clustering and data analysis," said Brent Pulsipher, who manages PNNL's statistical and quantitative sciences group.
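As a rough illustration of what a "mathematical representation" of text can look like, the sketch below turns short documents into word-count vectors and compares them with cosine similarity. The article does not say which representation PNNL uses; the bag-of-words approach, the function names and the sample sentences are all assumptions made for this example.

```python
import math
from collections import Counter

def to_vector(text, vocabulary):
    """Represent a text as word counts over a fixed vocabulary (bag of words)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical sample documents, invented for illustration only.
docs = [
    "sensor detects bacteria",
    "sensor detects bacteria quickly",
    "handwriting slant and height",
]
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [to_vector(doc, vocabulary) for doc in docs]

# The first two documents share words, so they score higher than the unrelated pair.
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

Once every item is a vector, the same distance and similarity calculations can in principle be applied whether the item started out as text, audio or an image.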

PNNL statisticians use clustering algorithms to find groups that share a common feature in some dimension and "cluster" them together. "Many of our algorithms are self-clustering. We don't say 'group these into a certain category that relates to a certain feature.' The algorithms specify categories themselves," said Pulsipher. Identifying these groupings is called "lead generation" because it provides leads that may explain what is causing a problem.
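One simple way an algorithm can form groups without being told the categories in advance is to link any two points that lie close together and let the chains of links define the clusters. The sketch below does this with a distance threshold (single-linkage grouping). It is not PNNL's method, which the article does not specify; the function name, the threshold and the sample points are hypothetical.

```python
import math

def threshold_clusters(points, max_gap):
    """Group points so that points within max_gap of each other, directly or
    through a chain of neighbors, land in the same cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster's representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_gap:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical two-dimensional feature points, invented for illustration.
points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.3, 4.8), (9.0, 0.0)]
print(threshold_clusters(points, max_gap=1.0))
```

The number of clusters is not supplied by the user here; it falls out of the data and the chosen threshold, and an isolated point simply forms a cluster of its own.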

In one project, statisticians are developing algorithms to identify handwriting samples. These algorithms quantify handwriting characteristics, such as density, height and slant. "We use statistical methods to test for similarities and differences between unknown and known handwriting samples," said Kris Jarman, who leads the effort.
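To give a sense of how a statistical comparison of handwriting features might work, the sketch below computes Welch's t statistic for one feature, slant, measured on a known sample and on two unknown samples. The article does not name the tests PNNL uses, so the choice of Welch's t is an assumption, and the measurements are invented purely for illustration.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples
    that may have unequal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical slant measurements in degrees; not real data.
known_slant = [12.1, 11.8, 12.5, 12.0, 11.9]
unknown_similar = [12.2, 11.7, 12.3, 12.1, 11.8]
unknown_different = [18.4, 17.9, 18.8, 18.1, 18.3]

# A statistic near zero suggests similar slant; a large magnitude suggests a difference.
print(welch_t(known_slant, unknown_similar))
print(welch_t(known_slant, unknown_different))
```

A real comparison would look at several characteristics together, such as density, height and slant, and would convert the statistic into a significance level before drawing any conclusion.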

In another project, statisticians are pairing algorithms with a bio-pathogen sensor, called Matrix-Assisted Laser Desorption Ionization Mass Spectrometry (MALDI-MS), that is being developed at the Laboratory. The algorithms quickly identify the unique features of suspect bacteria and categorize those features in real time according to pathogen type. In lab tests, these algorithms were more than 95 percent accurate in classifying bacterial strains.
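Categorizing a sample by its features can be sketched with a nearest-centroid classifier: average the feature vectors of each known type, then assign a new sample to the type whose average is closest. This is a stand-in chosen for simplicity, not the algorithm used with the MALDI-MS sensor, which the article does not describe. The strain labels and feature values below are hypothetical.

```python
import math

def compute_centroids(training):
    """Average the feature vectors of each labeled group, column by column."""
    return {
        label: [sum(column) / len(samples) for column in zip(*samples)]
        for label, samples in training.items()
    }

def classify(features, centroids):
    """Return the label whose centroid is nearest to the given feature vector."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Hypothetical peak-intensity features for two made-up strains; not real spectra.
training = {
    "strain_A": [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4]],
    "strain_B": [[0.1, 0.9, 0.7], [0.2, 0.8, 0.6]],
}
centroids = compute_centroids(training)
print(classify([0.85, 0.15, 0.35], centroids))
```

Because classifying a new sample requires only one distance calculation per known type, an approach of this kind is cheap enough to run as measurements arrive.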

###

 
