News Release

New mathematical model to predict viruses

Machine learning approach may predict the future of pathogen evolution and detection

Peer-Reviewed Publication

University of Southern California

To fight variants of the coronavirus, or any future epidemic, variants need to be identified quickly so that vaccines can be modified. With so many possible variants, mathematical models are needed, such as the one developed in USC researcher Paul Bogdan’s electrical and computer engineering lab by Bogdan, his student Xiongye Xiao, and Caltech colleagues Siddharth Jain and Jehoshua (Shuki) Bruck. Their work was featured in the COVID-19 collection of Scientific Reports.

This work grew out of Bogdan’s ruminations on nucleotides, the building blocks of DNA.

“I was thinking how we can learn the ‘algorithm,’ printer or computer program behind a DNA or RNA sequence and how this printer encodes the rules and predicts the next nucleotides it needs to print,” says Bogdan, a lead author of the paper and an expert on network dynamics and structures.

Bogdan believes that this is one of the few instances in which engineers have developed a mathematical model that can open analytical paths for deciphering the inner working rules of a virus using concepts from algorithmic information theory. 

Bogdan says this work contrasts with the conventional approach of directly comparing genomic sequences. Instead, the team of researchers from USC and Caltech proposes a computational, machine learning approach that identifies the generator behind an RNA sequence, which Bogdan likens to a computer program or algorithm, and then compares these generators with one another. The generators, says Bogdan, are similar to computer programs in that they encode the hidden similarities, dependencies and rules that may exist among RNA nucleotides located far apart from each other along the RNA sequence. The tool can evaluate genomic sequences and mutations by region or length, and by the time period in which they may have evolved. It’s like an Ancestry.com for viruses.
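For readers who want a feel for what comparing sequences by the programs that generate them could look like, here is a minimal sketch. It is not the team's model. It swaps in a classic stand-in from algorithmic information theory, the normalized compression distance, in which an off-the-shelf compressor (zlib here) approximates the length of the shortest program that prints a sequence. The function names, the synthetic 2,000-nucleotide sequences, and the choice of five mutations are all assumptions made for illustration.

```python
import random
import zlib


def compressed_size(seq: str) -> int:
    """Bytes needed to store seq after zlib compression (a rough proxy
    for the length of the shortest program that prints it)."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: near 0 when one sequence is
    largely predictable from the other, near 1 when they share little."""
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


# Synthetic data, for illustration only (not real viral genomes).
rng = random.Random(0)
reference = "".join(rng.choice("ACGU") for _ in range(2000))

# A hypothetical "variant": the reference with five substituted nucleotides.
variant_chars = list(reference)
for pos in rng.sample(range(len(reference)), 5):
    variant_chars[pos] = "A" if variant_chars[pos] != "A" else "C"
variant = "".join(variant_chars)

# An unrelated sequence drawn independently of the reference.
unrelated = "".join(rng.choice("ACGU") for _ in range(2000))

print("reference vs variant:  ", round(ncd(reference, variant), 3))
print("reference vs unrelated:", round(ncd(reference, unrelated), 3))
```

In this toy setting the variant should score much closer to the reference than the unrelated sequence does, because the compressor can reuse what it learned from the reference. A general-purpose compressor is only a crude approximation of a learned generator, so the sketch illustrates the idea rather than the published method.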

To demonstrate the efficacy of their technique on real genomic data, they clustered different strains of SARS-CoV-2 viral sequences, characterized their evolution and identified regions of the viral sequence with mutations. Bogdan indicates that they can get as granular as determining whether a single nucleotide has been changed, even though there are as many as 30,000 nucleotides in SARS-CoV-2 alone.

Bogdan explains that, beyond human hosts, this sort of forensic approach could be used to track pathogens that affect crops, helping to protect entire industries that would otherwise be vulnerable.

