Public Release: 

'Predicting' the origins of mysterious outbreaks using viral RNA

American Association for the Advancement of Science

Researchers have used machine learning to develop a model capable of predicting the hosts and vectors of otherwise mysterious viral infections. The approach evaluates these viral features much faster than existing processes, which can take years and stall responses to emerging infectious diseases. More than 200 species of RNA viruses are known to be capable of infecting humans, and a few new viral species are discovered each year. Outbreaks of infectious diseases caused by unknown viruses can spread rapidly and become serious public health crises. Understanding these viruses' natural hosts - the animals from which they originated, such as rodents - is important. Also critical to identifying the populations at greatest risk, and to considering effective responses, is understanding these viruses' vectors - the ways in which they are transmitted to humans, such as through the bite of an infected flea. However, identifying these features can require many years of field and laboratory studies for some pathogens, greatly limiting rapid control and prevention efforts, particularly in emergency conditions. While the biology of an unknown virus may remain obscure for years, its genome can be obtained quickly. Here, Simon Babayan and colleagues assembled a dataset containing the genome sequences of over 500 single-stranded RNA viruses and, using machine learning algorithms, trained a model on it that can predict the animal reservoirs and arthropod vectors of mysterious pathogens directly from viral genome sequences. The authors demonstrate their approach's capabilities by identifying a potential cloven-hooved mammal host and a midge vector for the poorly understood Bas-Congo virus.
In a related Perspective, Mark Woolhouse discusses the limitations of Babayan et al.'s model, but also notes that the study "is a valuable step forward and hopefully presages further advances in our ability to extract information of public health value directly from virus genome sequences."


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.