BUFFALO, N.Y. -- Scientists are using machine learning to identify important sequences of DNA within the mosquito genome that regulate how the insect's cells develop and behave.
The research project, funded by the National Institutes of Health (NIH), could have implications for disease control, potentially facilitating efforts to use genetic engineering to control mosquito populations, or to create mosquitoes that have reduced ability to transmit maladies, such as malaria, to humans.
"Our work will break new ground in the field of mosquito genomics and genetics," says Marc Halfon, PhD, professor of biochemistry in the Jacobs School of Medicine and Biomedical Sciences at the University at Buffalo. "Mosquitoes are responsible for hundreds of thousands of deaths each year. Although we know the sequence of the mosquito genome, we have little functional information about what much of that genome sequence does.
"Our work will take important steps toward filling in this crucial missing information. It will demonstrate our ability to functionally annotate the regulatory elements within genomes of various insect disease vectors without requiring extensive -- and expensive -- new genome-scale experimental data for each."
The project is funded by a $449,000 grant from the National Institute of Allergy and Infectious Diseases. It focuses on Anopheles gambiae, an important vector for malaria transmission.
Using machine learning to interpret the mosquito genome
Within the genome of every plant and animal, there are regulatory switches -- strings of DNA that control the behavior of genes, dictating when and where in the body different genes are turned on and off.
These regulatory sequences matter because they can affect a species' mating success and resistance to insecticides, Halfon says. In addition, regulatory mechanisms are crucial to genetic engineering of mosquitoes, in which researchers seek to control the expression of foreign or mutated genes introduced in a target animal.
For over a decade, Halfon has worked with UB's Center for Computational Research to build a database called REDfly that contains more than 5,600 regulatory sequences for a different insect species, the fruit fly Drosophila melanogaster. Now, his team is leveraging this trove of information to learn more about regulatory mechanisms within the mosquito genome.
With Saurabh Sinha, a computer scientist at the University of Illinois at Urbana-Champaign, Halfon developed software called SCRMshaw that learns from the regulatory sequences within REDfly, then searches the genomes of other insects for strings of DNA with similar properties. The software has successfully identified regulatory sequences in mosquitoes that look nothing like Drosophila sequences to the human eye, but that possess similar traits (such as containing a related assortment of short 3- to 6-letter DNA subsequences).
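The idea of comparing sequences by their assortment of short subsequences, rather than by letter-for-letter alignment, can be illustrated with a small Python sketch. This is not SCRMshaw's actual scoring method, which the article does not describe. The function names, the cosine-similarity score, and the window and step sizes are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k_min=3, k_max=6):
    """Count every subsequence of length k_min..k_max in seq."""
    seq = seq.upper()
    counts = Counter()
    for k in range(k_min, k_max + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count profiles (0.0 to 1.0)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def scan_genome(genome, training_seqs, window=500, step=250):
    """Score sliding windows of genome against a pooled training profile.

    Illustrative assumption: training_seqs stands in for known regulatory
    sequences (for example, entries from a database such as REDfly).
    Returns (start, score) pairs sorted from highest to lowest score.
    """
    training = Counter()
    for s in training_seqs:
        training.update(kmer_profile(s))
    scores = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        chunk = genome[start:start + window]
        scores.append((start, cosine_similarity(kmer_profile(chunk), training)))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

In this toy version, a window scores highly when its mix of 3- to 6-letter subsequences resembles that of the training set, even if no long stretch of DNA matches exactly. A real tool would also need to account for background genome composition and statistical significance, which this sketch ignores.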
"Finding regulatory elements is hard -- traditionally, it has been done by tedious experimental work that examines one gene at a time," Halfon says. "We wanted to know how you can do this faster: Just by looking at a DNA sequence, can you tell where the regulatory elements are? In at least some cases, the answer appears to be, 'Yes.'"
Early implementation of SCRMshaw
Using SCRMshaw, Halfon, Sinha and colleagues identified regulatory sequences in the mosquito Aedes aegypti, which transmits Zika, dengue fever and chikungunya. Some of these sequences may cause the activity of a network of genes to shift from the midline of the ventral nerve cord -- analogous to the human spinal cord -- to the lateral regions as the mosquito embryo forms.
This work, published online June 21 in the journal Developmental Biology, highlights how SCRMshaw can pinpoint regulatory sequences in non-Drosophila species.
"It shows how we can use SCRMshaw to address interesting biological questions of development and evolution," Halfon says.
The next step is to use the new NIH funding to conduct extensive discovery of regulatory elements within Anopheles gambiae.
"We will focus on trying to identify regulatory sequences most useful for understanding aspects of mosquito biology that are relevant to its role as a disease vector -- for instance, development of the salivary glands or the midgut, or olfaction -- or that could be useful for biocontrol methods, such as genes affecting reproduction," Halfon says. "Once we have generated a high-confidence set of regulatory element predictions, we will test them in transgenic mosquitoes."
The new NIH project is a collaboration between UB and the University of Maryland. The effort will be bolstered by continued development of the REDfly database, which is supported by a $1.2 million grant from the National Institute of General Medical Sciences, part of the NIH, and a $447,000 grant from the National Science Foundation.