Kadir Dede and Dr. Enno Ohlebusch at Ulm University in Germany have devised a method for constructing pan-genome subgraphs at different granularities without waiting hours or even days for software to process the entire genome. Scientists will now be able to create visualizations of pan-genomes at different scales much more rapidly.
The research article "Dynamic construction in pan-genome structures" was published in De Gruyter's open access journal Open Computer Science.
To analyze specific parts of a genome, scientists must be able to "see" the parts they are investigating, and this requires a large amount of processing power and time. The Computational Pan-Genomics Consortium encourages researchers to ensure that all information within a data structure is easily accessible to the human eye through visualization support at different scales. However, a pan-genome graph can have thousands to millions of nodes, far too many for a person to take in at once.
In an experiment, Dede and Ohlebusch used 10 human genomes and computed a graph that contains part of the large repetitive central exon of the human MUC5AC gene. Previously, researchers had to build an entire index structure of the genomes, which takes about 8.5 hours and requires 38.5 GB of memory. With the method developed by Dede and Ohlebusch, the researcher only has to compute two bit-vectors (on which the construction of the subgraph is based) and the subgraph containing the reference path and its neighborhood.
Rather than taking over eight hours, the software constructed the subgraph in only 24.5 minutes, including about 10 minutes for computing the bit-vectors. The construction required 39.6 GB of main memory; the subgraph itself required merely 15 KB of memory.
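To give a rough sense of what "a reference path and its neighborhood" means in graph terms, here is a minimal Python sketch. It is not the method from the paper, which works on bit-vectors rather than on an explicit graph held in memory. It is a plain breadth-first search over a small, made-up edge list, and the function name `neighborhood_subgraph` and the radius parameter `k` are assumptions introduced purely for illustration.

```python
from collections import defaultdict, deque

def neighborhood_subgraph(edges, reference_path, k):
    """Return the nodes within k edges of the reference path, and the edges among them.

    Illustrative only: edges are treated as undirected, and the whole graph is
    held in memory, which is exactly what a real pan-genome tool tries to avoid.
    """
    # Build an undirected adjacency map from the edge list.
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Multi-source BFS: every node on the reference path starts at distance 0.
    distance = dict.fromkeys(reference_path, 0)
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    # Keep only the edges whose two endpoints both survived.
    kept_edges = [(u, v) for u, v in edges if u in distance and v in distance]
    return set(distance), kept_edges

# Hypothetical toy graph: a reference path 1-2-3-4 with a side branch 2-5-6-7.
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5), (5, 6), (6, 7)]
nodes, kept = neighborhood_subgraph(EDGES, [1, 2, 3, 4], k=1)
print(sorted(nodes), kept)
```

Raising `k` pulls in more of the side branch, which is the sense in which a subgraph can be viewed at different granularities.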
"Based on solid theory, Dede and Ohlebusch present a new method for the flexible and efficient exploration of suspicious genomic regions, highlighting for example pathogenic genes that distinguish new variants of a virus from all previously known genomes," said Prof. Dr. Jens Stoye, head of the Genome Informatics team, Faculty of Technology, Bielefeld University.
The open access paper can be found here: https://doi.org/10.1515/comp-2020-0018