Scientists from the Qingdao Institute of Bioenergy and Bioprocess Technology (QIBEBT), Chinese Academy of Sciences (CAS), have developed a way to objectively evaluate the novelty and impact of the many microbiomes in the vast universe of microbiome big-data, based on an innovative tool called Microbiome Search Engine (MSE). These inventions, published in mBio, serve as compasses guiding mankind's exploration of that universe.
Microbiomes, the microbial societies that colonize almost every corner of our planet, are pivotal to human health, indoor environments, air, soil and the ocean, and shape these ecosystems' past, present and future.
To unravel how they benefit our bodies and the biosphere, a series of large, globally coordinated microbiome sequencing projects have been launched since 2010, such as the Earth Microbiome Project (EMP) and the Human Microbiome Project (HMP). These have led to an ongoing explosion of microbiome sequences (the metagenome data), which describe the structure and function of these microbial societies.
Despite the immense volume of these data, few computational approaches are available to process and integrate them. In particular, it is difficult to relate a new microbiome sample to the huge number of existing microbiome samples.
"MSE to microbiome big-data is like Google or Baidu to webpage big-data. By searching for the most structurally or functionally similar microbiomes in a super-fast manner, MSE offers the first opportunity to relate each microbiome ever published to the microbiome big-data known to mankind so far," said SU Xiaoquan, Lead of the Bioinformatics Group at Single-Cell Center, QIBEBT.
In databases of 100,000 to 1 million microbiomes, MSE finds the structurally closest neighbors of a microbiome up to three orders of magnitude faster than the existing strategy of pairwise comparisons.
"MSE makes comparison of microbiome at the global scale possible, enabling a bird's eye view of microbiome data universe," said SU Xiaoquan.
Taking advantage of MSE, the team established a search-based approach for in-depth mining of microbiome big-data and proposed two innovative evaluation indices: the Microbiome Novelty Score (MNS) and the Microbiome Attention Score (MAS).
MNS evaluates the compositional uniqueness of a microbiome sample at the time of its birth. MAS quantifies the scientific attention devoted to the microbiome by counting the number of its close neighbors. The Microbiome Focus Index (MFI), which is derived from MNS and MAS, measures the impact and contribution of a microbiome sample to mankind's exploration of novel microbiomes.
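The two indices can be illustrated with a minimal sketch. It is not the MSE implementation: the article does not give the formulas, so the sketch assumes Bray-Curtis similarity between relative-abundance profiles, takes novelty as one minus the best similarity to any earlier sample, and counts "close neighbors" with an arbitrary threshold of 0.8. The names Sample, novelty_score and attention_score are hypothetical, and MFI is left out because its derivation from MNS and MAS is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    name: str
    year: int                      # publication year, the sample's "birth"
    abundances: Dict[str, float]   # taxon -> relative abundance


def bray_curtis_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Similarity in [0, 1]; an assumed stand-in for MSE's own measure."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


def novelty_score(sample: Sample, database: List[Sample]) -> float:
    """Assumed MNS-like score: 1 minus the best match among earlier samples."""
    earlier = [s for s in database if s.year < sample.year]
    if not earlier:
        return 1.0  # nothing published before it, so it is maximally novel
    best = max(bray_curtis_similarity(sample.abundances, s.abundances)
               for s in earlier)
    return 1.0 - best


def attention_score(sample: Sample, database: List[Sample],
                    threshold: float = 0.8) -> int:
    """Assumed MAS-like score: number of other samples above the threshold."""
    return sum(
        1 for s in database
        if s is not sample
        and bray_curtis_similarity(sample.abundances, s.abundances) >= threshold
    )


# Toy database: two similar gut-like samples and one unrelated marine sample.
gut_a = Sample("gut_a", 2010, {"Bacteroides": 0.6, "Prevotella": 0.4})
gut_b = Sample("gut_b", 2012, {"Bacteroides": 0.5, "Prevotella": 0.5})
marine = Sample("marine", 2014, {"Prochlorococcus": 1.0})
database = [gut_a, gut_b, marine]

for s in database:
    print(s.name, round(novelty_score(s, database), 3),
          attention_score(s, database))
```

In this toy database the marine sample has no resemblance to anything published earlier, so it is fully novel but has no close neighbors yet, while the first gut sample was novel at birth and has since gained one neighbor.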
"Microbiome samples with extraordinary MFI are samples that were born with high novelty and then attracted a lot of follow-up scientific investigation," said XU Jian, director of the Single-Cell Center, QIBEBT.
"Therefore, MNS, MAS and MFI serve as one objective way to measure the novelty and impact of a sample, a project, a scientist or a research area; these so-called 'alt-metrics', which are based on the 'data' themselves, are fundamentally different from the conventional ways of assessing research impact such as the citation numbers or the Impact Factor, which are subject to human judgments and thus can be biased or skewed."
Using MSE, the team predicted "sleeping beauty" microbiomes, i.e., published microbiome samples that remain very novel in structure today yet are expected, based on the temporal growth of their MAS, to attract a lot of scientific attention in the next several years.
These "sleeping beauties" are mainly from marine environments and mother-baby interactions. Thus, data mining made possible by MSE can help the scientific community and funding agencies identify the research areas with the highest potential to generate high-novelty, high-impact microbiome data.
"We envision that such search against the microbiome database will be an important first step for data analysis at various scales in microbiome studies, just as a BLAST search is essential and universal in sequence analysis studies today," said Rob Knight, Director of the Center for Microbiome Innovation, University of California at San Diego.
"This work is of great interest to the microbiome research community and is broadly useful to explore available amplicon datasets," commented Emiley Eloe-Fadrosh of the DOE Joint Genome Institute, who was not involved in the study.
As one of the first big-data mining tools introduced by Chinese scientists in the Earth Microbiome Project, MSE will support ongoing mining of the immense datasets being generated by EMP as well as the CAS Microbiome Project.