An international team of scientists has collated all known bacterial genomes from the human gut microbiome into a single large database. Their work, published in Nature Biotechnology, will allow researchers to explore the links between bacterial genes and proteins, and their effects on human health.
This project was led by EMBL's European Bioinformatics Institute (EMBL-EBI) and included collaborators from the Wellcome Sanger Institute, the University of Trento, the Gladstone Institutes, and the US Department of Energy Joint Genome Institute.
More microbes than human cells
Bacteria coat the human body, inside and out. They produce proteins that affect our digestion, our health, and our susceptibility to diseases. They are so prevalent that the body is estimated to contain more cells in its microbiome - the bacteria, fungi, and other microbes - than it has human cells.
To understand the role that bacterial species play in human biology, scientists usually isolate and culture them in the lab before sequencing their DNA. However, many bacteria thrive in conditions that cannot yet be reproduced in a laboratory setting.
To obtain information on such species, researchers take another approach: they collect a single sample from the environment - in this case, the human gut - and sequence the DNA from the whole sample. They then use computational methods to reconstruct the individual genomes of thousands of species from that single sample. This method, called metagenomics, offers a powerful alternative to isolating and sequencing the DNA of individual species.
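As a rough illustration of the computational step, the sketch below groups DNA fragments (contigs) from one mixed sample into candidate genomes, on the assumption that fragments from the same organism share similar base composition and sequencing depth. It is a toy example, not the method used in the study: the `Contig` class, the `bin_contigs` function, the thresholds, and the sequences are all hypothetical, and real tools rely on far richer signals.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    sequence: str
    coverage: float  # mean read depth (made-up values for illustration)


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a sequence."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def bin_contigs(contigs, gc_tol=0.1, cov_ratio=1.5):
    """Greedily group contigs with similar GC content and coverage.

    Each bin is compared through its first member only. The tolerances
    are arbitrary assumptions chosen for this toy data.
    """
    bins = []  # each bin is a list of Contig objects
    for contig in contigs:
        gc = gc_content(contig.sequence)
        for members in bins:
            ref = members[0]
            similar_gc = abs(gc - gc_content(ref.sequence)) <= gc_tol
            high = max(contig.coverage, ref.coverage)
            low = min(contig.coverage, ref.coverage)
            similar_cov = low > 0 and high / low <= cov_ratio
            if similar_gc and similar_cov:
                members.append(contig)
                break
        else:
            # No existing bin matched, so start a new candidate genome.
            bins.append([contig])
    return bins


# Hypothetical fragments from a single mixed sample.
contigs = [
    Contig("c1", "ATATATTAGC", 10.0),
    Contig("c2", "GCGCGGCCAT", 52.0),
    Contig("c3", "ATTATAGCAT", 12.0),
    Contig("c4", "GGCCGCGCTA", 48.0),
]

for index, members in enumerate(bin_contigs(contigs), start=1):
    print(f"bin {index}:", [c.name for c in members])
```

With this toy data, the two AT-rich, low-coverage fragments end up in one bin and the two GC-rich, high-coverage fragments in another, each bin standing in for a draft genome.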
Biodiversity in the human gut
"Last year, three independent teams, including ours, reconstructed thousands of gut microbiome genomes. The big questions were whether these teams had comparable results, and whether we could pool them into a comprehensive inventory," says Rob Finn, Team Leader at EMBL-EBI.
The scientists have now compiled 200 000 genomes and 170 million protein sequences from more than 4 600 bacterial species in the human gut. Their new databases, the Unified Human Gastrointestinal Genome collection and the Unified Human Gastrointestinal Protein catalogue, reveal the tremendous diversity in our guts and pave the way for further microbiome research.
"This immense catalogue is a landmark in microbiome research, and will be an invaluable resource for scientists to start studying and hopefully understanding the role of each bacterial species in the human gut ecosystem," explains Nicola Segata, Principal Investigator at the University of Trento.
The project revealed that more than 70% of the detected bacterial species had never been cultured in the lab, so their activity in the body remains unknown. The largest group of bacteria in that category is the Comantemales, an order of gut bacteria first described in 2019 in a study led by the Bork Group at EMBL Heidelberg.
"It was a real surprise to see how widespread the Comantemales are. This highlights how little we know about the bacteria in our gut," explains Alexandre Almeida, EMBL-EBI/Sanger Postdoctoral Fellow in the Finn Team. "We hope our catalogue will help bioinformaticians and microbiologists bridge that knowledge gap in the coming years."
A freely accessible data resource
All the data collected in the Unified Human Gastrointestinal Genome collection and the Unified Human Gastrointestinal Protein catalogue are freely available in MGnify, an EMBL-EBI online resource that allows scientists to analyse their microbial genomic data and make comparisons with existing datasets.
The project already has a number of users in the scientific community. As new datasets emerge from research teams around the world, the catalogue might expand to include the microbiomes of other parts of the body, such as the skin or the mouth.
"This catalogue provides a very rich source of information for microbiologists and clinicians. However, we will likely discover many more novel bacterial species in under-represented geographical areas like South America, Asia, and Africa. We still don't know much about the variation in bacterial diversity across different human populations," explains Almeida.