Public Release:  Reproducible research for biofuels and biogas

Biofuels and biogas bioinformatics: Virtual containers unload a wealth of resources to tackle climate change


New research in the Open Access journal GigaScience presents a virtual package of data for biogas production, made reusable in a containerized form to allow scientists to better understand the production of biofuels.

One of the promising areas in biofuels development is biogas, which has huge potential as a renewable and clean source of energy. Biogas production generates methane gas through the anaerobic digestion (fermentation) of organic matter such as agricultural or food waste. Detailed knowledge of how the fermentation process works is key to optimizing it; however, the vast majority of the microbes involved remain unknown and cannot be cultivated in laboratories.

In new research just published in the Open Access journal GigaScience, researchers from Bielefeld University in Germany have now characterized the complex communities of micro-organisms in a biogas plant that generates heat and power from maize silage and pig manure. Further, the authors took an unusual step to make their research more reproducible by creating a virtual 'container' of their data and tools.

For their study, the researchers carried out metagenomic and meta-transcriptomic analyses, which resulted in the generation of DNA and RNA sequences from the thousands of microbial species present. From this they were able to create a catalogue of 250,000 genes that enabled the researchers to begin defining the underlying biology of methane production. While this data production only scratches the surface of the vast amount of information gathered, the authors furthered the usefulness of this resource by releasing all of the data and computational methods as a shareable container. These containers enable others, at the press of a few buttons, to execute the same analyses in the cloud. This not only makes the research reproducible, but also allows researchers around the world to build on these resources to more rapidly delineate the important processes involved in biogas generation and to better explore its use for biofuel.

As experiments become more data-intensive, reviewing and publishing the methods and results of scientific studies become increasingly challenging. To get around this, the authors used the rapidly emerging Docker platform, which effectively wraps software in a system that includes everything needed to rerun it. This removes the need for other researchers to install and maintain the many complex bioinformatics tools and software libraries: something that can be very technically challenging for researchers without the computational resources and skills.
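As a rough illustration of the idea described above, a containerized analysis is typically started with a single command that names the container image and a directory for the results. The Python sketch below only assembles such a command; the helper function, the image name and the directories are hypothetical placeholders and are not taken from the study.

```python
import os
import shlex


def docker_run_command(image, host_dir, container_dir="/data"):
    """Assemble a `docker run` argument list (a sketch; nothing is executed).

    image         -- name of the container image (hypothetical placeholder)
    host_dir      -- directory on the host that should receive the results
    container_dir -- mount point inside the container (assumed, not from the study)
    """
    # Docker expects an absolute host path for a bind mount.
    host_path = os.path.abspath(host_dir)
    return [
        "docker", "run",
        "--rm",                                # remove the container after it exits
        "-v", f"{host_path}:{container_dir}",  # share the results directory
        image,
    ]


# Hypothetical image name, used for illustration only.
command = docker_run_command("example/biogas-analysis:latest", "biogas-output")
print(shlex.join(command))
```

Because the tools and their dependencies are stored inside the image, whoever runs the command needs only Docker itself on the host machine.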

"We decided to use virtualisation techniques to encapsulate our analysis workflow and make it basically independent from the host it is executed on," says Andreas Bremges, first author of the study. Peter Belmann built the Docker container for the biogas study, and is a core team member of the bioboxes project to standardize interchangeable bioinformatics software containers.

"The reproducibility of published research is an important aspect of science," highlights Peter Li, Lead Data Manager at GigaScience, who took the step of exactly recreating the results in the paper, something that is extremely unusual in scientific publishing. "Andreas and his colleagues provided a Docker container that encapsulated the method used to process the data from their biogas study. This made my job of checking the reproducibility of their results much easier as their Docker container took care of installing the bioinformatics tools and their dependencies on my cloud server."

The use of Docker in this "container" publication is a step towards moving publishing away from static and often irreproducible papers, which have changed little since the 17th century, to more reproducible digital objects that better fit 21st century technology.



1. Bremges, A., Maus, I., Belmann, P., Eikmeyer, F., Winkler, A., Albersmeier, A., Puhler, A., Schluter, A., Sczyrba, A.: (2015) Deeply sequenced metagenome and metatranscriptome of a biogas-producing microbial community from an agricultural production-scale biogas plant. GigaScience 4:33 doi:10.1186/s13742-015-0073-6

Docker accessible version of the study:

2. Bremges, A., Maus, I., Belmann, P., Eikmeyer, F., Winkler, A., Albersmeier, A., Puhler, A., Schluter, A., Sczyrba, A.: Supporting data and materials for "Deeply sequenced metagenome and metatranscriptome of a biogas-producing microbial community from an agricultural production-scale biogas plant". GigaScience Database (2015).

Notes to News Writers:

1. GigaScience is co-published by BGI, the world's largest genomics organization, and BioMed Central, the world's largest open-access publisher. The journal covers research that uses or produces 'big data' from the full spectrum of the life sciences. It also serves as a forum for discussing the difficulties of and unique needs for handling large-scale data from all areas of the life sciences. The journal has a completely novel publication format, one that integrates manuscript publication with complete data hosting and analysis tool incorporation. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of manuscript submission to GigaScience that all supporting data and source code be made available in the GigaScience database, GigaDB, as well as in their publicly available repositories. GigaScience will provide users access to associated online tools, containers and workflows, and has integrated a data analysis platform, maximizing the potential utility and re-use of data. (Follow us on Twitter @GigaScience and on Facebook, and keep up to date with our blog.)
