New model in scientific publishing: GigaScience combines article and data publication

Launch of big-data journal GigaScience establishes a novel approach to publication and data sharing

July 12, 2012, Hong Kong, China – BGI, the world's largest genome sequencing institute, and their publishing partner BioMed Central, a leader in scientific data sharing, announce the launch of a new journal, GigaScience, which publishes large-scale biological research in a unique format. The journal combines standard article publishing with complete data hosting and analysis tools, all of which are open access and freely available.

This launch is a major first step towards revolutionizing the publishing industry with the open access publication of complete, reproducible accounts of all parts of data-intensive scientific research projects. Together GigaScience and its integrated database GigaDB provide scientific analyses, full dataset hosting, and access to the software tools used to conduct these analyses, along with publication of more traditional scientific articles describing the studies.

Having all these together finally allows readers not only to glean the scientific conclusions in the papers, but also to test them directly using the underlying data and analysis tools. By doing this, GigaScience offers a way to help overcome the growing problem of the lack of reproducibility of research. GigaScience publications also include Digital Object Identifiers (DOIs) for all datasets in the journal database, GigaDB. This helps make datasets more permanent, as well as fully trackable, discoverable, linkable, and citable, which traditionally has only been possible for journal articles. Citation of data enables scientists who generate these enormous datasets and share them with the community to gain more appropriate credit for their contributions to research.

Laurie Goodman, Editor-in-Chief, says, "The full use of large-scale data has sadly lagged far behind our ability to produce it. The leaders of BGI realized they had the ability, given their vast computational resources, to create an innovative new journal format — one where enormous datasets could be fully hosted and directly linked to their original scientific studies. By including analysis tools in a data platform, as well as the planned addition of cloud technology later this year, GigaScience can serve as a means to put such data into the hands of researchers who do not have the vast computational resources required for optimal data use. This is in keeping with the goals of our co-publisher BioMed Central, which makes them the perfect partner in achieving this endeavour."

Exemplifying GigaScience and GigaDB's innovative approach to publishing, the launch edition includes a research article from Stephan Beck's group at University College London, UK (pre-release version here: http://goo.gl/2nZgD). This article focuses on ways to conduct whole-genome analyses of DNA methylation, an important mechanism that regulates gene expression. The article is accompanied by all of the supporting data and software tools needed to recreate the experiments (a total of 84 GB), freely available for download and reuse from GigaDB. Using BGI's data storage capacity, GigaScience is able to host these and other files, which are far larger than any other journal is able to publish. GigaDB furthermore supports open data by waiving all copyright in published datasets through its use of the Creative Commons CC0 public domain dedication. This allows anyone to access and reuse published data without restrictions.

In addition to this innovative, big-data-driven publication format, the journal provides reviews and commentaries that address the many hurdles that still need to be surmounted to improve future big-data handling.

Further highlights from the first issue include:

A compressed future for DNA archiving

Scientists at the EMBL-European Bioinformatics Institute in Hinxton, UK, argue that sustainable DNA archiving will depend on the understanding that not all data is created, or preserved, equally. They propose a graded system for storing DNA sequences under differing levels of compression, based on how easily the data can be reproduced and on the availability of DNA samples for resequencing.

Digitizing pathogen surveillance

The rapid development of genomics technology and understanding over the last two decades has laid the groundwork for a major advancement in public health. Researchers at Cold Spring Harbor Laboratory and the University of Maryland argue that the time is right for a sequencing-based pathogen surveillance system. They believe that the biggest hurdle for the system would not be the necessary technology, but rather scientific attitudes towards data sharing.

An ambitious plan to digitally characterize ecological diversity

The Genomic Observatories network plans to digitally characterize whole ecosystems of specific 'research hotspots' with the aim of better enabling predictive modeling of biodiversity dynamics. This article, authored by scientists in the US and UK who are a part of the long-term initiative, delineates how collecting and harnessing such a vast body of genetic variation data would greatly benefit both science and broader society.

Closing the issue, Jonathan Eisen discusses good omes, bad omes, and how to tell the difference.


Notes to Editors

1. BGI, a China-based scientific institution, was founded in 1999 and has since become the largest genomic organization in the world. With a focus on research and applications in the healthcare, agriculture, conservation, and bio-energy fields, BGI has a proven track record of innovative, high-profile research, which has generated over 178 publications in top-tier journals such as Nature and Science. BGI's distinguished achievements have made a great contribution to the development of genomics in both China and the world. Its goal is to make leading-edge genomics highly accessible to the global research community by integrating the industry's best technology, economies of scale, and expert bioinformatics resources. BGI and its affiliates, BGI Americas and BGI Europe, have established partnerships and collaborations with leading academic and government research institutions, as well as global biotechnology and pharmaceutical companies.

2. GigaScience (http://www.GigaSciencejournal.com) is co-published by BGI, the world's largest genomics institute, and BioMed Central, the world's largest open-access publisher. The journal covers research that uses or produces 'big data' from the full spectrum of the life sciences. It also serves as a forum for discussing the difficulties of and unique needs for handling large-scale data from all areas of the life sciences. The journal has a completely novel publication format, one that integrates manuscript publication with complete data hosting and the incorporation of analysis tools. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of manuscript submission to GigaScience that all supporting data and source code be made available in the GigaScience database, GigaDB (http://gigadb.org), as well as in the relevant publicly available repositories. GigaScience will provide users access to associated online tools and workflows, and will be integrating a data analysis platform and cloud resources into the database later this year, maximizing the potential utility and re-use of data. (Follow us on Twitter @GigaScience and on Sina Weibo at http://weibo.com/GigaSciencejournal, and keep up to date via our blog at http://blogs.openaccesscentral.com/blogs/gigablog/feed/entries/rss).

3. BioMed Central (http://www.biomedcentral.com/) is an STM (Science, Technology and Medicine) publisher that pioneered the open-access publishing model. All peer-reviewed research articles published by BioMed Central are made immediately and freely accessible online, and are licensed to allow redistribution and reuse. BioMed Central is part of Springer Science+Business Media, a leading global publisher in the STM sector.

