News Release 5-Oct-2022

Improvements for Man and Machine in Scientific Publishing

Frictionless Data not only improves the machine readability of scientific articles but also enables humans to interact directly with the data within the article itself

Peer-Reviewed Publication

GigaScience

From Frictionless Data to Interactive Visualization. Image: A Frictionless Data package is a JSON file enclosing a list of local or remote resources (the data) and the meta-information of the package and each resource (for example, author and license). The drawing shows the overall elements involved in transforming a Frictionless Data Package into an interactive visualization as part of an article in the journal GigaByte.

Credit: Raniere Silva

The need for information from research outputs to be more findable, accessible, interoperable, and reusable (FAIR) has spurred researchers, database managers, and publishers to continually look for new and better ways to make information machine-readable. An equally important goal is creating articles that readers can actively engage with, rather than passively taking in information from a published article. One tool that readily improves the machine readability of data is a data standard called Frictionless Data, developed by the Open Knowledge Foundation. Work published today in the open science journal GigaByte shows that Frictionless Data not only drastically improves machine readability but can also turn normally static figures into dynamic entities that allow readers to interact directly with the data within the article. This demonstrates that Frictionless Data can serve two important purposes at once: allowing both man and machine to use and directly engage with scientific outputs in a dynamic fashion.

Frictionless Data was integrated into an article by a team of researchers from the University of Melbourne in Australia, led by Professor Anthony Papenfuss, whose lab has long advocated open and reproducible research, making sure that the data, source code, and every other sharable component of its research is openly available to the community. This makes their work especially amenable to having new tools applied on top of their articles to make the published work dynamic and actively usable. The article presents two new open source tools, svaRetro and svaNUMT, for interpreting difficult structural variation in genome analysis. These help annotate novel genomic events that are missed in most genome assembly pipelines, such as retrotransposition events and the insertion of DNA fragments from the mitochondria into the nuclear DNA, which contribute to the complexity of genome sequences and to the understanding of gene function and genome evolution.

The openness and availability of all of the research components behind these tools and analyses created a perfect opportunity to implement Frictionless Data and make the article far more machine-readable. While adding it to the article, Raniere Silva from City University of Hong Kong, as part of a FAIR data internship, made the fortuitous discovery that Frictionless Data could also play a role in improving human interaction with the article. The figures, for the first time, were regenerated in an interactive manner. In the example here, readers can not only view the summary information presented in the figure but can also hover over data points to see the exact numbers and information behind them, and manipulate the figure itself to view specific components of interest.

Silva says: “My biggest surprise was that the Frictionless Data Package specifications in conjunction with the popular Plotly tool has functions to convert a static visualisation into a dynamic one. This massively reduces the barrier for many researchers to produce dynamic data visualisation as they only need to add a line or two to their code. GigaByte made a huge leap by publishing the dynamic data visualisation and I hope it inspires other journals to publish dynamic data visualisation.”
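To illustrate the idea in the quote above, here is a minimal sketch of how a small table could be turned into a figure description in Plotly's JSON figure format (a "data" list of traces plus a "layout"), which Plotly can render with hover tooltips. It is an illustration under stated assumptions, not the method used for the article: the CSV content, the column names, and the helper names rows_to_figure and figure_to_json are all hypothetical, and only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical tabular data standing in for one resource of a data package.
CSV_TEXT = """sample,count
A,12
B,30
C,7
"""


def rows_to_figure(csv_text, x_field, y_field):
    """Build a Plotly-style figure dict (traces + layout) from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    trace = {
        "type": "bar",
        "x": [row[x_field] for row in rows],
        "y": [float(row[y_field]) for row in rows],
        # Text shown when a reader hovers over a bar.
        "hovertemplate": "%{x}: %{y}<extra></extra>",
    }
    layout = {"title": {"text": f"{y_field} by {x_field}"}}
    return {"data": [trace], "layout": layout}


def figure_to_json(figure):
    """Serialise the figure dict so a Plotly renderer could load it."""
    return json.dumps(figure, indent=2)


if __name__ == "__main__":
    print(figure_to_json(rows_to_figure(CSV_TEXT, "sample", "count")))
```

The design choice here is to keep the figure as plain data: because the description is ordinary JSON, it can travel alongside the data it was built from rather than being frozen into a static image.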

When asked what they found most useful from this process, the authors stated: “The interactive figures are a great addition to the paper. We found the interactive functions made reading labels easier, especially for label-rich figures, and liked that the figures were accessible in SVG format, allowing viewing and editing without losing information from the figures.”

To promote the use of Frictionless Data in more published articles, Silva wrote a detailed handbook that includes an introduction to the use of Frictionless Data, an introduction to the specifications, short working examples for creating an author’s own data package, and long examples, based on published articles in the GigaScience and GigaByte journals, illustrating the creation and use of Frictionless Data. The goal is for the handbook to start a conversation within the scientific community about how to embrace Frictionless Data. The handbook also provides a resource and guidance that make it easier for data producers to submit articles with these packages to data publishers, such as GigaScience Press.
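As a rough idea of what creating a data package involves, the sketch below writes a minimal descriptor: a JSON file listing the resources (the data) together with meta-information such as author and license. It is not taken from the handbook. All values (package name, contributor, file path, field names) are hypothetical, and the check shown is a deliberately simplified one, not the full Frictionless Data specification.

```python
import json


def build_descriptor():
    """Return a minimal, illustrative data package descriptor as a dict."""
    return {
        "name": "example-package",  # hypothetical package name
        "title": "Example data package",
        "licenses": [{"name": "CC-BY-4.0"}],
        "contributors": [{"title": "Jane Researcher", "role": "author"}],
        "resources": [
            {
                "name": "measurements",
                "path": "data/measurements.csv",  # hypothetical local file
                "format": "csv",
                "schema": {
                    "fields": [
                        {"name": "sample", "type": "string"},
                        {"name": "count", "type": "integer"},
                    ]
                },
            }
        ],
    }


def missing_parts(descriptor):
    """Return problems found by a simplified structural check."""
    problems = []
    resources = descriptor.get("resources", [])
    if not resources:
        problems.append("package has no resources")
    for index, resource in enumerate(resources):
        # Each resource should point at its data, by path or inline.
        if "path" not in resource and "data" not in resource:
            problems.append(f"resource {index} has neither 'path' nor 'data'")
    return problems


if __name__ == "__main__":
    descriptor = build_descriptor()
    print(missing_parts(descriptor))
    # By convention the descriptor file is named datapackage.json.
    with open("datapackage.json", "w", encoding="utf-8") as handle:
        json.dump(descriptor, handle, indent=2)
```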

Of added interest, beyond the inclusion of Frictionless Data, is that regenerating the figures in an interactive manner was, for the first time, combined with a CODECHECK certificate of reproducible computation.

The use of Frictionless Data, and all the downstream elements it enables, is a transformative step in scientific publishing: it improves machine readability and reproducibility, and turns scientific articles from their old-fashioned static format into 21st-century living documents. These types of novel, data-literate additions to the publication process are part of the reason GigaByte was the winner of the 2022 ALPSP Innovation in Publishing Award presented this month.

To encourage more authors to use Frictionless Data in their articles, the article processing charge (APC, normally $350) will be waived for all manuscripts submitted before the end of 2022 that include Frictionless Data examples. Authors interested in finding out how to do this should contact the GigaByte editors at editorial@gigabytejournal.com.

Further Reading:

Dong R, Cameron D, Bedo J, Papenfuss AT. (2022). svaRetro and svaNUMT: modular packages for annotating retrotransposed transcripts and nuclear integration of mitochondrial DNA in genome sequencing data. GigaByte. https://doi.org/10.46471/gigabyte.70

Dong R, Cameron D, Bedo J, Papenfuss AT. (2022). Data and scripts for the manuscript of svaRetro and svaNUMT: modular packages for annotating retrotransposed transcripts and nuclear integration of mitochondrial DNA in genome sequencing data [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7053649

Raniere Gaia Costa da Silva. (2022). From Frictionless Data to Interactive Visualisation. GigaBlog.

Raniere Gaia Costa da Silva. (2022). Frictionless Data Handbook for Researchers. http://dx.doi.org/10.5524/102316

Raniere Gaia Costa da Silva. (2022). CODECHECK Certificate 2022-018. Zenodo. https://doi.org/10.5281/zenodo.7030414

Media contacts:

GigaScience Editor-in-Chief:

Scott Edmunds, Scott@gigasciencejournal.com, Office: +852 3610 3531 Cell: +852 92490853

Sharing on social media?

Find GigaScience online on Twitter @GigaScience and on Facebook at https://www.facebook.com/GigaScience, and keep up to date with our blog at http://gigasciencejournal.com/blog/

About GigaScience Press

GigaScience Press is BGI's Open Access Publishing division, which publishes scientific journals and data. Its publishing projects are carried out with international publishing partners and infrastructure providers, including Oxford University Press and River Valley Technologies. It currently publishes two award-winning, data-centric journals: its premier journal GigaScience (launched 2012) and its new journal GigaByte (launched 2020). The press also publishes data, software, and other research objects via its GigaDB.org database. GigaScience won the 2018 PROSE award for Innovation in Journal Publishing. GigaByte was just announced as the 2022 winner of the ALPSP Innovation in Publishing Award. To encourage transparent reporting of scientific research and to enable future access and analyses, all GigaScience Press journals require, as a condition of manuscript submission, that all supporting data and source code be made available in GigaDB or in a community-approved, publicly available repository. See GigaSciencePress.com

About GigaByte:

GigaByte provides a way to rapidly and cost-effectively share research, making the scientific process more inclusive and accessible to the broader community. It uses an exclusively XML-based publishing system that automates the production process and makes it effortless to change views and languages and to embed interactive content. Enabling readers to directly interact with the underlying data and software allows immediate use of published research, improves reproducibility, and increases trust. Upon acceptance, this system converts manuscripts to online- and PDF-ready articles within hours with minimal human intervention, dramatically reducing production time and cost to provide an equitable solution for publishing open science. https://gigabytejournal.com

Journal

GigaByte

DOI

10.46471/gigabyte.70

Method of Research

Computational simulation/modeling

Subject of Research

Not applicable

Article Title

svaRetro and svaNUMT: modular packages for annotating retrotransposed transcripts and nuclear integration of mitochondrial DNA in genome sequencing data.

Article Publication Date

30-Sep-2022

COI Statement

The authors declare that they have no competing interests.
