The need for information from research outputs to be more findable, accessible, interoperable, and reusable (FAIR) has spurred researchers, database managers, and publishers to continually look for new and better ways to make information machine-readable. An equally important goal is creating articles that readers can actively engage with, rather than passively taking in information from a published article. One tool that readily improves the machine readability of data is Frictionless Data, a data standard developed by the Open Knowledge Foundation. Work published today in the Open Science journal GigaByte shows that Frictionless Data not only drastically improves machine readability, but can also turn normally static figures within an article into dynamic entities that allow readers to interact directly with the data. This demonstrates that Frictionless Data can tackle two important activities at once: allowing both man and machine to use and directly engage with scientific outputs in a dynamic fashion.
Integration of Frictionless Data was carried out on an article by a team of researchers from the University of Melbourne in Australia, led by Professor Anthony Papenfuss, whose lab has long advocated open and reproducible research, making sure that the data, source code, and every other sharable component of their research is openly available to the community. This makes their work especially amenable to new tools that can be layered on top of their articles to make the published work dynamic and actively usable. The article presents two new open source tools, svaRetro and svaNUMT, for interpreting difficult structural variation in genome analysis. These tools help annotate novel genomic events that are missed by most genome assembly pipelines, such as retrotransposition events and the insertion of DNA fragments from the mitochondria into nuclear DNA. Such events contribute to the complexity of genome sequences and are relevant to understanding gene function and genome evolution.
The openness and availability of all of the research components behind these tools and analyses created a perfect opportunity to implement Frictionless Data to make the article far more machine readable. During the process of adding this to the article, Raniere Silva from City University of Hong Kong, as part of a FAIR data internship, made the fortuitous discovery that Frictionless Data could also play a role in improving human interaction with the article. The figures, for the first time, were regenerated in an interactive manner. In the example here, readers can not only view the summary information presented in the figure, they can hover over data points to see the exact numbers and information behind these, and also manipulate the figure itself to view specific components that are of interest.
Silva says: “My biggest surprise was that the Frictionless Data Package specifications in conjunction with the popular Plotly tool has functions to convert a static visualisation into a dynamic one. This massively reduces the barrier for many researchers to produce dynamic data visualisation as they only need to add a line or two to their code. GigaByte made a huge leap by publishing the dynamic data visualisation and I hope it inspires other journals to publish dynamic data visualisation.”
When asked what they found most useful from this process, the authors stated: “The interactive figures are a great addition to the paper. We found the interactive functions made reading labels easier, especially for label-rich figures, and liked that the figures were accessible in SVG format, allowing viewing and editing without losing information from the figures.”
To promote the use of Frictionless Data in more published articles, Silva wrote a detailed handbook that includes an introduction to the use of Frictionless Data, an introduction to the specifications, short working examples for creating an author’s own data package, and long examples, based on published articles in the GigaScience and GigaByte journals, illustrating the creation and use of Frictionless Data. The goal is for the handbook to serve as the start of a conversation within the scientific community on how to embrace Frictionless Data. The handbook also provides a resource and guidance that make it easier for data producers to submit articles with these packages to data publishers, such as GigaScience Press.
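For readers curious what a data package looks like in practice, the following is a minimal sketch, not taken from the handbook, of a datapackage.json descriptor built with the Python standard library alone. The file name, column names, and package name are hypothetical, and the type inference is deliberately simplistic; the Frictionless tooling described in the handbook offers more complete description and validation.

```python
import csv
import json
import tempfile
from pathlib import Path


def infer_type(values):
    """Guess a Table Schema field type from a column of string values."""
    if not values:
        return "string"
    for caster, type_name in ((int, "integer"), (float, "number")):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "string"


def build_descriptor(csv_path, package_name):
    """Describe one CSV file as a single-resource data package."""
    csv_path = Path(csv_path)
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = reader.fieldnames or []
    fields = [
        {"name": column, "type": infer_type([row[column] for row in rows])}
        for column in columns
    ]
    return {
        "name": package_name,
        "resources": [
            {
                "name": csv_path.stem,
                "path": csv_path.name,  # relative to datapackage.json
                "format": "csv",
                "schema": {"fields": fields},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical example data, written to a temporary folder.
    folder = Path(tempfile.mkdtemp())
    table = folder / "example_counts.csv"
    table.write_text("sample,count,score\nA,3,0.5\nB,7,1.25\n")

    descriptor = build_descriptor(table, "example-package")
    (folder / "datapackage.json").write_text(json.dumps(descriptor, indent=2))
    print(json.dumps(descriptor, indent=2))
```

Placing the resulting datapackage.json next to the CSV file lets software discover the table, its columns, and their types without a human having to read the accompanying article.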
Of added interest, in addition to the inclusion of Frictionless Data, this paper marks the first time that regenerating figures in an interactive manner has been combined with a CODECHECK certificate of reproducible computation.
The use of Frictionless Data, and all the downstream elements it enables, serve as transformative steps in scientific publishing, as they improve machine readability and reproducibility and turn scientific articles from their old-fashioned static format into 21st century living documents. These types of novel, data-literate additions to the publication process are part of the reason GigaByte was the winner of the 2022 ALPSP Innovation in Publishing Award presented this month.
To encourage more authors to use Frictionless Data in their articles, all manuscripts submitted before the end of 2022 that include Frictionless Data examples will be given a free APC (normally $350). Authors interested in finding out how to do this should contact the editors at GigaByte at firstname.lastname@example.org.
Dong R, Cameron D, Bedo J, Papenfuss AT. (2022). svaRetro and svaNUMT: modular packages for annotating retrotransposed transcripts and nuclear integration of mitochondrial DNA in genome sequencing data. GigaByte. 2022. https://doi.org/10.46471/gigabyte.70
Dong R, Cameron D, Bedo J, Papenfuss AT. (2022). Data and scripts for the manuscript of svaRetro and svaNUMT: modular packages for annotating retrotransposed transcripts and nuclear integration of mitochondrial DNA in genome sequencing data [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7053649
Raniere Gaia Costa da Silva. (2022). From Frictionless Data to Interactive Visualisation. GigaBlog.
Raniere Gaia Costa da Silva. (2022). Frictionless Data Handbook for Researchers. http://dx.doi.org/10.5524/102316
Raniere Gaia Costa da Silva. (2022). CODECHECK Certificate 2022-018. Zenodo. https://doi.org/10.5281/zenodo.7030414
Scott Edmunds, Scott@gigasciencejournal.com, Office: +852 3610 3531 Cell: +852 92490853
About GigaScience Press
GigaScience Press is BGI's Open Access Publishing division, which publishes scientific journals and data. Its publishing projects are carried out with international publishing partners and infrastructure providers, including Oxford University Press and River Valley Technologies. It currently publishes two data-centric award-winning journals: its premier journal GigaScience (launched 2012) and its new journal GigaByte (launched 2020). The press also publishes data, software, and other research objects via its GigaDB.org database. GigaScience won the 2018 PROSE award for Innovation in Journal Publishing. GigaByte was just announced as the 2022 winner of the ALPSP Innovation in Publishing Award. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of manuscript submission to all GigaScience Press journals that all supporting data and source code be made available in GigaDB or in a community approved, publicly available repository. See GigaSciencePress.com
GigaByte provides a way to rapidly and cost-effectively share research, making the scientific process more inclusive and accessible to the broader community. It uses an exclusively XML-based publishing system that automates the production process and makes it effortless to change views and languages and to embed interactive content. Enabling readers to directly interact with the underlying data and software allows immediate use of published research, improves reproducibility, and increases trust. Upon acceptance, this system converts manuscripts to online- and PDF-ready articles within hours with minimal human intervention, dramatically reducing production time and cost to provide an equitable solution for publishing open science. https://gigabytejournal.com
svaRetro and svaNUMT: modular packages for annotating retrotransposed transcripts and nuclear integration of mitochondrial DNA in genome sequencing data.
The authors declare that they have no competing interests.