Preprinting, the sharing of freely available manuscripts prior to peer review, has been on the rise in the biosciences since 2013 and experienced a surge during the COVID-19 pandemic, expediting the dissemination of timely research. But how do preprints relate to the final peer-reviewed papers? Two new studies publishing February 1st in the open access journal PLOS Biology took different approaches to explore how preprints posted on bioRxiv and medRxiv compare with their published versions.
One study, led by Dr. Jonathon Coates of Queen Mary University of London, manually compared over 180 preprints from the first 4 months of the COVID-19 pandemic with their published versions. The other study, led by Mr. David Nicholson of the University of Pennsylvania’s Perelman School of Medicine, used machine learning and textual analytics to explore the relationships between nearly 18,000 bioRxiv preprints and their published versions.
Concerns over the quality of preprints have existed since the emergence of preprinting in the sciences. As Coates notes, “Approximately 40% of the early COVID-19 research was first shared as a preprint and these were used in policy and public health decisions. Therefore, knowing the quality of these preprints is vital in having trust in science at a time when many are attempting to erode that trust”. Analysis of public scientific preprint repositories also has the potential to illuminate many previously hidden details of the peer-review process.
Coates and his colleagues compared all the COVID-19 preprints posted and published within the first 4 months of the pandemic and found that over 83% of COVID and 93% of non-COVID-related life sciences articles do not change from their preprint to final published versions.
Comparing the entire bioRxiv corpus to eventually published versions, Nicholson and colleagues found that many differences appear to stem from typesetting and the addition of supplementary materials; there were only modest changes in the linguistic characteristics of most manuscripts during the peer-review and publication process.
Furthermore, Nicholson and colleagues created a website, available at https://greenelab.github.io/preprint-similarity-search/, that uses their machine learning tool to recommend journals that publish linguistically similar articles.
Dr. Casey Greene of the University of Colorado School of Medicine, a co-author on the Nicholson et al. study, adds, “Collectively, our studies both provide evidence supporting the reliability and use of preprints both during a global pandemic and for general scientific outputs. Examining preprint-publication pairs provides an opportunity to study the process of peer review and taken together our results should provoke a rethinking of the role and prominence of peer-review in the current publication system.”
Coates adds, “With such a large proportion of early COVID-19 literature shared as non-peer reviewed preprints it is essential to know if those studies are reliable or not. By manually comparing the preprints to their peer reviewed, published, versions we show that over 83% of COVID-19 and 93% of non-COVID preprints are reliable and trustworthy.”
In your coverage, please use these URLs to provide access to the freely available papers in PLOS Biology:
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001285
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001470
Citation 1: Brierley L, Nanni F, Polka JK, Dey G, Pálfy M, Fraser N, et al. (2022) Tracking changes between preprint posting and journal publication during a pandemic. PLoS Biol 20(2): e3001285. https://doi.org/10.1371/journal.pbio.3001285
Citation 2: Nicholson DN, Rubinetti V, Hu D, Thielk M, Hunter LE, Greene CS (2022) Examining linguistic shifts between preprints and publications. PLoS Biol 20(2): e3001470. https://doi.org/10.1371/journal.pbio.3001470
Author Countries: United Kingdom, United States, Germany
Funding 1: NF acknowledges funding from the German Federal Ministry for Education and Research, grant numbers 01PU17005B (OASE) and 01PU17011D (QuaMedFo). LB acknowledges funding from a Medical Research Council Skills Development Fellowship award, grant number MR/T027355/1. GD thanks the European Molecular Biology Laboratory for support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Funding 2: This work was supported by grants from the Gordon and Betty Moore Foundation (GBMF4552) and the National Institutes of Health’s National Human Genome Research Institute (NHGRI) under award R01 HG010067 to CSG and the National Institutes of Health’s NHGRI under award T32 HG00046 to DNN. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests 1: I have read the journal’s policy and the authors of this manuscript have the following competing interests: JP is the executive director of ASAPbio, a non-profit organization promoting the productive use of preprints in the life sciences. GD is a bioRxiv Affiliate, part of a volunteer group of scientists that screen preprints deposited on the bioRxiv server. GD and JAC are contributors to preLights and ASAPbio Fellows.
Competing Interests 2: I have read the journal’s policy and the authors of this manuscript have the following competing interests: Marvin Thielk receives a salary from Elsevier Inc. where he contributes NLP expertise to health content operations. Elsevier did not restrict the results or interpretations that could be published in this manuscript.