News Release

Mind the detection gap: Why publishing needs a multilayered defense against industrial-scale papermills

Advance insights media briefing: pre-publication findings

Reports and Proceedings

Frontiers

Image: Overlap of flags (yellow or red) between the tools; 396 of the flagged alerts match. Credit: Frontiers

The research question

Combating papermill activity is critical to protecting the integrity of the scientific record. Responding effectively to papermills, and upholding research integrity more broadly, requires a multilayered approach:

  • AI screening to detect scalable, pattern-based risks at submission

  • Research integrity expertise to interrogate anomalies, link behaviors, and identify emerging tactics

  • Editorial oversight and peer review to assess scientific validity, coherence, and credibility in context

Human expertise and AI tools are both essential to this effort.

A range of commercial and proprietary tools have been developed to screen submissions for papermill activity. This study focuses on that first checkpoint: detecting suspected papermill submissions before they enter the review pipeline. Specifically, Frontiers analyzed the output of three papermill detection tools on more than 37,000 manuscript submissions across six journals, assessing how reliably the tools flag fraudulent behavior.

Methodology and selected statistics in brief

Three leading AI-powered detection systems were benchmarked against the same dataset of submissions. Each flagged a markedly different proportion of manuscripts, ranging from roughly 10% to 27%. The spread underscores a fundamental issue for the sector: there is no shared threshold for what constitutes a suspicious submission.

The divergence is most evident in the overlap. Of the 8,649 submissions flagged by at least one tool, just 396 were flagged by all three, an agreement rate of only 4.5% on which articles indicated papermill activity. In other words, the tools are largely identifying different manuscripts rather than corroborating the same risks.
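To make that agreement figure concrete, here is a minimal Python sketch, assuming each tool's output can be reduced to a set of flagged manuscript IDs; the function and variable names are illustrative and are not part of the study's actual pipeline:

    # Minimal illustrative sketch (not the study's actual pipeline): if each
    # tool's output is reduced to a set of flagged manuscript IDs, manuscript-level
    # agreement is the intersection over the union of those sets.

    def agreement_rate(tool_a: set[str], tool_b: set[str], tool_c: set[str]) -> float:
        """Share of manuscripts flagged by at least one tool that all three flagged."""
        flagged_by_any = tool_a | tool_b | tool_c   # 8,649 submissions in the study
        flagged_by_all = tool_a & tool_b & tool_c   # 396 submissions in the study
        return len(flagged_by_all) / len(flagged_by_any)

    # Applying the reported counts directly:
    print(f"{396 / 8_649:.2%}")  # 4.58%, the roughly 4.5% agreement cited above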

Why the detection gap?

The Frontiers team examined the tools' output in more detail to better understand the low overlap among the sets of flagged articles.

The overall pattern was clear. The tools appear to emphasize different types of signals, with one relying more on author-related indicators and others placing greater weight on content or reference-based signals. This may help explain both the divergence in flagging rates and the low manuscript-level overlap, with each tool capturing a different aspect of papermill risk.
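A purely hypothetical scoring sketch shows how different signal weightings could produce largely disjoint flag sets. The signal names, weights, and threshold below are invented for illustration and do not describe any vendor's actual model:

    # Hypothetical illustration: two screeners weighting the same risk signals
    # differently can reach opposite decisions on the same manuscript.

    SIGNALS = {"author_history": 0.8, "tortured_phrases": 0.2, "citation_anomalies": 0.3}

    # Tool A leans on author-related indicators; Tool B on content/reference signals.
    WEIGHTS_A = {"author_history": 0.7, "tortured_phrases": 0.1, "citation_anomalies": 0.2}
    WEIGHTS_B = {"author_history": 0.1, "tortured_phrases": 0.5, "citation_anomalies": 0.4}
    THRESHOLD = 0.5

    def score(weights: dict[str, float], signals: dict[str, float]) -> float:
        """Weighted sum of risk signals for one manuscript."""
        return sum(weights[k] * signals[k] for k in signals)

    print("Tool A flags:", score(WEIGHTS_A, SIGNALS) >= THRESHOLD)  # True  (0.64)
    print("Tool B flags:", score(WEIGHTS_B, SIGNALS) >= THRESHOLD)  # False (0.30)

Under this toy weighting, the same manuscript is flagged by one tool and passed by the other, which is the kind of divergence the study observed at scale.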

The full report will include additional new findings and insights:

  • Comparison of AI-detected versus human-expert-detected papermill submissions

  • Data on different signals used by different detection tools

  • Insight into how and why detection tool sensitivity fluctuates

  • Impact analysis of both false negatives and false positives

  • Frontiers' cross-industry advice and calls to action

 

