
Machine learning improves searches in world's largest biomedical literature database


Results sorted by relevance, instead of date, provide an improved experience for users of PubMed, the world's largest biomedical literature database, according to a study publishing August 28 in the open access journal PLOS Biology by Zhiyong Lu and colleagues at the National Library of Medicine (NLM)/National Center for Biotechnology Information (NCBI), which develops and maintains PubMed.

PubMed contains over 28 million article abstracts from the biomedical literature, with an average of two more added every minute. It is an indispensable resource, global in scope, accessed by millions of users every day. From its inception, search results were returned only in reverse chronological order, most recent first, a ranking system that emphasized recency rather than relevance to the search query. In 2013, a relevance ranking system was introduced, but it depended on artificial weighting factors and required continual manual adjustment.

In June 2017, NLM/NCBI staff introduced a machine-learning algorithm that draws on dozens of relevance signals, including user responses (specifically, the frequency of click-throughs to the articles returned for a given search), to improve relevance ranking. This ranking system, called Best Match, is offered as an alternative to chronological ordering. The team found that the click-through rate on results returned by Best Match was 20% higher than on the same results presented chronologically. Overall usage of relevance sorting rose from 7.5% of all searches before the introduction of Best Match to 12% as of April 2018. Since machine-learning systems depend on user input to improve, the increase in use should allow the system to "teach itself" to become more valuable to its users over time.
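For readers who want to see how such figures are compared, the short Python sketch below works through the arithmetic. It is an illustration only, not the authors' analysis code. The function names are hypothetical, the click and search counts are made-up placeholders chosen to reproduce a 20% difference, and the sketch assumes the 20% figure is a relative increase rather than a percentage-point increase. Only the 7.5% and 12% usage shares come from the release.

```python
def click_through_rate(clicks: int, searches: int) -> float:
    """Fraction of searches that led to at least one click (hypothetical helper)."""
    if searches <= 0:
        raise ValueError("searches must be positive")
    return clicks / searches


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, e.g. 0.2 means 20% higher."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before


# Usage shares of relevance sorting reported in the release.
usage_before = 0.075  # before Best Match
usage_after = 0.12    # as of April 2018

point_change = usage_after - usage_before
usage_lift = relative_change(usage_before, usage_after)

# Placeholder counts, NOT data from the study: they only illustrate
# what a 20% relative increase in click-through rate looks like.
ctr_chronological = click_through_rate(clicks=300, searches=1000)
ctr_best_match = click_through_rate(clicks=360, searches=1000)
ctr_lift = relative_change(ctr_chronological, ctr_best_match)

print(f"Usage change: {point_change * 100:.1f} percentage points")
print(f"Relative usage increase: {usage_lift:.1%}")
print(f"Relative CTR increase (placeholder counts): {ctr_lift:.1%}")
```

Read this way, the move from 7.5% to 12% of searches is a gain of 4.5 percentage points, or roughly a 60% relative increase in the use of relevance sorting.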

"Overall, the new Best-Match algorithm shows a significant improvement in finding relevant information over the default time order in PubMed," the authors stated. "We encourage PubMed users to try this new relevance search and provide input to help us continue to improve the ranking method."


Peer-reviewed / Modelling

In your coverage please use this URL to provide access to the freely available article in PLOS Biology:

Citation: Fiorini N, Canese K, Starchenko G, Kireev E, Kim W, Miller V, et al. (2018) Best Match: New relevance search for PubMed. PLoS Biol 16(8): e2005343.

Image Caption: NCBI staff introduced a machine-learning algorithm which draws on user intelligence to improve relevance ranking.

Image Credit: Markus Spiske on Unsplash

Funding: The author(s) received no specific funding for this work.

Competing Interests: The authors have declared that no competing interests exist.
