News Release

New text-mining algorithm to prioritize research on chemicals, disease for public database

A press release from PLOS ONE

Peer-Reviewed Publication


A new text-mining algorithm can help identify the most relevant scientific research for a public database that reveals the effects of environmental chemicals on human health, according to research published April 17 in the open access journal PLOS ONE by Allan Peter Davis, Thomas Wiegers and colleagues from North Carolina State University.

The Comparative Toxicogenomics Database (CTD), managed in part by the lead authors, is a manually curated, public database that correlates environmental chemicals with their effects on genes and human health. Thousands of new research papers are published each day, and finding the most relevant ones to include can be challenging. As Davis explains, "Over 33,000 scientific papers have been published on heavy metal toxicity alone, going as far back as 1926. We simply can't read and code them all. And, with the help of this new algorithm, we don't have to."

The algorithm described in the study assigns each scientific article a score based on data content, biological and toxicological relevance and several other parameters. Integrating the algorithm with the existing manual curation process allowed the researchers to prioritize more relevant articles for inclusion in the database, increasing productivity by 27 percent and novel data content by 100 percent.
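For readers who want a concrete picture of score-based prioritization, the short Python sketch below ranks articles by a weighted sum of feature values. It is an illustration only and is not the scoring scheme published by the CTD team: the feature names, the weights and the article identifiers are all hypothetical, chosen to mirror the categories named above.

```python
from dataclasses import dataclass, field

# Hypothetical weights. The release does not state the real parameters
# or how the published algorithm combines them.
WEIGHTS = {
    "data_content": 5,
    "biological_relevance": 3,
    "toxicological_relevance": 2,
}


@dataclass
class Article:
    article_id: str
    # Feature name -> non-negative integer signal (for example, a term count).
    features: dict = field(default_factory=dict)


def score(article, weights=WEIGHTS):
    """Weighted sum of the article's features; missing features count as 0."""
    return sum(w * article.features.get(name, 0) for name, w in weights.items())


def rank(articles, weights=WEIGHTS):
    """Return the articles ordered from highest to lowest score."""
    return sorted(articles, key=lambda a: score(a, weights), reverse=True)


# Made-up example articles, not data from the study.
articles = [
    Article("C", {}),
    Article("B", {"data_content": 1, "biological_relevance": 2,
                  "toxicological_relevance": 5}),
    Article("A", {"data_content": 4, "biological_relevance": 1}),
]

ranked_ids = [a.article_id for a in rank(articles)]
```

In a workflow like the one described, curators would read from the top of the ranked list first, so the highest-scoring articles are reviewed before the rest.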

The algorithm incorrectly identified only 15 percent of the papers studied as highly relevant, and the researchers were able to identify the reasons for these errors. "Now, we can go back and tweak the algorithm to account for this and fine-tune the system," says Wiegers.

"We're not at the point yet where a computer can read and extract all the relevant data on its own," concludes Davis, "but having this text-mining process to direct us toward the most informative articles is a huge first step."

###

(Adapted from materials provided by North Carolina State University)

Note: This research paper is part of the PLOS Text Mining Collection, which includes two other PLOS ONE papers embargoed until April 17, 2013 at 5 pm Eastern time. For more information, please contact onepress@plos.org.

Citation: Davis AP, Wiegers TC, Johnson RJ, Lay JM, Lennon-Hopkins K, et al. (2013) Text Mining Effectively Scores and Ranks the Literature for Improving Chemical-Gene-Disease Curation at the Comparative Toxicogenomics Database. PLOS ONE 8(4): e58201. doi:10.1371/journal.pone.0058201

Financial Disclosure: This work was supported by the National Institute of Environmental Health Sciences (NIEHS) grants "Comparative Toxicogenomics Database" [grant number R01-ES014065] and "Generation of a centralized and integrated resource for exposure data" [grant number R01-ES019604]. Funding for open access charge: NIEHS grants R01-ES014065 and R01-ES019604. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interest: The authors have declared that no competing interests exist.

PLEASE LINK TO THE SCIENTIFIC ARTICLE IN ONLINE VERSIONS OF YOUR REPORT (URL goes live after the embargo ends): http://dx.plos.org/10.1371/journal.pone.0058201

The PLOS Text Mining Collection will be available at: http://www.ploscollections.org/textmining

Disclaimer: This press release refers to upcoming articles in PLOS ONE. The releases have been provided by the article authors and/or journal staff. Any opinions expressed in these are the personal views of the contributors, and do not necessarily represent the views or policies of PLOS. PLOS expressly disclaims any and all warranties and liability in connection with the information found in the release and article and your use of such information.

About PLOS ONE:

PLOS ONE is the first journal of primary research from all areas of science to employ a combination of peer review and post-publication rating and commenting, to maximize the impact of every report it publishes. PLOS ONE is published by the Public Library of Science (PLOS), the open-access publisher whose goal is to make the world's scientific and medical literature a public resource.

All works published in PLOS ONE are Open Access. Everything is immediately available—to read, download, redistribute, include in databases and otherwise use—without cost to anyone, anywhere, subject only to the condition that the original authors and source are properly attributed. For more information about PLOS ONE relevant to journalists, bloggers and press officers, including details of our press release process and our embargo policy, see the everyONE blog at http://everyone.plos.org/media.

