Science News
from research organizations

State-of-the-art text mining technologies for chemistry

Date:
June 21, 2017
Source:
Centro Nacional de Investigaciones Oncológicas (CNIO)
Summary:
Researchers have published the first exhaustive review of the state-of-the-art methodologies underlying chemical search engines, named entity recognition and text mining systems.

In a recent Chemical Reviews article, the Biological Text Mining Unit at the Spanish National Cancer Research Centre (CNIO), together with researchers at the Center for Applied Medical Research (CIMA) of the University of Navarra in Pamplona and at the Barcelona Supercomputing Centre (BSC-CNS), has published the first exhaustive review of the state-of-the-art methodologies underlying chemical search engines, named entity recognition and text mining systems.

The rapidly growing field of big data applications in biomedical research, together with the use of machine learning and artificial intelligence technologies for text data mining, has resulted in promising tools. "This review," state the authors, "is organised to serve as a practical guide to researchers entering in this field but also to help them to envision the next steps in this emerging data science field."

"Through the release of Gold Standard datasets and the organisation of several community challenge benchmark events, the Biological Text Mining Unit has played a critical role in the development and evaluation of current chemical text mining systems, as highlighted in this article," explains Martin Krallinger, head of the Unit and co-first author of the review.

A huge amount of unstructured data

A considerable fraction of biomedically relevant data is available only in unstructured form. This includes the rapidly growing scientific literature, medicinal chemistry patents, electronic health records and clinical trial documents. In fact, every year, over 20,000 new compounds are published in medicinal and biological chemistry journals.

Being able to transform unstructured biomedical research data into structured databases that can be more efficiently processed by machines or queried by humans is becoming critical for a range of very heterogeneous applications. These include the identification of new drug targets and of chemical probes to validate or discard those potential targets, the repurposing of approved drugs, the identification of adverse drug events, and the retrieval of systems biology information associated with chemical-disease or chemical-gene networks.
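
To give a rough sense of what turning unstructured text into structured records can look like, here is a minimal sketch of dictionary-based chemical mention recognition in Python. It is an illustration only, not the method described in the review: the lexicon, the identifiers (`CHEM:0001` and so on) and the function name `find_chemical_mentions` are all hypothetical, and real systems typically rely on large curated dictionaries or trained machine learning models.

```python
import re

# Hypothetical mini-lexicon mapping chemical names to made-up identifiers.
# Real systems use large curated dictionaries or trained models instead.
CHEMICAL_LEXICON = {
    "aspirin": "CHEM:0001",
    "ibuprofen": "CHEM:0002",
    "caffeine": "CHEM:0003",
}


def find_chemical_mentions(text, lexicon=CHEMICAL_LEXICON):
    """Return one structured record per lexicon term found in the text."""
    # Longest names first, so longer terms win over shorter overlapping ones.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    records = []
    for match in pattern.finditer(text):
        records.append(
            {
                "mention": match.group(0),  # text as written in the source
                "start": match.start(),     # character offset, inclusive
                "end": match.end(),         # character offset, exclusive
                "identifier": lexicon[match.group(0).lower()],
            }
        )
    return records


if __name__ == "__main__":
    sentence = "Patients received Aspirin, and caffeine intake was recorded."
    for record in find_chemical_mentions(sentence):
        print(record)
```

Each record pairs a mention with its position in the text and an identifier, which is the kind of structured output that can then be loaded into a database and queried.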

Chemical compounds are a key entity type of critical relevance for biomedical research, notably as a therapeutic strategy to treat medical needs. In fact, "the construction of large chemical knowledge bases, integrating chemical information with biological and clinical data, is crucial to identify and validate new therapeutic targets for unmet medical needs as well as to speed up the drug discovery process," explains Julen Oyarzabal, Director of Translational Sciences at CIMA and co-leader of this report.


Story Source:

Materials provided by Centro Nacional de Investigaciones Oncológicas (CNIO). Note: Content may be edited for style and length.


Journal Reference:

  1. Martin Krallinger, Obdulia Rabal, Anália Lourenço, Julen Oyarzabal, Alfonso Valencia. Information Retrieval and Text Mining Technologies for Chemistry. Chemical Reviews, 2017; DOI: 10.1021/acs.chemrev.6b00851

Cite This Page:

Centro Nacional de Investigaciones Oncológicas (CNIO). "State-of-the-art text mining technologies for chemistry." ScienceDaily. ScienceDaily, 21 June 2017. <www.sciencedaily.com/releases/2017/06/170621103106.htm>.
Centro Nacional de Investigaciones Oncológicas (CNIO). (2017, June 21). State-of-the-art text mining technologies for chemistry. ScienceDaily. Retrieved May 23, 2024 from www.sciencedaily.com/releases/2017/06/170621103106.htm
Centro Nacional de Investigaciones Oncológicas (CNIO). "State-of-the-art text mining technologies for chemistry." ScienceDaily. www.sciencedaily.com/releases/2017/06/170621103106.htm (accessed May 23, 2024).
