Featured Research

from universities, journals, and other organizations

Digging for data with Chemlist and ChemSpider

Date: March 22, 2010
Source: BioMed Central
Summary: Just like the rest of us, scientists today are swamped with information. As more chemical resources become freely available, text mining applications -- previously focused on correctly identifying gene and protein names -- are now shifting towards also correctly identifying chemical names.

Just like the rest of us, scientists today are swamped with information. As more chemical resources become freely available, text mining applications -- previously focused on correctly identifying gene and protein names -- are now shifting towards also correctly identifying chemical names. Now database experts have compared two chemical name dictionaries head to head, and report on the payoffs of manual versus automatic data curation in the open access publication, Journal of Cheminformatics.

Chemlist's creators wanted to investigate the effect that extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. Kristina Hettne and her team, based in the Netherlands, together with US-based colleagues, compared two dictionaries. The first, Chemlist, is a dictionary for identifying small molecules and drugs in text, generated automatically from a number of publicly available databases. The second was extracted from the ChemSpider database, which has been curated manually to establish valid relationships between chemical names and structures. To compare automatic curation with manual curation, the authors used only the ChemSpider component containing manually curated names and synonyms.

The researchers tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. With some 80,000 names, the ChemSpider dictionary was less than a third of the size of Chemlist, which holds around 300,000. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before filtering and disambiguation, and a precision of 0.87 and a recall of 0.19 afterwards. The Chemlist dictionary, meanwhile, scored 0.20 for precision and 0.47 for recall before filtering and disambiguation, and 0.67 and 0.40 for these two measures afterwards.
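To make the two measures concrete: precision is the share of dictionary matches that are correct, and recall is the share of annotated chemical terms that the dictionary finds. The following is a minimal sketch in Python, not the authors' evaluation code. It assumes matches and annotations are compared as exact strings in a set, whereas a real corpus evaluation would typically compare individual mentions and their positions. The terms and the function name `precision_recall` are made up for illustration.

```python
def precision_recall(found, annotated):
    """Return (precision, recall) for a set of dictionary matches
    measured against a set of gold-standard annotations."""
    found, annotated = set(found), set(annotated)
    true_positives = len(found & annotated)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical example data, NOT taken from the study's corpus.
annotated = {"aspirin", "ethanol", "caffeine", "glucose"}
found = {"aspirin", "ethanol", "lead"}  # "lead" stands in for an ambiguous false hit

p, r = precision_recall(found, annotated)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case, two of the three matches are correct and two of the four annotated terms are found. Removing ambiguous hits such as "lead" raises precision without changing recall, which is the pattern the ChemSpider figures show after filtering and disambiguation.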

This means that although ChemSpider achieved the best precision, the Chemlist dictionary had a higher recall and the best F-score, a measure of a test's accuracy that combines precision and recall. "Rule-based filtering and disambiguation is necessary to achieve high precision for both automatically generated and the manually curated dictionaries," Hettne concludes. Antony Williams, project lead for ChemSpider, comments: "Such validated name-structure dictionaries studied in this work provide a strong foundation for semantic markup technologies, interlinking and various online resources." Both ChemSpider and the chemical databases included in Chemlist continue to grow at high speed, and further investigation is needed to see how this growth affects the performance of the dictionaries.
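The F-score ranking can be checked against the precision and recall figures reported above. The sketch below assumes the balanced F-score (F1, the harmonic mean of precision and recall); the release does not say which F-score variant the authors used, so treat the results as an illustration rather than the paper's own numbers.

```python
def f1(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the article above.
reported = {
    "ChemSpider, before filtering": (0.43, 0.19),
    "ChemSpider, after filtering": (0.87, 0.19),
    "Chemlist, before filtering": (0.20, 0.47),
    "Chemlist, after filtering": (0.67, 0.40),
}

# F1 is an assumption here; the article only says "F-score".
scores = {name: f1(p, r) for name, (p, r) in reported.items()}
for name, score in scores.items():
    print(f"{name}: F1 = {score:.2f}")
```

With these rounded inputs, Chemlist comes out ahead after filtering and disambiguation (roughly 0.50 against 0.31), which is consistent with the ranking described above.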


Story Source:

The above story is based on materials provided by BioMed Central. Note: Materials may be edited for content and length.


Journal Reference:

  1. Kristina M Hettne, Antony J Williams, Erik M van Mulligen, Jos Kleinjans, Valery Tkachenko and Jan A Kors. Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining. Journal of Cheminformatics, (in press)

Cite This Page:

BioMed Central. "Digging for data with Chemlist and ChemSpider." ScienceDaily. ScienceDaily, 22 March 2010. <www.sciencedaily.com/releases/2010/03/100322194757.htm>.
