Featured Research

from universities, journals, and other organizations

Researchers Create Search Engine To Hunt Molecules Online

Date:
July 30, 2007
Source:
Penn State
Summary:
ChemxSeer, the first publicly available search engine designed specifically for chemical formulae, can sort out when "He" refers to helium and not a person more than nine times out of 10, according to the Penn State College of Information Sciences and Technology researchers who created the tool.

ChemxSeer, the first publicly available search engine designed specifically for chemical formulae, can sort out when "He" refers to helium and not a person more than nine times out of 10, according to the Penn State College of Information Sciences and Technology (IST) researchers who created the tool.

With the new engine, scientists searching for research on CH4, or methane, no longer have to wade through search results about Channel 4 or Chapter 4, because ChemxSeer returns only documents that reference the chemical formula.

The new algorithm also can identify related chemicals with different formula representations and chemicals with related substructures or similarities, said C. Lee Giles, professor of information sciences and technology and co-director of the IST Cyber Infrastructure Lab where the research originated.

"Results from our search engine are much more relevant than results returned by popular search engines," Giles said. "It is one of several cyber tools under development in our lab which will enable better access to and sharing of information and data among scientists and scholars."

The tool is described in a paper, "Extraction and Search of Chemical Formulae in Text Documents on the Web," presented at the recent 16th International World Wide Web Conference in Alberta, Canada. In addition to Giles, the authors are Bingjun Sun and Qingzhao Tan, graduate students in computer science and engineering, and Prasenjit Mitra, assistant professor of information sciences and technology and co-director of Penn State's Cyber Infrastructure Lab.

Electronically hunting for chemical formulae poses some unique challenges for popular search engines, which typically focus on keywords. For one, scientists often search for parts of chemical formulae, and the part may appear at the beginning, at the end or in the middle of a formula.

Similarly, some chemical molecules can have more than one formula representation. As a result, if a person searches for CH4 using a popular search engine and an article identifies the molecule as H4C, that article won't be included in the search results.

In addition, formulae can be confused with non-chemical abbreviations. While people would recognize "OH" as Ohio in a particular context, a machine with a chemical dictionary could confuse it with the chemical notation for hydroxide. A similar slip-up can occur with "I" (iodine) or "In" (indium).
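The CH4 versus H4C problem can be illustrated with a minimal sketch. It is not ChemxSeer's actual algorithm, which the article does not detail; it only assumes that a formula can be reduced to a count of each element so that differently ordered strings compare as equal. The function names element_counts and same_composition are hypothetical, and the parser handles only plain element symbols with optional counts (no parentheses or charges).

```python
import re
from collections import Counter

# One element symbol (capital letter, optional lowercase letter)
# followed by an optional count, e.g. "C", "H4", "Cl2".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula: str) -> Counter:
    """Reduce a simple formula such as 'CH4' to per-element counts.

    Illustrative only: it does not check that a symbol is a real
    element and does not understand parentheses or charges.
    """
    if not formula:
        raise ValueError("empty formula")
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # Something between tokens could not be parsed.
            raise ValueError(f"cannot parse {formula!r}")
        symbol, digits = match.groups()
        counts[symbol] += int(digits) if digits else 1
        pos = match.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse {formula!r}")
    return counts


def same_composition(a: str, b: str) -> bool:
    """True when two formula strings contain the same atoms."""
    return element_counts(a) == element_counts(b)


print(same_composition("CH4", "H4C"))   # same atoms, different order
print(same_composition("CH4", "C2H6"))  # different molecules
```

Equal element counts do not prove that two strings name the same molecule, since isomers share a composition, and the sketch does nothing to decide whether "OH" in running text means hydroxide or Ohio. That second problem needs the surrounding context.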

In designing the engine, the researchers built on their expertise in information-extraction algorithms created for CiteSeer, a search engine for academic and science documents.

Besides extracting formulae, ChemxSeer supports several query models suited to scientists looking for a molecule. It can query for exact matches, for formulae that contain additional terms or elements, and for formulae with similar structures. The engine also can search on the range of occurrence of an element across formulae, the researchers said.
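As a rough illustration of three of these query models, the sketch below runs exact, "additional elements" and element-range queries over a small, made-up list of formulae. Everything in it is an assumption for illustration: the INDEX list, the function names, and the reading of "range of occurrence" as a minimum and maximum atom count. It does not reproduce ChemxSeer's index or ranking, and it leaves out structural similarity entirely.

```python
import re
from collections import Counter


def parse(formula):
    """Simplified parser: element symbols with optional counts only."""
    counts = Counter()
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(digits) if digits else 1
    return counts


# Hypothetical formulae already extracted from documents.
INDEX = ["CH4", "H4C", "CH3OH", "C2H6", "CO2", "H2O"]


def exact(query):
    """Formulae with exactly the same atoms as the query."""
    wanted = parse(query)
    return [f for f in INDEX if parse(f) == wanted]


def with_additional_elements(query):
    """Formulae containing at least the atoms in the query."""
    wanted = parse(query)
    return [
        f for f in INDEX
        if all(parse(f)[element] >= n for element, n in wanted.items())
    ]


def element_in_range(element, low, high):
    """Formulae where the element occurs between low and high times."""
    return [f for f in INDEX if low <= parse(f)[element] <= high]


print(exact("CH4"))                     # ['CH4', 'H4C']
print(with_additional_elements("CH4"))  # ['CH4', 'H4C', 'CH3OH', 'C2H6']
print(element_in_range("H", 2, 4))      # ['CH4', 'H4C', 'CH3OH', 'H2O']
```

Comparing element counts rather than raw strings is what lets the exact query return both CH4 and H4C.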

To create ChemxSeer, the researchers "taught" machines how to recognize chemical formulae by providing training samples of terms that are chemical formulae and terms that are not.
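The idea of learning from labeled samples can be sketched with a toy example. The article does not name the learning method the researchers used, so the code below substitutes a very small naive Bayes classifier over the words around the ambiguous token. The training sentences, the labels "formula" and "text", and the function names are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical hand-labeled samples: a sentence containing "He" and
# whether "He" is a chemical formula or ordinary text in that sentence.
TRAINING = [
    ("the balloon was filled with He gas", "formula"),
    ("liquid He cools the magnet", "formula"),
    ("He said the results were promising", "text"),
    ("He went to the conference", "text"),
]


def train(samples):
    """Count how often each context word appears under each label."""
    word_counts = {"formula": Counter(), "text": Counter()}
    label_counts = Counter()
    for sentence, label in samples:
        label_counts[label] += 1
        word_counts[label].update(sentence.lower().split())
    return word_counts, label_counts


def classify(sentence, model):
    """Pick the label with the higher naive Bayes log score."""
    word_counts, label_counts = model
    vocabulary = set().union(*word_counts.values())
    total_samples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_samples)
        for word in sentence.lower().split():
            # Add-one smoothing so unseen words do not zero the score.
            score += math.log(
                (counts[word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("He gas leaked from the balloon", model))   # formula
print(classify("He said the conference went well", model)) # text
```

With only four training sentences the classifier is easy to fool. It is meant to show why context words such as "gas" or "said" carry the signal, not to suggest how accurate a real system would be.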

"Teaching the computer to classify what is a formula and what is not was complex because language is inherently context sensitive and judging the meaning of a term using its context is hard for a machine," Mitra said. Future research will focus on improving the reliability of identification, linking to existing molecular databases, data archiving and increasing the relevance of search results.

The engine is part of an open-source cyber infrastructure project that focuses on chemical document search for environmental chemistry and is funded by the National Science Foundation. The grant, awarded to the Penn State Department of Chemistry, aims to enable automatic data analysis.

"This tool replaces time-intensive manual searching, allowing our research team to focus more on solving problems with as much relevant information as possible," said Karl Mueller, professor of chemistry and PI of the cyber infrastructure grant.

Note: the ChemxSeer Project web site can be found at http://chemxseer.ist.psu.edu/


Story Source:

The above story is based on materials provided by Penn State. Note: Materials may be edited for content and length.


Cite This Page:

Penn State. "Researchers Create Search Engine To Hunt Molecules Online." ScienceDaily. ScienceDaily, 30 July 2007. <www.sciencedaily.com/releases/2007/07/070726210910.htm>.
Penn State. (2007, July 30). Researchers Create Search Engine To Hunt Molecules Online. ScienceDaily. Retrieved April 23, 2014 from www.sciencedaily.com/releases/2007/07/070726210910.htm
Penn State. "Researchers Create Search Engine To Hunt Molecules Online." ScienceDaily. www.sciencedaily.com/releases/2007/07/070726210910.htm (accessed April 23, 2014).
