Researchers 'Text Mine' The New York Times, Demonstrating Ease Of New Technology

Date:
July 27, 2006
Source:
University of California - Irvine
Summary:
Performing what a team of dedicated and bleary-eyed newspaper librarians would need months to do, scientists at UC Irvine have used an up-and-coming technology to complete in hours a complex topic analysis of 330,000 stories published primarily by The New York Times.

The demonstration is significant because it is one of the earliest to show that text mining, an extremely efficient yet very complicated technology, is on the brink of becoming a tool useful to more than highly trained computer programmers and homeland security experts.

“We have shown in a very practical way how a new text mining technique makes understanding huge volumes of text quicker and easier,” said David Newman, a computer scientist in the Donald Bren School of Information and Computer Sciences at UCI. “To put it simply, text mining has made an evolutionary jump. In just a few short years, it could become a common and useful tool for everyone from medical doctors to advertisers; publishers to politicians.”

Text mining allows a computer to extract useful information from unstructured text. Until recently, text mining required a great deal of preparation before documents could be analyzed in a meaningful way. A new text-mining technique called “topic modeling” – which UCI scientists used in their New York Times experiment – looks for patterns of words that tend to occur together in documents, then automatically categorizes those words into topics – all with minimal human effort.
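The idea of "words that tend to occur together" can be illustrated with a simple count of word pairs that share a document. This is only a minimal sketch of the underlying intuition, not the topic model the researchers used; the function name `cooccurrence_counts` and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered word pair appears in together."""
    pair_counts = Counter()
    for text in documents:
        # Use the distinct lowercase words so a pair counts once per document.
        words = sorted(set(text.lower().split()))
        pair_counts.update(combinations(words, 2))
    return pair_counts


# Hypothetical toy documents, not taken from the newspaper archive.
docs = [
    "rider bike race",
    "bike race rider stage",
    "apartment rent brooklyn",
    "rent apartment lease",
]

counts = cooccurrence_counts(docs)
print(counts[("bike", "race")])  # pairs are stored in alphabetical order
print(counts[("bike", "rent")])
```

Pairs such as "bike" and "race" share documents while "bike" and "rent" never do, which is the kind of signal a topic model turns into separate topics.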

UCI researchers didn’t invent topic modeling, but they developed a technique that allows the technology to be used on huge document collections. They also are among the first to demonstrate its ease and effectiveness by applying it to a newspaper archive. The results reveal few surprises, but the application demonstrates the ability of topic modeling to spot trends and make connections in a way that could be applied to more complicated and cumbersome documents such as those used by medical researchers and lawyers.

Newman and UCI researchers Padhraic Smyth, Mark Steyvers and Chaitanya Chemudugunta presented their research at the recent Intelligence and Security Informatics conference in San Diego.

The topic model, applied to the collection of news articles published from 2000 to 2002, identified patterns of words that occurred together in the stories. From those words, researchers were able to identify topics. Information associated with those topics was charted over time, allowing the scientists to pinpoint which months of the year certain topics were most in the news and how much ink they received from year to year.
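Charting a topic over time could be sketched as averaging each article's weight for that topic within every month. The data layout (a month string paired with a dictionary of topic weights), the topic labels, and the function name `topic_share_by_month` are assumptions made for illustration; the article does not describe how the researchers stored or aggregated their results.

```python
from collections import defaultdict


def topic_share_by_month(articles, topic):
    """Average the weight of one topic over all articles published in each month."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, topic_weights in articles:
        # An article with no weight recorded for the topic contributes zero.
        totals[month] += topic_weights.get(topic, 0.0)
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


# Hypothetical (month, per-article topic weights) pairs.
articles = [
    ("2000-06", {"tour_de_france": 0.0, "real_estate": 1.0}),
    ("2000-07", {"tour_de_france": 0.5, "real_estate": 0.5}),
    ("2000-07", {"tour_de_france": 1.0}),
    ("2000-08", {"real_estate": 1.0}),
]

trend = topic_share_by_month(articles, "tour_de_france")
peak_month = max(trend, key=trend.get)
print(trend, peak_month)
```

With these made-up numbers the topic peaks in July, mirroring the summer peak described for the Tour de France coverage.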

For example, the model generated a list of words that included “rider,” “bike,” “race,” “Lance Armstrong” and “Jan Ullrich.” From this, researchers were easily able to identify that topic as the Tour de France. By examining the probability of words appearing in stories about the Tour de France, researchers learned that Armstrong was written about seven times as much as Ullrich. Charting information over time, researchers discovered that discussion of the Tour de France peaked in the summer months but decreased slightly year to year.
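The "seven times as much" comparison amounts to a ratio of two word probabilities within one topic. The sketch below shows that arithmetic; the counts are hypothetical and were chosen only so that the ratio comes out to seven, since the article does not report the underlying numbers.

```python
def word_probabilities(topic_word_counts):
    """Turn raw word counts for one topic into probabilities that sum to 1."""
    total = sum(topic_word_counts.values())
    return {word: count / total for word, count in topic_word_counts.items()}


# Hypothetical counts of words assigned to a Tour de France topic.
topic_counts = {
    "armstrong": 70,
    "ullrich": 10,
    "race": 120,
    "bike": 100,
    "rider": 100,
}

probs = word_probabilities(topic_counts)
ratio = probs["armstrong"] / probs["ullrich"]
print(round(ratio, 2))
```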

“If I were interested in advertising a product related to the Tour de France, I might want to know whether interest in the Tour de France is increasing or decreasing,” Newman said. “This might be very important knowledge.”

Including the Tour de France, the model automatically identified a total of 400 topics ranging from renting apartments in Brooklyn and diving in Hawaii to voting irregularities and dinosaur bones. As for newsmakers, topics included Tiger Woods, Elian Gonzalez, Denzel Washington and Barbie.

“Text mining is an incredible tool,” Newman said. “It already allows a doctor to identify the common thread in old and new medical research. With topic modeling, connections can be drawn faster and more efficiently in large volumes of text.”

About topic modeling: UCI researchers performed their experiment using a statistical topic model based on a text model developed at UC Berkeley in 2003. Thanks to an improved solution technique proposed by Mark Steyvers and a research partner, this model has advanced from academic use to something that is now widely used in the research community. Topic modeling looks for patterns of words that tend to occur together in documents, then automatically categorizes those words into topics. Older text-mining techniques require the user to come up with an appropriate set of topic categories and manually find hundreds to thousands of example documents for each category. This human-intensive process is called supervised learning. In contrast, topic modeling, a type of unsupervised learning, doesn’t need suggestions for an appropriate set of topic categories or human-found example documents. This makes retrieving information easier and quicker.
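To make the unsupervised idea concrete, here is a toy sampler for a topic model. The article names neither the model nor the solution technique, so this sketch assumes a latent Dirichlet allocation style model fitted by collapsed Gibbs sampling; it is not the UCI team's large-scale implementation. The function names, the hyperparameters `alpha` and `beta`, and the toy documents are all hypothetical.

```python
import random


def gibbs_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for an LDA-style topic model (assumed model)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]  # tokens per topic in each document
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics  # tokens per topic overall
    assignments = []

    # Start from a random topic assignment for every token; no labels are given.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Weight each topic by its fit to the document and to the word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    return vocab, topic_word, doc_topic


def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Hypothetical tokenized documents; no categories or example labels are supplied.
docs = [
    ["rider", "bike", "race", "rider", "stage"],
    ["bike", "race", "stage", "rider"],
    ["apartment", "rent", "brooklyn", "lease"],
    ["rent", "apartment", "lease", "brooklyn", "rent"],
]

vocab, topic_word, doc_topic = gibbs_lda(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(k, words)
```

The sampler receives only the documents and the number of topics. Because it is randomized, the word groupings it prints can differ with the seed and the number of iterations, so this toy run should be read as an illustration rather than a guaranteed result.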


Story Source:

The above story is based on materials provided by University of California - Irvine. Note: Materials may be edited for content and length.


Cite This Page:

University of California - Irvine. "Researchers 'Text Mine' The New York Times, Demonstrating Ease Of New Technology." ScienceDaily. ScienceDaily, 27 July 2006. <www.sciencedaily.com/releases/2006/07/060727100528.htm>.
University of California - Irvine. (2006, July 27). Researchers 'Text Mine' The New York Times, Demonstrating Ease Of New Technology. ScienceDaily. Retrieved July 26, 2014 from www.sciencedaily.com/releases/2006/07/060727100528.htm
University of California - Irvine. "Researchers 'Text Mine' The New York Times, Demonstrating Ease Of New Technology." ScienceDaily. www.sciencedaily.com/releases/2006/07/060727100528.htm (accessed July 26, 2014).
