Featured Research

from universities, journals, and other organizations

In search of the key word: Bursts of certain words within a text are what make them keywords

Date: July 17, 2012
Source: Max-Planck-Gesellschaft
Summary: Human beings have the ability to convert complex phenomena into a one-dimensional sequence of letters and put it down in writing. In this process, keywords serve to convey the content of the text. How letters and words correlate with the subject of a text is the subject of a new study using statistical methods. Researchers discovered that what denotes keywords is not the fact that they appear very frequently in a given text. It is that they are found in greater numbers only at certain points in the text.

Human beings have the ability to convert complex phenomena into a one-dimensional sequence of letters and put it down in writing. In this process, keywords serve to convey the content of the text. How letters and words correlate with the subject of a text is something Eduardo Altmann and his colleagues from the Max Planck Institute for the Physics of Complex Systems have studied with the help of statistical methods.

They discovered that what denotes keywords is not the fact that they appear very frequently in a given text. It is that they are found in greater numbers only at certain points in the text. They also discovered that relationships exist between sections of text which are distant from each other, in the sense that they preferentially use the same words and letters.

The Dresden-based scientists mathematically studied the semantic properties of texts by translating ten different English texts into various codes. One of the chosen texts was the English edition of Leo Tolstoy's "War and Peace."

For example, the scientists translated the letters of a text into a binary sequence by replacing all vowels with 1 and all consonants with 0. Using additional mathematical functions, they examined different levels of the text (individual vowels and letters as well as whole words), each translated into its own code. This made it possible to identify repeating patterns within the text as a whole. Such correlation within a text is referred to as long-range correlation. It indicates whether two letters located at arbitrarily distant points in the text are connected with each other. For example, when we find a letter "W" at a certain point, there is a measurably higher probability that we will find the letter "W" again a few pages later. "Understandably enough, if a certain point in the book talks about war, there is a high probability that the word war will also appear a few pages later. What is surprising is that we also find this higher probability at the level of individual letters," says Altmann.
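The vowel/consonant encoding described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function names are hypothetical, "y" is treated as a consonant, only the letters a to z are encoded, and the plain sample autocorrelation stands in for the "additional mathematical functions" that the article does not specify.

```python
def vowel_sequence(text):
    """Map each letter a-z to 1 (vowel) or 0 (consonant); skip everything else.

    Assumption: "y" counts as a consonant and accented letters are ignored.
    """
    vowels = set("aeiou")
    return [1 if ch in vowels else 0
            for ch in text.lower() if "a" <= ch <= "z"]


def autocorrelation(seq, lag):
    """Sample autocorrelation of a numeric sequence at the given lag.

    Values near 0 mean that symbols `lag` positions apart are unrelated;
    values that stay above 0 at large lags hint at long-range correlation.
    """
    n = len(seq)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must satisfy 0 < lag < len(seq)")
    mean = sum(seq) / n
    var = sum((x - mean) ** 2 for x in seq) / n
    if var == 0:
        return 0.0  # a constant sequence carries no correlation information
    cov = sum((seq[i] - mean) * (seq[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var


print(vowel_sequence("War and Peace"))  # [0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1]
```

On a full-length book one would evaluate `autocorrelation` over a wide range of lags and look at how slowly the values decay; a short title like the one above is far too small to show any long-range effect.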

Keywords are more frequent in certain passages of text

The scientists found this long-range correlation not only between letters, but also within higher linguistic levels, such as words. Within individual levels, the correlation remains when looking at different texts. "What we find much more interesting is to examine how the correlation changes between the levels," says Altmann. Long-range correlation enables the scientists to draw conclusions about the extent to which certain words are connected to a topic. "Even the connection between a word and the letters it is composed of can be analysed in this way," explains Altmann.

Furthermore, the scientists studied what is known as "burstiness," which describes whether the occurrences of a pattern of characters cluster in particular passages of a text. It shows, for instance, whether a word comes up with increased frequency in a certain section of the text. The more strongly a word is concentrated in a passage, the more likely it is that the word is representative of a certain subject.

The scientists demonstrated that certain words come up repeatedly throughout a text but are not present in bursts in any given passage. Although these words do exhibit long-range correlation, they are not closely related to the topic at hand. "Articles are the best examples of these. They come up very frequently in every text, but they are not crucial in conveying a given topic," says Altmann.
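One common way to put a number on this contrast is the burstiness coefficient B = (sigma - mu) / (sigma + mu), computed from the gaps between successive occurrences of a word, where mu is the mean gap and sigma its standard deviation. The sketch below uses that measure as an assumption; the article does not say which statistic the study relied on, and the function names and the simple tokenizer are hypothetical.

```python
import re
from statistics import mean, pstdev


def word_positions(text, word):
    """Return the token indices at which `word` occurs (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [i for i, tok in enumerate(tokens) if tok == word.lower()]


def burstiness(positions):
    """Burstiness coefficient of the gaps between sorted occurrence positions.

    B is -1 for perfectly regular spacing, about 0 for random (Poisson-like)
    spacing, and approaches +1 when occurrences are packed into rare bursts.
    """
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three occurrences")
    mu = mean(gaps)
    sigma = pstdev(gaps)
    return (sigma - mu) / (sigma + mu)


# Evenly spread occurrences, as one might expect for an article like "the":
print(burstiness([0, 10, 20, 30]))  # -1.0
# Two tight clusters far apart, as one might expect for a topical keyword:
print(burstiness([0, 1, 2, 3, 100, 101, 102, 103]) > 0)  # True
```

Under this measure, a frequent but evenly spread word scores low, while a word that appears in clusters scores high, which matches the distinction drawn above between articles and keywords.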

Statistical text analysis works irrespective of language

Whereas both letters and words exhibit long-range correlation, it is rare for letters to appear in bursts at certain points in a text. "It is, in fact, very rare for a letter to be as closely connected with a topic as the word it forms a part of. In a manner of speaking, letters can be used more flexibly," explains Altmann. An "a," for example, can be a part of a great many words that have no connection with one and the same topic.

The scientists employed statistical text analysis as an easy way of identifying the defining words of a given text. "By so doing, it is absolutely irrelevant which language the text is written in. The only thing that matters is the story and not language-specific rules," says Altmann. Their findings could be used in future to improve Internet search engines, and they could also help to analyse texts and identify plagiarism.


Story Source:

The above story is based on materials provided by Max-Planck-Gesellschaft. Note: Materials may be edited for content and length.


Journal Reference:

  1. E. G. Altmann, G. Cristadoro, M. D. Esposti. On the origin of long-range correlations in texts. Proceedings of the National Academy of Sciences, 2012; 109 (29): 11582 DOI: 10.1073/pnas.1117723109

Cite This Page:

Max-Planck-Gesellschaft. "In search of the key word: Bursts of certain words within a text are what make them keywords." ScienceDaily. ScienceDaily, 17 July 2012. <www.sciencedaily.com/releases/2012/07/120717102633.htm>.
Max-Planck-Gesellschaft. (2012, July 17). In search of the key word: Bursts of certain words within a text are what make them keywords. ScienceDaily. Retrieved December 22, 2014 from www.sciencedaily.com/releases/2012/07/120717102633.htm
Max-Planck-Gesellschaft. "In search of the key word: Bursts of certain words within a text are what make them keywords." ScienceDaily. www.sciencedaily.com/releases/2012/07/120717102633.htm (accessed December 22, 2014).
