Mathematical Distribution Links Open Source Software And Literature

Feb. 2, 2009 — The frequency of words in texts, the size of companies and the linking together of components in Linux software distributions show approximately the same mathematical distribution: they obey Zipf’s law. ETH Zurich researchers tested how this happens in Linux programs.

In the first half of the twentieth century, the American linguist George Kingsley Zipf studied how often each word occurs in literary texts. A few words were very frequent, e.g. "the" and "and", but the majority of words occurred only rarely. The resulting pattern could be expressed in figures: the most frequent word occurred about twice as often as the second most frequent and three times as often as the third most frequent, i.e. the frequency of a word was inversely proportional to its rank. This has since been called Zipf’s law.
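The rank rule described above can be checked on any word list. The following is a minimal sketch, not Zipf's own procedure: the function names `rank_frequency` and `zipf_expected` and the toy word list are illustrative assumptions, and an exponent of exactly 1 is assumed for the ideal curve.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(words)
    # Sort by descending count; break ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_expected(top_count, rank):
    """Count an ideal Zipf distribution (exponent 1) predicts at this rank."""
    return top_count / rank


# Hypothetical toy text, constructed so that it follows the law closely.
sample = ("the " * 6 + "and " * 3 + "of " * 2 + "law").split()
table = rank_frequency(sample)
for rank, word, count in table:
    # Print the observed count next to the ideal Zipf prediction.
    print(rank, word, count, round(zipf_expected(table[0][2], rank), 2))
```

On real texts the observed counts only approximate the ideal values, which is why the law is usually checked on a log-log plot of count against rank.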

Scientists have discovered that this distribution holds true – more or less – for quite different systems, e.g. the numbers of visitors to web sites, the size of towns and the size of companies in numerous countries. Researchers suspected that this recurring pattern is associated with the growth process of the systems being studied.

Free raw material thanks to open source

Doctoral student Thomas Maillart and Didier Sornette, Professor at the Chair of Entrepreneurial Risks, together with Sebastian Späth and Georg von Krogh, Professor at the Chair of Strategic Management and Innovation at ETH Zurich, have now demonstrated empirically the conditions under which a distribution obeying Zipf’s law arises. They did this by examining how the software packages of a Linux distribution are linked to one another.

Their results were published in the scientific journal Physical Review Letters and mentioned in Nature as a Research Highlight.

In an earlier publication, Sornette had already suggested carrying out an empirical test of Zipf’s law. When searching for a subject for his thesis, his doctoral student Thomas Maillart came across an article about open source software by Sebastian Späth and Georg von Krogh. Maillart realised that it contained data with which the proposed origin of Zipf’s law could be tested.

Linux is an operating system similar to Microsoft Windows or Mac OS. Many versions of it are available to download free of charge via the Internet. Each Linux distribution consists of various software packages, which thus provide the scientists with free raw material for their research. Debian Linux, the distribution studied by the ETH Zurich researchers, comprised only 474 packages in 1996, whereas by 2007 there were already more than 18,000.

Characteristic distribution arises as a result of growth

The packages are connected by numerous links through which they call one another. First, Maillart examined for four versions of Debian whether the number of incoming links per package obeys Zipf’s law. This was confirmed (see graphic). The scientists then studied how the number of links referring to a package develops over time. They assumed a proportional growth pattern: the more links that already lead to a package, the faster its number of links increases.
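The proportional growth assumption can be illustrated with a small simulation. This is a simplified sketch and not the researchers' model: the function name `simulate_proportional_growth`, the parameter `links_per_package` and the "+1" weight that lets unlinked packages be chosen at all are assumptions made for illustration.

```python
import random


def simulate_proportional_growth(n_packages, links_per_package=2, seed=0):
    """Grow a toy distribution one package at a time.

    Each new package links to existing packages chosen with probability
    proportional to (incoming links + 1), so well-linked packages gain
    links faster. Returns the incoming-link count of every package.
    """
    rng = random.Random(seed)
    in_links = [0]  # start with a single package that nobody links to
    for _ in range(n_packages - 1):
        # The +1 is an assumption so that packages with zero links
        # still have a chance of being chosen.
        weights = [k + 1 for k in in_links]
        # Targets are drawn with replacement, so one package may be
        # picked more than once by the same newcomer.
        targets = rng.choices(range(len(in_links)), weights=weights,
                              k=links_per_package)
        for t in targets:
            in_links[t] += 1
        in_links.append(0)  # the newcomer starts with no incoming links
    return in_links


counts = simulate_proportional_growth(500)
# Show the ten most-linked packages, highest count first.
print(sorted(counts, reverse=True)[:10])
```

Sorting the resulting counts in descending order and plotting them against rank on log-log axes is one way to see how closely such a toy system approaches a Zipf-like distribution.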

The evaluation of the Linux package data showed that the researchers’ model was correct. For new packages, the number of links deviated from Zipf’s law, and the characteristic distribution arose only as the Linux distribution grew. A condition that the researchers had used in their model was also confirmed: the fluctuations in a package’s number of links become larger as that number grows. Consequently, even a very large number of links can drop back to zero, which for a Linux package means that it is no longer being used.
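The condition on fluctuations can be sketched as a single update step in which both the expected growth and the random change scale with the current number of links. The function `step_links` and the parameters `drift` and `sigma` are hypothetical names chosen for this illustration, and the Gaussian noise is an assumption rather than something stated in the paper.

```python
import random


def step_links(links, drift, sigma, rng):
    """Advance one package's link count by one time step.

    The change is proportional to the current count: a deterministic part
    (drift) plus a random part (sigma times a Gaussian draw). The result
    is rounded and clipped at zero, since a package cannot have a negative
    number of links; zero means the package is no longer used.
    """
    change_factor = 1.0 + drift + sigma * rng.gauss(0.0, 1.0)
    return max(0, round(links * change_factor))


rng = random.Random(42)
links = 100
history = [links]
for _ in range(20):
    links = step_links(links, drift=0.05, sigma=0.3, rng=rng)
    history.append(links)
print(history)
```

Because the random term is multiplied by the current count, the same random draw moves a package with 1,000 links by far more links than one with 10. Once the count reaches zero it stays there in this sketch.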

Conclusions on entrepreneurial risks

Thomas Maillart describes himself as a risk manager. He says that he had already calculated risks as a Civil Engineering student at EPFL, where these risks were connected with the safety of building structures. He then worked at a company insuring Internet risks. He has now written the paper on Zipf’s law in the context of his thesis on Internet risks at the Chair of Entrepreneurial Risks at ETH Zurich.

Being able to estimate the growth of Linux packages is exciting from an entrepreneurial point of view. However, the significance of the paper extends far beyond this specialist area, because the findings apply to all systems obeying Zipf’s law. Company size is one example: just as a large number of links pointing to a Linux package does not guarantee that the package will remain in use, a company’s size provides no certainty that the company will survive, as the financial crisis has confirmed.

Story Source:

The above story is reprinted from materials provided by ETH Zurich.


Journal Reference:

  1. Maillart et al. Empirical Tests of Zipf’s Law Mechanism in Open Source Linux Distribution. Physical Review Letters, 2008; 101 (21): 218701 DOI: 10.1103/PhysRevLett.101.218701