Featured Research

from universities, journals, and other organizations

Searching genomic data faster: Biologists' capacity for generating genomic data is increasing more rapidly than computing power

Date:
July 10, 2012
Source:
Massachusetts Institute of Technology
Summary:
In 2001, the Human Genome Project and Celera Genomics announced that after 10 years of work at a cost of some $400 million, they had completed a draft sequence of the human genome. Today, sequencing a human genome is something that a single researcher can do in a couple of weeks for less than $10,000. Since 2002, the rate at which genomes can be sequenced has been doubling every four months or so, whereas computing power doubles only every 18 months. Now a new algorithm drastically reduces the time it takes to find a particular gene sequence in a database of genomes.

Researchers have developed a new algorithm that drastically reduces the time it takes to find a particular gene sequence in a database of genomes.
Credit: Christine Daniloff

In 2001, the Human Genome Project and Celera Genomics announced that after 10 years of work at a cost of some $400 million, they had completed a draft sequence of the human genome. Today, sequencing a human genome is something that a single researcher can do in a couple of weeks for less than $10,000.

Since 2002, the rate at which genomes can be sequenced has been doubling every four months or so, whereas computing power doubles only every 18 months. Without the advent of new analytic tools, biologists' ability to generate genomic data will soon outstrip their ability to do anything useful with it.

In the latest issue of Nature Biotechnology, MIT and Harvard University researchers describe a new algorithm that drastically reduces the time it takes to find a particular gene sequence in a database of genomes. Moreover, the more genomes it's searching, the greater the speedup it affords, so its advantages will only compound as more data is generated.

In some sense, this is a data-compression algorithm -- like the one that allows computer users to compress data files into smaller zip files. "You have all this data, and clearly, if you want to store it, what people would naturally do is compress it," says Bonnie Berger, a professor of applied math and computer science at MIT and senior author on the paper. "The problem is that eventually you have to look at it, so you have to decompress it to look at it. But our insight is that if you compress the data in the right way, then you can do your analysis directly on the compressed data. And that increases the speed while maintaining the accuracy of the analyses."

Exploiting redundancy

The researchers' compression scheme exploits the fact that evolution is stingy with good designs. There's a great deal of overlap in the genomes of closely related species, and some overlap even in the genomes of distantly related species: That's why experiments performed on yeast cells can tell us something about human drug reactions.

Berger; her former grad student Michael Baym PhD '09, who's now a visiting scholar in the MIT math department and a postdoc in systems biology at Harvard Medical School; and her current grad student Po-Ru Loh developed a way to mathematically represent the genomes of different species -- or of different individuals within a species -- such that the overlapping data is stored only once. A search of multiple genomes can thus concentrate on their differences, saving time.

"If I want to run a computation on my genome, it takes a certain amount of time," Baym explains. "If I then want to run the same computation on your genome, the fact that we're so similar means that I've already done most of the work."

In experiments on a database of 36 yeast genomes, the researchers compared their algorithm to one called BLAST, for Basic Local Alignment Search Tool, one of the most commonly used genomic-search algorithms in biology. In a search for a particular genetic sequence in only 10 of the yeast genomes, the new algorithm was twice as fast as BLAST; but in a search of all 36 genomes, it was four times as fast. That discrepancy will only increase as genomic databases grow larger, Berger explains.

Matchmaking

The new algorithm would be useful in any application where the central question is, as Baym puts it: "I have a sequence; what is it similar to?" Identifying microbes is one example. The new algorithm could help clinicians determine causes of infections, or it could help biologists characterize "microbiomes," collections of microbes found in animal tissue or particular microenvironments; variations in the human microbiome have been implicated in a range of medical conditions. It could be used to characterize the microbes in particularly fertile or infertile soil, and it could even be used in forensics, to determine the geographical origins of physical evidence by its microbial signatures.

"The problem that they're looking at -- which is, given a sequence, trying to determine what known sequences are similar to it -- is probably the oldest problem in computational biology, and it's perhaps the most commonly asked question in computational biology," says Mona Singh, a professor of computer science at Princeton University and a faculty member at Princeton's Lewis-Sigler Institute for Integrative Genomics. "And the problem, just for that reason, is of central importance."

In the last 10 years, Singh says, biologists have tended to think in terms of "reference genomes" -- genomes, such as the draft human sequence released in 2001, that try to generalize across individuals within a species and even across species. "But as we're getting more and more individuals even within a species, and more very closely related sequenced distinct species, I think we're starting to move away from the idea of a single reference genome," Singh says. "Their approach is really going to shine when you have many closely related organisms."

Berger's group is currently working to extend the technique to information on proteins and RNA sequences, where it could pay even bigger dividends. Now that the human genome has been mapped, the major questions in biology are what genes are active when, and how the proteins they code for interact. Searches of large databases of biological information are crucial to answering both questions.


Story Source:

The above story is based on materials provided by Massachusetts Institute of Technology. Note: Materials may be edited for content and length.


Journal Reference:

  1. Po-Ru Loh, Michael Baym, Bonnie Berger. Compressive genomics. Nature Biotechnology, 2012; 30 (7): 627 DOI: 10.1038/nbt.2241

Cite This Page:

Massachusetts Institute of Technology. "Searching genomic data faster: Biologists' capacity for generating genomic data is increasing more rapidly than computing power." ScienceDaily. ScienceDaily, 10 July 2012. <www.sciencedaily.com/releases/2012/07/120710132955.htm>.
Massachusetts Institute of Technology. (2012, July 10). Searching genomic data faster: Biologists' capacity for generating genomic data is increasing more rapidly than computing power. ScienceDaily. Retrieved July 24, 2014 from www.sciencedaily.com/releases/2012/07/120710132955.htm
Massachusetts Institute of Technology. "Searching genomic data faster: Biologists' capacity for generating genomic data is increasing more rapidly than computing power." ScienceDaily. www.sciencedaily.com/releases/2012/07/120710132955.htm (accessed July 24, 2014).

Share This




More Computers & Math News

Thursday, July 24, 2014

Featured Research

from universities, journals, and other organizations


Featured Videos

from AP, Reuters, AFP, and other news services

Robot Parking Valet Creates Stress-Free Travel

Robot Parking Valet Creates Stress-Free Travel

AP (July 23, 2014) 'Ray' the robotic parking valet at Dusseldorf Airport in Germany lets travelers to avoid the hassle of finding a parking spot before heading to the check-in desk. (July 23) Video provided by AP
Powered by NewsLook.com
Six Indicted in StubHub Hacking Scheme

Six Indicted in StubHub Hacking Scheme

AP (July 23, 2014) Six people were indicted Wednesday in an international ring that took over more than 1,000 StubHub users' accounts and fraudulently bought tickets that were then resold. (July 23) Video provided by AP
Powered by NewsLook.com
The Reviews Are In For The Amazon Fire Phone

The Reviews Are In For The Amazon Fire Phone

Newsy (July 23, 2014) Amazon's first smartphone, the Fire Phone, is set to ship this week, and so far the reviews have been pretty mixed. Video provided by Newsy
Powered by NewsLook.com
Bigger Apple Phone, Bigger Orders

Bigger Apple Phone, Bigger Orders

Reuters - Business Video Online (July 22, 2014) Apple is asking suppliers to make 70 to 80 million units of its new larger screen iPhone, a lot more initially than its current model. Fred Katayama reports. Video provided by Reuters
Powered by NewsLook.com

Search ScienceDaily

Number of stories in archives: 140,361

Find with keyword(s):
Enter a keyword or phrase to search ScienceDaily for related topics and research stories.

Save/Print:
Share:

Breaking News:
from the past week

In Other News

... from NewsDaily.com

Science News

Health News

Environment News

Technology News



Save/Print:
Share:

Free Subscriptions


Get the latest science news with ScienceDaily's free email newsletters, updated daily and weekly. Or view hourly updated newsfeeds in your RSS reader:

Get Social & Mobile


Keep up to date with the latest news from ScienceDaily via social networks and mobile apps:

Have Feedback?


Tell us what you think of ScienceDaily -- we welcome both positive and negative comments. Have any problems using the site? Questions?
Mobile: iPhone Android Web
Follow: Facebook Twitter Google+
Subscribe: RSS Feeds Email Newsletters
Latest Headlines Health & Medicine Mind & Brain Space & Time Matter & Energy Computers & Math Plants & Animals Earth & Climate Fossils & Ruins