High-performance computing reveals missing genes

Date:
April 14, 2010
Source:
Virginia Tech
Summary:
Scientists have used high-performance computing to locate small genes that were missed in earlier efforts to define the microbial DNA sequences of life. Running on an ephemeral supercomputer made up of computers from across the world, the researchers' mpiBLAST computational tool took only 12 hours instead of the nearly 90 years the work would have required on a standard personal computer.

Scientists at the Virginia Bioinformatics Institute (VBI) and the Department of Computer Science at Virginia Tech have used high-performance computing to locate small genes that were missed in earlier efforts to define the microbial DNA sequences of life. Running on an ephemeral supercomputer made up of computers from across the world, the researchers' mpiBLAST computational tool took only 12 hours instead of the nearly 90 years the work would have required on a standard personal computer.

The new study, reported in the journal BMC Bioinformatics, is the first large-scale attempt to identify undetected microbial genes in the burgeoning GenBank DNA sequence repository, which contains more than 100 billion bases of DNA sequence. The genes uncovered may have important functions in the cell, but those functions need to be established by further experiments.

Skip Garner, executive director of VBI and professor of biological sciences at Virginia Tech, commented, "This is a perfect storm, where an overwhelming amount of data is analyzed by state-of-the-art computational approaches, yielding important new information about genes. These genes may be tomorrow's new targets for pharmaceutical research, for example to find new antibiotics or vaccines, which is extremely important since we need novel approaches to combat the emergence of new drug-resistant bugs."

In the past few years, enormous progress in sequencing technologies has allowed scientists to produce astonishing amounts of sequence data. Today more than 1200 genome sequences of microbes are housed in the GenBank database. One of the biggest problems facing scientists is no longer generating the sequence data but reliably locating the many genes in a genome and assigning a function to each, a process that scientists refer to as annotation. This process depends crucially on sophisticated computational tools; many experts consider that the field of bioinformatics arose to address this very need.

João Setubal, associate professor at the Virginia Bioinformatics Institute and the Department of Computer Science at Virginia Tech, commented: "Scientists have known for a long time that publicly available databases of genomes have inconsistencies, errors, and gaps. Some genes are labeled with the wrong function and for others the function is unknown. But nobody had done a systematic study to verify how many genes were simply undetected. This is what we did in our study -- discover the number of microbial genes that are under the radar."

Scientists have developed different computer tools to help them in their efforts to locate and identify genes. Most of these tools work by building a model based on the features of the sequence and working out the likelihood that an individual segment codes for a gene. Comparing DNA segments with known gene sequences stored in GenBank complements this work. If a DNA segment is similar to the sequence of known genes, then the segment is likely to be a coding gene with a similar function.
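The similarity idea described above can be illustrated with a toy sketch. It is not the method used in the study or by mpiBLAST: it swaps in a crude shared k-mer score (Jaccard similarity) for the alignment-based comparison that real tools perform, and the function names, the value of k, and the example sequences are all hypothetical.

```python
# Toy illustration only: shared k-mers stand in for real sequence alignment.
# All names and sequences here are hypothetical, not taken from the study.

def kmers(seq, k=4):
    """Return the set of all length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_similarity(a, b, k=4):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

def best_match(segment, known_genes, k=4):
    """Return (name, score) of the known gene most similar to the segment."""
    scored = ((name, kmer_similarity(segment, seq, k))
              for name, seq in known_genes.items())
    return max(scored, key=lambda pair: pair[1])

if __name__ == "__main__":
    known = {
        "geneA": "ATGGCCATTGTAATGGGCCGCTGA",
        "geneB": "ATGTTTTTTCCCCCCAAAAAATAG",
    }
    print(best_match("ATGGCCATTGTAATGGGCCGCTGA", known))
    print(best_match("GAGAGAGAGAGAGAGAGAGA", known))
```

A segment that resembles a stored gene scores high, while a segment with no counterpart in the collection scores near zero whichever known gene it is compared with, so a purely similarity-based search has nothing to match it against.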

Said Setubal, "Such approaches will not find genes that have unusual sequence properties. Furthermore they will not find those genes that have not been detected up to now and hence are not present in GenBank. Our results clearly show that there are many small protein-encoding genes in the genomes of microbes that have been systematically missed."

The lowest estimate in the study placed the number of families of missing genes at 380 in the 780 genomes that were investigated. Said Setubal, "This number is most likely an underestimate since we have been conservative for the criteria we have used for finding these missing gene families."

Wu Feng, associate professor in the Department of Computer Science and the Department of Electrical and Computer Engineering at Virginia Tech, remarked: "To facilitate the rapid discovery of missing genes in genomes, we used our mpiBLAST sequence-search tool to perform an all-to-all sequence search of the 780 microbial genomes that we investigated. This process entailed running on the order of tens of trillions of sequence searches with mpiBLAST. The all-to-all sequence search was done on an ephemeral supercomputer that aggregated more than 12,000 processor cores across seven different supercomputers, distributed across the United States. It reduced the search time from nearly 90 years, when computed on a personal computer, down to a mere 12 hours."
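The speedup implied by the quoted figures can be checked with a little arithmetic. The sketch below takes "nearly 90 years" and "12 hours" at face value, assumes a 365.25-day year, and uses 12,000 as a round lower bound for "more than 12,000 processor cores"; the function name is illustrative and does not come from the study.

```python
# Back-of-the-envelope check of the quoted figures (rounded inputs, so
# the result is only an order-of-magnitude estimate).

HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length in hours

def speedup(baseline_years, parallel_hours):
    """Ratio of the single-PC run time to the supercomputer run time."""
    return baseline_years * HOURS_PER_YEAR / parallel_hours

if __name__ == "__main__":
    ratio = speedup(90, 12)
    cores = 12_000  # lower bound quoted in the article
    print(round(ratio))             # 65745
    print(round(ratio / cores, 2))  # speedup relative to the core count
```

Because the inputs are rounded, this only confirms the scale: a speedup in the tens of thousands, a few times larger than the number of processor cores quoted.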

Andrew Warren, a graduate assistant at VBI who has been working on this project as part of his PhD thesis, remarked: "At the outset of this project, the challenge was to create a method based on high-performance computing that could make meaningful predictions from such a large dataset. Through this work we were able to identify potential targets for future research and experimentation that can determine if these genes exist in vivo."

Some of the preliminary work described in the current paper, specifically the computational and data-management aspects, won a distinguished paper award at the 2008 International Supercomputing Conference. The paper, "Distributed I/O with ParaMEDIC: Experiences with a Worldwide Supercomputer," recounted the computing experiences of an international team in finding missing genes in genomes and in constructing a genome similarity tree from the International Storage Challenge at the 2007 ACM/IEEE SC: The International Conference for High Performance Computing, Networking, Storage and Analysis.


Story Source:

The above story is based on materials provided by Virginia Tech. Note: Materials may be edited for content and length.


Journal Reference:

  1. Warren et al. Missing genes in the annotation of prokaryotic genomes. BMC Bioinformatics, 2010; 11 (1): 131 DOI: 10.1186/1471-2105-11-131

