Featured Research

from universities, journals, and other organizations

Experts Set 'Gold Standard' For Metagenomic Data Analysis

Date:
May 14, 2007
Source:
DOE/Joint Genome Institute
Summary:
The field of metagenomics is still in its infancy -- the equivalent of the early days of the California Gold Rush, with labs vying to stake their claim. Amidst the prospecting, the call has been issued for methods to separate fool's gold from the real nuggets. Such a gold standard has now been set and published by the US Department of Energy Joint Genome Institute with colleagues in the May edition of Nature Methods.

With the advent of more powerful and economical DNA sequencing technologies, gene discovery and characterization is transitioning from single-organism studies to revealing the potential biotechnology applications embedded in communities of microbial genomes, or metagenomes. The field of metagenomics is still in its infancy -- the equivalent of the early days of the California Gold Rush, with labs vying to stake their claim.

Amidst the prospecting, the call has been issued for methods to separate fool's gold from the real nuggets. Such a gold standard has now been provided through work led by the U.S. Department of Energy Joint Genome Institute (DOE JGI) with colleagues from Oak Ridge National Laboratory and IBM's T.J. Watson Research Center. Their results are published in the May edition of Nature Methods.

"DOE JGI and our collaborators have pioneered the use of DNA sequencing-based technologies to understand microbial communities through a combination of computational and experimental methods," said Konstantinos Mavrommatis, lead author of the paper and a post-doctoral fellow in DOE JGI's Genome Biology Program. "We are now exploring ways to analyze metagenomic data to enable accurate classification of sequence fragments into their corresponding species populations.

"The goal is to reconstruct metabolic pathways by comparing with reference isolate genomes, so that we can model ecosystem dynamics using metabolic reconstructions of metagenomic data. However, so far all the methods that have been developed were aimed toward analyzing data coming from single, isolate genomes. In that case, the situation is simple; we know which gene belongs to which organism. In metagenomes, it's much more challenging, because you have sequences from many different organisms all mixed up, and moreover, you don't have enough sequence from each to capture an accurate picture of the entire community, so you only get a glimpse of the identities of multiple genomes. All the publications to date have made the assumption that these tools will work as efficiently for metagenomes, but we really didn't know."

Nikos Kyrpides, DOE JGI Genome Biology Program Head, said that, to evaluate the magnitude of this problem and identify the inherent pitfalls, "we have constructed three simulated metagenomic datasets of varying complexity by mixing pieces of over one hundred already sequenced isolate organisms. This approach allowed us to quantify the fidelity of several data processing methods, since we could identify the correct answer by comparing the synthetic datasets to the corresponding isolate genomes."
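The idea of building a dataset whose correct answer is known in advance can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the function name `simulate_metagenome`, the toy sequences, and the read counts are all hypothetical, and real simulated datasets would draw on actual sequencing reads from isolate genomes.

```python
import random

def simulate_metagenome(genomes, reads_per_genome, read_length, seed=0):
    """Draw random fragments from each genome, keep the source label, and mix them."""
    rng = random.Random(seed)
    reads = []
    for name, sequence in genomes.items():
        for _ in range(reads_per_genome[name]):
            # Pick a start position so the fragment fits inside the genome.
            start = rng.randrange(0, len(sequence) - read_length + 1)
            reads.append((name, sequence[start:start + read_length]))
    # Shuffle so fragments from different organisms are interleaved.
    rng.shuffle(reads)
    return reads

# Toy stand-ins for isolate genomes (hypothetical data, far shorter than real genomes).
toy_genomes = {
    "organism_A": "ATGC" * 50,
    "organism_B": "GGCCTTAA" * 25,
}
# Unequal read counts mimic a community where one organism dominates.
reads = simulate_metagenome(
    toy_genomes, {"organism_A": 5, "organism_B": 2}, read_length=30
)
```

Because every fragment carries the name of the genome it came from, the output of any analysis tool run on the mixture can later be checked against those labels.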

"This paper provides an extremely useful survey of tools and existing approaches for metagenome analysis and points out their weaknesses," said Natalia Maltsev, of the Bioinformatics Group, Mathematics and Computer Science Division at Argonne National Laboratory. "The simulated datasets constructed by the authors provide a much-needed test bed for evaluation and comparisons of these tools. Their findings will no doubt have a very significant impact on the field of metagenomics in general. It will help groups like mine to choose efficient strategies for the development of automated methods for high-throughput metagenome analysis. And last, but not least, it will stimulate the development of new computational tools and approaches for studies of microbial communities."

In a shotgun sequencing process, the DNA from the microbial genomes is first sheared into millions of small fragments to enable amplification, labeling, and ultimately sequencing. Genome assembly is the process of putting the sequenced fragments back in order, in effect putting Humpty Dumpty back together again, to recreate the identity of an organism from the scattered puzzle pieces of DNA.

"One of the problems in assembling metagenomes," said Mavrommatis, "is that you end up with large fragments of unknown accuracy and a substantial number of sequences that fail to fit onto those larger fragments. On many occasions, these leftover sequences are not taken into account, depriving the analysis of valuable information embedded in them."

The solution proposed by Mavrommatis and his colleagues was to evaluate and compare the existing methods to see which performs best for the particular environmental samples being analyzed. "What we did was to take known sample genomes, shuffle them, create simulated metagenomes, and use those tools on them, and then we went back and compared the results to the isolate genomes. Essentially, we applied the gold standard, 'the truth,' and found there were tools that shouldn't have been used because their predictive accuracy was very low. But it also validated some of our assumptions."

Mavrommatis said that, for example, when using the widely used sequence assembly tool Phrap, they saw artifacts that the program created by mixing sequences that should not have been mixed.

"It's like when you're in the market for a digital camera, you can go to web sites like CNET to see the reviews, make the comparisons, and get some guidance for choosing the right product for your particular needs."

Another major problem with metagenomes is binning. Binning is the process of identifying the organism from which a particular sequence originated. There are several methods employed to bin sequences. BLAST (Basic Local Alignment Search Tool) is a method used to rapidly search for similar sequences in existing public databases.

Mavrommatis said that a popular approach is to take the sequence, BLAST it against the database, and find the best hit and assume that the sequence queried belongs to the same group of organisms. Other methods use intrinsic features of the sequence, such as oligonucleotide frequencies. Patterns of these features help to discriminate between the possible groups of organisms that contributed the sequence.
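Composition-based binning can be sketched in a few lines of Python. This is an illustration of the general idea only, not any of the tools evaluated in the paper: the function names, the reference groups, the toy sequences, and the choice of Euclidean distance are all assumptions made for the example.

```python
from collections import Counter
from itertools import product
import math

def kmer_profile(sequence, k=4):
    """Return normalized frequencies of all 4**k DNA k-mers, in a fixed order."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]

def assign_bin(fragment, reference_profiles, k=4):
    """Assign a fragment to the reference group with the nearest k-mer profile."""
    profile = kmer_profile(fragment, k)
    return min(
        reference_profiles,
        key=lambda name: math.dist(profile, reference_profiles[name]),
    )

# Hypothetical reference groups built from toy sequences; k=2 keeps the toy data meaningful.
references = {
    "at_rich_group": kmer_profile("ATTAATATTTAAATAT" * 20, k=2),
    "gc_rich_group": kmer_profile("GCCGGCGCCCGGGCGC" * 20, k=2),
}
predicted = assign_bin("GGCCGCGCGGCCGCGC", references, k=2)
```

Here the GC-rich fragment lands in `gc_rich_group`. With short fragments and closely related organisms, profiles overlap far more than in this toy case, which is one reason such methods need to be checked against data of known origin.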

Several more methods have been proposed for binning, but none of these on its own has proven satisfactory, Mavrommatis said. "What we propose in the paper is a way to evaluate the appropriateness and accuracy of the binning methods using the same datasets. In order to set a gold standard, we have designed the reference-simulated metagenome."
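When the true origin of every fragment is known, scoring a binning method reduces to comparing two lists of labels. The sketch below shows one plausible way to do that in Python; the per-organism precision and recall used here are an assumption chosen for illustration and are not necessarily the measures reported in the paper, and the function name and labels are hypothetical.

```python
from collections import Counter

def binning_accuracy(true_labels, predicted_labels):
    """Per-organism precision and recall of predicted bins against known origins."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    true_counts = Counter(true_labels)
    predicted_counts = Counter(predicted_labels)
    # Count fragments whose predicted bin matches their true organism.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    report = {}
    for organism in true_counts:
        n_predicted = predicted_counts[organism]
        report[organism] = {
            # Of the fragments placed in this bin, how many belong there?
            "precision": correct[organism] / n_predicted if n_predicted else 0.0,
            # Of the fragments from this organism, how many were found?
            "recall": correct[organism] / true_counts[organism],
        }
    return report

# Hypothetical labels: five fragments, one of which is binned incorrectly.
truth = ["A", "A", "A", "B", "B"]
predicted_bins = ["A", "A", "B", "B", "B"]
report = binning_accuracy(truth, predicted_bins)
```

In this toy case, bin A is pure but incomplete (precision 1.0, recall 2/3), while bin B is complete but contaminated (precision 2/3, recall 1.0). Running several methods on the same labeled dataset makes such trade-offs directly comparable.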

Through the Nature Methods publication, Mavrommatis has invited others to contribute new methods as they arise, so that the results stay current and the system retains its value. This effort is supported by a server called "Fames," available at the DOE JGI (http://fames.jgi-psf.org/), where the community of researchers can check the most recent results, compare the dataset from their metagenome of interest against the simulated metagenomes, and receive guidance as to which tools are optimal for analysis.

"Having such a tool at hand for the first time now, the community can not only compare the methods, but can also ask the question, why is this method better, or why does this one fail. Overall, we hope that it will help to improve the process and lead to further development of new methods for evaluating metagenomes, particularly since this gold rush is not going away any time soon."

The other DOE JGI authors on the study are Natalia Ivanova, Kerrie Barry, Harris Shapiro, Eugene Goltsman, Asaf Salamov, Frank Korzeniewski, Miriam Land, Alla Lapidus, Igor Grigoriev, Paul Richardson, Philip Hugenholtz and Nikos Kyrpides.


Story Source:

The above story is based on materials provided by DOE/Joint Genome Institute. Note: Materials may be edited for content and length.


Cite This Page:

DOE/Joint Genome Institute. "Experts Sets 'Gold Standard' For Metagenomic Data Analysis." ScienceDaily. ScienceDaily, 14 May 2007. <www.sciencedaily.com/releases/2007/05/070514101100.htm>.
DOE/Joint Genome Institute. (2007, May 14). Experts Sets 'Gold Standard' For Metagenomic Data Analysis. ScienceDaily. Retrieved December 20, 2014 from www.sciencedaily.com/releases/2007/05/070514101100.htm
DOE/Joint Genome Institute. "Experts Sets 'Gold Standard' For Metagenomic Data Analysis." ScienceDaily. www.sciencedaily.com/releases/2007/05/070514101100.htm (accessed December 20, 2014).
