Better Sequence Searches Of Genes And Proteins Devised

Date:
March 7, 2009
Source:
Ludwig-Maximilians-Universität München
Summary:
Often, the sequences of genes and proteins can suggest to us what their function is -- especially if we compare them with known sequences. Researchers have now developed a method that makes such analyses significantly more sensitive.

Since the sequencing of the human genome eight years ago, enormous progress has been made in analyzing and understanding it. Nevertheless, the function of most human genes is still barely understood. An important first step in determining the function of a gene or protein is to compare its sequence with sequences from hundreds of other organisms that are easier to investigate experimentally.

From the functions of related genes or proteins identified in these database searches, researchers can often infer the unknown functions of human or animal genes. Now, computational biologists Johannes Söding and Andreas Biegert of the Gene Center of LMU Munich have developed a method that makes database searches significantly more sensitive while remaining just as fast. Instead of comparing sequences letter by letter, their idea is to take the sequence neighbors surrounding each letter into account during the comparison. This idea should be generally applicable in other areas of sequence searching and sequence analysis.

The rule for both genes and proteins is: their function is primarily based on the sequence of their DNA or amino acid components. Genes with similar sequences frequently have a similar function. The same goes for proteins, although for them, the three-dimensional structure into which they fold, and which cannot be predicted offhand from their sequence, is equally important. Still, similar protein sequences suggest relatedness or, in other words, the descent from a common ancestral protein, and with it a similar function.

Accordingly, the sequences and functions of genes and proteins are stored in databases, against which scientists around the world compare their new data. But even the best and most frequently used algorithms, such as BLAST (Basic Local Alignment Search Tool), have to rely on certain simplifications to make efficient searching of these gigantic databases possible at all. After all, researchers expect BLAST to compare a given sequence (the letter code describing the order of DNA components or amino acids) with all sequences in the database in just a few minutes.

Search engines like BLAST evaluate the similarity between a pair of sequences by aligning them underneath each other in such a way that similar amino acids come to lie in the same columns. The sequence similarity is then calculated by adding the similarities of all aligned amino acids. Here, the similarity between amino acids is measured by how often they mutate into each other without adverse effects, a measure that largely coincides with how similar their sizes and other biophysical properties are.
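The scoring rule described above can be illustrated with a minimal Python sketch. It assumes the two sequences are already aligned without gaps, and the score table `TOY_SCORES` contains made-up values chosen only for illustration; it is not one of the substitution matrices BLAST actually uses.

```python
# Toy pairwise similarity scores (hypothetical values for illustration only).
# Similar hydrophobic residues (V, I, L) score positively against each other;
# pairing them with the charged residue D is penalized.
TOY_SCORES = {
    ("V", "V"): 4, ("I", "I"): 4, ("L", "L"): 4, ("D", "D"): 6,
    ("V", "I"): 3, ("V", "L"): 1, ("I", "L"): 2,
    ("V", "D"): -3, ("I", "D"): -3, ("L", "D"): -4,
}


def pair_score(a, b):
    """Look up the similarity of two amino acids; the table is symmetric."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def alignment_score(seq_a, seq_b):
    """Sum the similarities of all aligned (same-column) amino acids."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


print(alignment_score("VILD", "VILD"))  # identical sequences: 4 + 4 + 4 + 6 = 18
print(alignment_score("VILD", "ILVD"))  # similar residues:    3 + 2 + 1 + 6 = 12
```

Note that each column is scored in isolation here: the score for V against I is the same wherever the pair occurs in the sequence.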

BLAST has been the most important method for sequence searching since its development in 1990. It is called up around 500,000 times a day from all around the world. Yet this tried and true program is far from perfect. When evaluating the similarity of two amino acids, it ignores their neighboring amino acids, their sequence context. Johannes Söding and Andreas Biegert of the Gene Center Munich and the cluster of excellence “Center for Integrated Protein Science Munich (CIPSM)” of LMU Munich have now developed a method that significantly improves similarity searches: Their “context-specific” BLAST, or CS-BLAST, can sniff out twice as many distant “relatives” of proteins as BLAST.

When determining the similarity of an amino acid to the reference sequence, CS-BLAST includes the sequence context of every amino acid, namely its six left and six right sequence neighbors, in the analysis. “The idea is that the context says much more about how likely two amino acids are to mutate into each other,” explains Söding, who heads the group for “Protein Bioinformatics and Computational Biology” at the Gene Center Munich. “Take as an example folded and unfolded regions in proteins. In an unfolded region, the amino acid valine, for example, can usually mutate into any of the other 19 amino acids without any adverse effect. In a folded region on the other hand, it will mutate with high probability into other hydrophobic, or water-repelling, amino acids.”
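To make the notion of sequence context concrete, the following Python sketch extracts the 13-letter window (six neighbors on each side plus the central amino acid) around a given position. The function name `context_window` and the use of `-` as padding at the sequence ends are assumptions made for this illustration; how CS-BLAST itself handles sequence ends is not described here.

```python
def context_window(sequence, position, half_width=6, pad="-"):
    """Return the central letter plus `half_width` neighbors on each side.

    Positions that fall outside the sequence are filled with `pad`
    (an assumption of this sketch).
    """
    if not 0 <= position < len(sequence):
        raise IndexError("position lies outside the sequence")
    window = []
    for i in range(position - half_width, position + half_width + 1):
        window.append(sequence[i] if 0 <= i < len(sequence) else pad)
    return "".join(window)


protein = "ACDEFGHIKLMNPQRSTVWY"
print(context_window(protein, 9))  # EFGHIKLMNPQRS (centered on L)
print(context_window(protein, 0))  # ------ACDEFGH (centered on A)
```

A context-specific method would then let the similarity scores for the central amino acid depend on this whole window rather than on the single letter alone.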

The program is based on a very general idea that can be applied to every kind of sequence search and alignment method. The researchers have demonstrated this using the example of PSI-BLAST, an algorithm in which the related sequences already found are aligned one under the other into a so-called multiple alignment. This makes it possible, for example, to identify positions at which only certain amino acids can occur, which improves PSI-BLAST's ability to distinguish related from unrelated proteins. “We managed to increase PSI-BLAST's sensitivity significantly by making use of the sequence context. That way, two consecutive searches using our context-specific version of PSI-BLAST deliver better results than five searches using the conventional engine,” says Söding.
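How a multiple alignment reveals positions where only certain amino acids occur can be sketched in a few lines of Python. This is a simplified illustration rather than PSI-BLAST's actual procedure: it assumes a gap-free alignment given as equal-length strings, and the function names and the example sequences are hypothetical.

```python
from collections import Counter


def column_counts(alignment):
    """Count which amino acids occur in each column of a multiple alignment."""
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) walks through the alignment column by column.
    return [Counter(column) for column in zip(*alignment)]


def conserved_positions(alignment):
    """Return the columns in which only a single amino acid is observed."""
    return [i for i, counts in enumerate(column_counts(alignment))
            if len(counts) == 1]


related = ["VKDG", "VRDG", "IKDG"]  # made-up aligned sequences
print(column_counts(related)[0])    # first column: V seen twice, I once
print(conserved_positions(related)) # [2, 3]: only D and only G occur there
```

A search that knows columns 2 and 3 tolerate only one amino acid can penalize mismatches there more strongly than in the variable columns.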

The new method is just as fast despite its better sensitivity, explains the researcher, because the sequence search takes place in two steps: “Both in conventional BLAST and in our method, a search matrix is first calculated,” Söding continues. “This step is more complicated when you do it our way, but at one second, it is still very fast. Only the second step, the database search using the search matrix, takes a lot of time – and this step is the same for both approaches.”
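The two-step structure Söding describes can be sketched as follows. This is a heavily simplified Python illustration, not BLAST's algorithm: the alphabet is reduced to four letters, `toy_pair_score` is a hypothetical scoring rule, and step two scans every ungapped window exhaustively instead of using BLAST's heuristics. The point it shows is that step two only ever reads the search matrix, so it does not matter how elaborate step one was.

```python
ALPHABET = "DILV"  # reduced toy alphabet (assumption of this sketch)


def toy_pair_score(a, b):
    """Hypothetical similarity: +2 for identical letters, -1 otherwise."""
    return 2 if a == b else -1


def build_search_matrix(query, pair_score=toy_pair_score):
    """Step 1: one row per query position, one score per possible letter.

    A context-specific method would fill the rows differently, but the
    resulting matrix has the same shape.
    """
    return [{letter: pair_score(q, letter) for letter in ALPHABET}
            for q in query]


def best_ungapped_score(matrix, subject):
    """Step 2, one database sequence: best score over all ungapped placements."""
    width = len(matrix)
    best = None
    for start in range(len(subject) - width + 1):
        score = sum(row[subject[start + k]] for k, row in enumerate(matrix))
        if best is None or score > best:
            best = score
    return best  # None if the subject is shorter than the query


def search(matrix, database):
    """Step 2, whole database: rank sequences by their best score."""
    hits = [(name, best_ungapped_score(matrix, seq))
            for name, seq in database.items()]
    hits = [hit for hit in hits if hit[1] is not None]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


matrix = build_search_matrix("VID")  # computed once per query
database = {"a": "LLVIDL", "b": "DDDD", "c": "VI"}
print(search(matrix, database))  # [('a', 6), ('b', 0)]
```

Because the matrix is computed once per query while the scan touches every database sequence, the cost of the search is dominated by step two in this sketch as well.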

In the future, the scientists intend to apply the newly developed algorithm to genomic alignments as well, where not only individual genes, but rather entire segments of DNA are compared. “As with proteins, there are certain key regions in DNA that fulfill crucial regulatory functions,” explains Söding. “You can identify these regulatory regions, which are important for a deeper understanding of many diseases, by comparing the human genome with those of other mammals.” Using a context-specific method, the LMU researchers intend to substantially improve the quality of such genomic alignments and, with it, the identification of regulatory regions. “We believe context-specific methods could become standard throughout the entire field of biological sequence analysis,” concludes Söding.


Story Source:

The above story is based on materials provided by Ludwig-Maximilians-Universität München. Note: Materials may be edited for content and length.


Journal Reference:

  1. Biegert et al. Sequence context-specific profiles for homology searching. Proceedings of the National Academy of Sciences, Feb 20, 2009; DOI: 10.1073/pnas.0810767106

Cite This Page:

Ludwig-Maximilians-Universität München. "Better Sequence Searches Of Genes And Proteins Devised." ScienceDaily. ScienceDaily, 7 March 2009. <www.sciencedaily.com/releases/2009/02/090223131125.htm>.
Ludwig-Maximilians-Universität München. (2009, March 7). Better Sequence Searches Of Genes And Proteins Devised. ScienceDaily. Retrieved October 23, 2014 from www.sciencedaily.com/releases/2009/02/090223131125.htm
Ludwig-Maximilians-Universität München. "Better Sequence Searches Of Genes And Proteins Devised." ScienceDaily. www.sciencedaily.com/releases/2009/02/090223131125.htm (accessed October 23, 2014).
