Computers Close In On Protein Structure Prediction
- Date:
- September 17, 2005
- Source:
- Howard Hughes Medical Institute
- Summary:
- Computers can predict the detailed structure of small proteins nearly as well as experimental methods, at least some of the time, according to new studies by HHMI researchers. The findings provide a glimmer of hope that scientists eventually may be able to determine the structure of proteins from their genomic sequences, a problem that has seemed insurmountable.
Computers can predict the detailed structure of small proteins nearly as well as experimental methods, at least some of the time, according to new studies by Howard Hughes Medical Institute researchers.
The findings, which were reported in the September 16, 2005, issue of the journal Science, provide a glimmer of hope that scientists eventually may be able to determine the structure of proteins from their genomic sequences, a problem that has seemed insurmountable.
"For more than 40 years, people have known the amino acid sequence of a protein specifies its three-dimensional structure, but no one has been able to translate the sequence into an accurate structure," said senior author David Baker, an HHMI researcher at the University of Washington. "The reason this research is exciting is that we're showing progress in predicting the structure from the sequence. It's not that the problem is solved, but that there is hope."
Proteins are biological machines, and scientists need to determine their structures to understand how the proteins work. Currently, scientists determine structures exclusively by measuring the atomic characteristics of proteins in the lab. In contrast, "in this case, we never touched a test tube," Baker said. "We gave it to a computer and said, 'go.'"
In the study, a sophisticated computer program folded 17 short strings of amino acids into 100,000 possible variations. When the researchers compared the best predictions to the actual structures solved earlier by other scientists using experimental techniques, they had the same success rate as the best hitters in major league baseball.
"For about one-third of our benchmark set of small proteins, we generated relatively high-resolution structure predictions, with parts of the structures predicted to near-atomic resolution," said first author Philip Bradley, a postdoctoral fellow in Baker's lab. "For us, it is a real step forward to achieve structures that are in some way comparable to what you can get by experiments."
The encouraging results come from a refinement of a sophisticated computer modeling program called Rosetta, first developed several years ago in Baker's lab. The program works on the premise that proteins collapse into their lowest energy state, like a ball that rolls down a hill until it comes to rest on level ground. The energies of hundreds of thousands of possible shapes generated by the computer are computed, and the lowest energy shape is selected as the prediction.
The prediction process happens in two steps, Bradley said. The first stage uses an approximate model whose energy can be calculated rapidly, while the second uses a very detailed model for which the energy calculations take much longer but are much more accurate. A large-scale search through possible structures is carried out in the first stage, and promising locations are then explored in detail in the second stage.
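The two-stage idea Bradley describes can be illustrated with a toy example. The sketch below is not Rosetta and does not use its energy functions; it searches a single number rather than a protein shape, and every name in it (coarse_energy, detailed_energy, two_stage_search) is hypothetical. It only shows the pattern: score many random candidates with a cheap model, then spend the expensive model on the most promising few.

```python
import math
import random

def coarse_energy(x):
    # Hypothetical cheap, approximate model: a smooth bowl with its minimum at x = 2.
    return (x - 2.0) ** 2

def detailed_energy(x):
    # Hypothetical expensive, accurate model: the same bowl plus fine ripples
    # that the coarse model cannot see.
    return (x - 2.0) ** 2 + 0.3 * math.sin(25.0 * x)

def two_stage_search(n_coarse=10000, n_keep=20, n_refine=200, seed=0):
    rng = random.Random(seed)

    # Stage 1: broad random search, scored with the cheap model only.
    candidates = [rng.uniform(-10.0, 10.0) for _ in range(n_coarse)]
    candidates.sort(key=coarse_energy)
    promising = candidates[:n_keep]

    # Stage 2: explore around each promising location with the detailed model
    # and keep the lowest-energy point found.
    best_x = promising[0]
    best_e = detailed_energy(best_x)
    for start in promising:
        for _ in range(n_refine):
            trial = start + rng.gauss(0.0, 0.2)
            e = detailed_energy(trial)
            if e < best_e:
                best_x, best_e = trial, e
    return best_x, best_e

if __name__ == "__main__":
    x, e = two_stage_search()
    print(f"lowest-energy candidate: x = {x:.3f}, energy = {e:.3f}")
```

The detailed model is evaluated only a few thousand times here, against ten thousand cheap evaluations in the first stage, which is the trade-off the two-stage design is meant to exploit.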
The first stage takes advantage of the fact that all amino acids have identical sections, which form the protein backbone. The computer adds a fuzzy picture of the protruding side chains that give each amino acid its unique identity. The sequence of side chains ultimately gives each protein its characteristic shape, through the environments and neighbors the side chains prefer.
Then the computer randomly twists, loops, and bends each amino acid sequence into 100,000 different shapes based on the preferred location of the amino acids. Some amino acids tend to dive toward the watery world of the protein surface while others take cover inside the protein. The computer also accounts for the social habits of the 20 amino acids; some want to be close to each other and others like their distance.
In stage two, Rosetta replaces the fuzzy picture of the side chains with detailed, physically realistic models with all the atoms represented. From the positions of the atoms in the side chains and the protein backbone, the computer then uses a detailed force field, based on physical chemistry, that favors close packing of atoms and hydrogen bonding to compute the energy of the structure more accurately.
"What seems to be critical is the packing of the molecule," Baker said. "The protein fits together perfectly with no holes in the middle, and no atoms on top of each other. It's about as densely packed as it could be. It's like a three-dimensional jigsaw puzzle."
The researchers upped their odds of finding the right match by repeating the two-step process with 50 homologs of the proteins from other genomes, such as those of the mouse or fly. The protocol was first tested on a blind annual prediction test considered to be the highest standard for removing bias from protein structure prediction models.
"We can't compute the energies perfectly, but the biggest problem is the search through possible shapes," Baker said. "Where we were not getting the right answer on the computer, it was almost always the case that the actual structure had the lowest energy, so we would have succeeded if we had explored this part of the space."
In a related paper published in the August issue of the journal Proteins, Baker and his colleagues reported that similar approaches can be used to predict the structures of protein complexes. "For the first time, computational methods are able, for a subset of cases, to produce really accurate models," he said.
Baker compares the computer simulations of the proteins to the problem of trying to find the lowest point on the surface of the Earth for the first time. A simple way to find the lowest place on the planet is to send out as many explorers as possible. The more explorers there are, the more likely one of them is to stumble onto the shoreline of the Dead Sea, the Earth's lowest point on land not covered by water. Each of the thousands of computer simulations is like one explorer.
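The explorer analogy can be put in numbers under a strong simplifying assumption that is not from the study: suppose each simulation independently has the same small chance of landing on the lowest-energy shape. The function name and the probabilities below are hypothetical and chosen only for illustration.

```python
def chance_of_hit(n_runs, p_single):
    """Probability that at least one of n_runs independent runs succeeds,
    assuming every run has the same success probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_runs

if __name__ == "__main__":
    p = 1e-4  # hypothetical chance that a single run finds the lowest point
    for n in (1, 1_000, 10_000, 100_000):
        print(f"{n:>7} runs -> chance of at least one hit: {chance_of_hit(n, p):.4f}")
```

Real simulations are neither independent nor equally likely to succeed, so this is only a rough picture of why sending out more explorers helps.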
Although the 33 percent success rate reported in the Science paper might be good enough to secure hall-of-fame status for a baseball player, Baker is quick to point out that it is not yet reliable enough for biology. Better models will depend on both smarter exploration strategies and more computer power. "If methods stayed where we are, we wouldn't solve the problem," Baker said. "On the other hand, we would do better with 10 times more computer time."
It takes less than one minute for a protein to fold into its correct shape in cells, but one oft-repeated estimate predicts it would take longer than the age of the universe for a computer to sample all the possible conformations of a folded protein. Baker's lab already receives help from supercomputing centers in San Diego and Illinois.
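The article does not say how that estimate is derived. A common back-of-the-envelope version, sketched below, assumes a chain of 100 amino acids, three possible states per amino acid, and a computer that checks ten trillion shapes per second. All three numbers, and the names in the code, are illustrative assumptions rather than figures from the study.

```python
def exhaustive_search_seconds(n_residues, states_per_residue, samples_per_second):
    # Total shapes = states_per_residue ** n_residues; divide by the sampling rate.
    return states_per_residue ** n_residues / samples_per_second

# Roughly 13.8 billion years, expressed in seconds.
AGE_OF_UNIVERSE_SECONDS = 13.8e9 * 365.25 * 24 * 3600

# Assumed values: 100 residues, 3 states each, 1e13 shapes checked per second.
seconds = exhaustive_search_seconds(100, 3, 1e13)

if __name__ == "__main__":
    print(f"exhaustive search: {seconds:.2e} seconds")
    print(f"age of universe:   {AGE_OF_UNIVERSE_SECONDS:.2e} seconds")
    print(f"ratio:             {seconds / AGE_OF_UNIVERSE_SECONDS:.2e}")
```

Under these assumptions the exhaustive search outlasts the universe by many orders of magnitude, which is why prediction programs sample shapes selectively instead of enumerating them.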
More help will soon be on its way from many of the 5,000 freshmen entering the University of Washington this fall. Using software developed to assist the Search for Extraterrestrial Intelligence (SETI) project, the students can put their computers to work at night while they are sleeping to search the atomic landscape for the lowest energy structure of proteins.
To improve protein structure prediction further, Baker's group has also started a distributed computing project that they are hoping will be aided by members of the public. The project, called Rosetta@home, is a scientific research project that uses internet-connected computers to predict and design protein structures, and protein-protein and protein-ligand interactions. The goal is to develop methods that accurately predict and design protein structures and complexes, an endeavor that may ultimately help researchers develop cures for human diseases such as cancer, HIV/AIDS, and malaria. More information is available online at http://boinc.bakerlab.org/rosetta.
Story Source:
Materials provided by Howard Hughes Medical Institute. Note: Content may be edited for style and length.