
Genome Scientists Muster Computer Software Tools For Handling The Flood Of Raw Data From The Human Genome Project And Related Efforts

Date: February 25, 2000
Source: University of California, Santa Cruz

WASHINGTON, D.C. -- A new discipline has emerged at the intersection of computer science and biotechnology, bringing the power of advanced computational techniques to bear on complex problems in molecular biology. Called bioinformatics or computational biology, this new field is providing essential tools for scientists on the leading edge of research in genetics and other fundamental areas of biology.

Gene sequencing efforts such as the Human Genome Project, combined with new techniques for studying the activity of genes in living cells, are generating enormous amounts of raw data. These data are accumulating at a rapidly accelerating pace in a variety of public computer databases, such as those maintained by the National Center for Biotechnology Information at the National Institutes of Health.

"The driving force behind bioinformatics is the availability of these large databases and the need to come up with sophisticated computer models for extracting useful information from them," said David Haussler, professor of computer science at the University of California, Santa Cruz.

Haussler discussed the use of computational techniques to analyze genetic data in a talk Saturday (February 19) at the annual meeting of the American Association for the Advancement of Science in Washington, D.C.

Haussler, who directs UCSC's Center for Biomolecular Engineering, recently joined the Human Genome Project's bioinformatics team. Bioinformatics is playing an increasingly important role in the project, an international effort to identify and understand all of the roughly 100,000 human genes.

"Computer analysis will be an integral part of identifying genes and understanding their functions," Haussler said.

The set of genetic instructions for making an organism -- its genome -- is contained in long, threadlike DNA molecules neatly packaged into chromosomes within the nucleus of every cell. The sequence of chemical units in the DNA is a kind of code that specifies the structures of protein molecules, which carry out most of the functions of living cells.
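To make the idea of a sequence "code" concrete, here is a minimal Python sketch that reads a DNA coding sequence three bases (one codon) at a time and maps each codon to an amino acid. It uses only a small, deliberately partial excerpt of the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative, and the sketch ignores transcription, splicing and the other biological steps between gene and protein.

```python
# A small excerpt of the standard genetic code (DNA codons -> one-letter
# amino acid codes). "*" marks a stop codon. Deliberately incomplete.
CODON_TABLE = {
    "ATG": "M",              # methionine, the usual start codon
    "TTT": "F", "TTC": "F",  # phenylalanine
    "GGA": "G", "GGC": "G",  # glycine
    "AAA": "K",              # lysine
    "TGG": "W",              # tryptophan
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon until a stop codon.

    Codons missing from the partial table are shown as "X".
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3].upper(), "X")
        if amino_acid == "*":
            break  # a stop codon ends the protein
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGTTTGGAAAATGGTAAGGC"))  # MFGKW
```

Everything after the stop codon (`TAA`) is ignored, which is how a cell knows where one protein ends.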

The complete DNA sequence of the human genome, if compiled in books, would fill 200 volumes the size of the Manhattan telephone book. Human Genome Project scientists are close to having a rough draft of this sequence, but that will only be a first step. Buried within the genome sequence are the genes -- DNA sequences that encode specific proteins -- which ultimately determine all the inherited characteristics of humans.

Locating genes within genomic DNA sequences is one of the first tasks for which scientists have turned to bioinformatics. Less than 10 percent of the human genome is thought to comprise protein-coding gene sequences. Interspersed with the genes are control sequences, which regulate gene activity, and other "noncoding regions" whose functions are obscure.

Haussler and his coworkers at UC Santa Cruz have developed some of the most effective computational techniques for finding genes in DNA sequences. They introduced hidden Markov modeling, a statistical method now in wide use, as a way to attack this problem.
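The core idea of hidden Markov modeling for gene finding can be illustrated with a toy two-state model. The sketch below assumes a hidden state that is either coding ("C") or noncoding ("N"), with made-up transition and emission probabilities in which the hypothetical coding state emits G and C more often; the Viterbi algorithm then recovers the most probable state label for each base. This is not Genie or any published gene finder, which model codon structure, gene boundaries and much more; all parameter values and names here are hypothetical.

```python
import math

# Hypothetical toy parameters: "N" = noncoding, "C" = coding.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
EMIT = {
    "N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "C": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}


def viterbi(seq: str) -> str:
    """Return the most probable hidden-state path for a non-empty seq.

    Works in log space to avoid numerical underflow on long sequences.
    """
    # best[s]: log probability of the best path ending in state s
    best = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for base in seq[1:]:
        pointers, new_best = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_best[s] = (best[prev] + math.log(TRANS[prev][s])
                           + math.log(EMIT[s][base]))
        back.append(pointers)
        best = new_best
    # Trace back from the most probable final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("ATATGCGCGCGCATAT"))
```

The "sticky" transition probabilities (0.9 to stay in the same state) are what make the model label whole stretches rather than flipping base by base.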

To analyze the rough draft of the human genome sequence, Haussler is working closely with researchers at the Massachusetts Institute of Technology's Whitehead Institute. The Whitehead Institute is one of five major sequencing sites involved in the Human Genome Project.

Working with the rough draft, however, will be a monumentally difficult task, Haussler said. "The problem is that the rough draft does not provide a continuous DNA sequence across each chromosome -- many regions of the genome are covered only by small pieces," he said.

The first task Haussler and the Whitehead group are tackling is to line up all of the segments of the human genome sequenced so far in their proper order and orientations along the chromosomes. The next step will be to locate genes within the genome sequence. This will be done in collaboration with Neomorphic, a Berkeley-based genomics company, using a computer program called Genie.
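Lining up sequenced pieces depends, at its simplest, on finding where fragments overlap. The sketch below is a toy greedy merge that repeatedly joins the pair of fragments with the longest exact suffix-prefix overlap. It is an illustration only, assuming error-free fragments that all read in the same orientation; it is not the method Haussler and the Whitehead group used, and real data must also cope with sequencing errors, repeated sequences and fragments read from the opposite strand. The function names and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_merge(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the resulting pieces; fragments that overlap nothing stay separate.
    """
    contigs = list(fragments)
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        # Remove the two merged fragments (higher index first) and add the result.
        for k in sorted((i, j), reverse=True):
            contigs.pop(k)
        contigs.append(merged)


print(greedy_merge(["GCGTACC", "ATGGCGT", "ACCTTGA"]))  # ['ATGGCGTACCTTGA']
```

The input order does not matter here: the overlaps alone determine that `ATGGCGT` comes first, then `GCGTACC`, then `ACCTTGA`. Gaps in coverage, as in the rough draft, simply leave several separate pieces in the output.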

Genie was initially developed by Haussler's group and researchers at the Lawrence Berkeley National Laboratory (LBNL). It was exclusively licensed and further developed by Neomorphic, which was founded by a group of scientists from LBNL, UC Berkeley, and UCSC. Genie was recently used to identify genes in the genome of the fruit fly, Drosophila melanogaster, which was sequenced last year. Neomorphic is now developing a new version of Genie optimized for the rough draft of the human genome sequence.

Research on the genetics of organisms such as Drosophila, yeast, and the roundworm Caenorhabditis elegans has helped lay the groundwork for studying the much more complex genome of humans. Many human genes are closely related to genes found in these simpler organisms, which are widely used as model systems for research in genetics and molecular biology. Studies of these model organisms have already yielded many valuable insights into gene functions, normal gene regulation, genetic diseases, and evolutionary processes.

According to Haussler, the role for bioinformatics in this type of research is steadily increasing as the experimental methods become more sophisticated and complex. DNA microarrays or "gene chips," for example, provide valuable information about gene expression -- when, where, and to what extent specific genes are active. This information is critical to understanding a gene's biological function. But gene chips, like genomic sequencing technology, produce enormous amounts of data that can only be analyzed and understood using sophisticated computational approaches.

"There is a lot of information pertaining to gene function that is becoming available as a result of large-scale experiments using gene chips and other methods, which generate massive datasets relating to the functions of thousands of genes," Haussler said.

To analyze these complex datasets, Haussler is pioneering the use of a new statistical method based on the theory of support vector machines (SVMs). SVMs are able to handle high-dimensional datasets in which each data point has many features or attributes.

"It's hard to visualize because we live in a three-dimensional world, and we're talking about analyzing datasets in ten thousand or more dimensions. But we're finding SVMs extremely useful for gene chip data," Haussler said.
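The basic SVM idea can be sketched in a few lines. The example below trains a linear support vector machine by subgradient descent on the regularized hinge loss, using a Pegasos-style update and cycling through the samples in order rather than sampling them at random. Each sample is a plain list of numbers, so it can have any number of features, for example one expression value per gene on a chip. The tiny four-feature dataset, the parameter values and the function names are hypothetical; this is not Haussler's method, and practical work on gene chip data would rely on an established SVM library and often a kernel.

```python
def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(samples: list[list[float]], labels: list[int],
                     lam: float = 0.01, epochs: int = 50) -> list[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    samples: feature vectors (hypothetically, one value per gene)
    labels:  +1 or -1 for each sample
    Returns the weight vector w; the decision rule is sign(w . x).
    """
    w = [0.0] * len(samples[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * dot(w, x)
            # Shrink w (the gradient step for the regularizer) ...
            w = [(1 - eta * lam) * wi for wi in w]
            # ... and, if the margin is violated, step toward y * x.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: list[float], x: list[float]) -> int:
    return 1 if dot(w, x) >= 0 else -1


# Made-up data: two samples of each class, four features each.
samples = [
    [2.0, 1.5, -1.0, 0.2],
    [1.8, 2.2, -0.5, -0.1],
    [-1.5, -2.0, 1.2, 0.3],
    [-2.1, -1.2, 0.8, -0.2],
]
labels = [1, 1, -1, -1]
w = train_linear_svm(samples, labels)
print([predict(w, x) for x in samples])  # [1, 1, -1, -1]
```

Nothing in the loop depends on how many features there are; with ten thousand genes each vector simply gets longer, which is part of why SVMs suit such high-dimensional data.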

Genomic sequencing and gene chips represent what Haussler calls "high-throughput genomic technologies," powerful new techniques for understanding molecular biology. The use of these techniques is increasing, and all of them present significant computational challenges. One of Haussler's goals is to develop new statistical and algorithmic methods for integrating these diverse types of genomic data.

For the moment, analyzing the rough draft of the human genome sequence is the focus of Haussler's efforts. But in the long run, he foresees a happy and prosperous future for the marriage of computer science and molecular biology. The application of human genomics to areas such as drug discovery and clinical diagnostics, for example, will undoubtedly require new computational methodologies, he said.

"Our vision for bioinformatics spans a broad spectrum, from basic molecular biology all the way up to clinical diagnostics," Haussler said.

Additional information about Haussler's research program is available on the Web at http://www.cse.ucsc.edu/~haussler.


Story Source:

The above story is based on materials provided by University Of California, Santa Cruz. Note: Materials may be edited for content and length.


Cite This Page:

University Of California, Santa Cruz. "Genome Scientists Muster Computer Software Tools For Handling The Flood Of Raw Data From The Human Genome Project And Related Efforts." ScienceDaily. ScienceDaily, 25 February 2000. <www.sciencedaily.com/releases/2000/02/000225080127.htm>.
University Of California, Santa Cruz. (2000, February 25). Genome Scientists Muster Computer Software Tools For Handling The Flood Of Raw Data From The Human Genome Project And Related Efforts. ScienceDaily. Retrieved October 23, 2014 from www.sciencedaily.com/releases/2000/02/000225080127.htm
University Of California, Santa Cruz. "Genome Scientists Muster Computer Software Tools For Handling The Flood Of Raw Data From The Human Genome Project And Related Efforts." ScienceDaily. www.sciencedaily.com/releases/2000/02/000225080127.htm (accessed October 23, 2014).
