UC Santa Cruz Provides Access to Encyclopedia of the Human Genome

Sep. 5, 2012 — A massive international collaboration has enabled scientists to assign specific functions for 80 percent of the human genome, providing new insights into the mechanisms of gene regulation and giving biomedical researchers a solid genetic foundation for understanding how the body works in health and disease.


The results of the Encyclopedia of DNA Elements (ENCODE) project are described in a coordinated set of 30 papers published in several journals on September 5, 2012. Scientists at the University of California, Santa Cruz, have operated the Data Coordination Center for ENCODE since an initial pilot project began in 2003, and they have made all of the ENCODE data available for public use through the UCSC Genome Browser.

"Our job was to gather data from 32 labs running different types of experiments on a staggering array of cells and tissues, and we had to establish a common data language so we could get it all into a single database that scientists across the world could use. We also developed a lot of new ways of looking at the data, creating search and visualization tools so that people could find the data most relevant to them," said Jim Kent, director of the UCSC Genome Browser project and head of the ENCODE Data Coordination Center.

ENCODE is supported by the National Human Genome Research Institute (NHGRI), one of the National Institutes of Health. Hundreds of researchers across the United States, United Kingdom, Spain, Singapore, and Japan performed more than 1,600 sets of experiments on 147 types of tissue using technologies standardized across the consortium. In total, ENCODE generated more than 15 trillion bytes of raw data, and the data analysis consumed the equivalent of more than 300 years of compute time.

"We've come a long way, and we have learned an incredible amount by integrating the different types of data that ENCODE produced, which was done at a scale never before achieved in biology. This data integration was one of the keys to the success of the project," said Ewan Birney of the European Bioinformatics Institute in the United Kingdom, lead analysis coordinator of the ENCODE data.

For Kent and his data coordination team at UCSC's Center for Biomolecular Science and Engineering, the scale of the project presented many challenges. To start with, they had to coordinate a small army of researchers who were producing data in labs around the world. "We had five data wranglers who traveled around to the labs, probably four conference calls a week at the height of it, plus large group meetings twice a year, and countless emails and Skype calls," Kent said.

Researchers were able to map more than 4 million regulatory regions in the human genome where proteins specifically interact with the DNA. These findings represent a significant advance in understanding the precise and complex controls over how and when genes are active within a cell.

"The regulatory elements are responsible for ensuring that you get crystalline protein in the lens of your eye and hemoglobin in your blood, and not the other way around," Kent said. "It's quite complex. The information processing and the intelligence of the genome reside in the regulatory elements. With this project, we probably went from understanding less than five percent to now around 75 percent of them."

The ENCODE data are rapidly becoming a fundamental resource for researchers working to understand human biology and disease. More than one hundred papers using ENCODE data have already been published by investigators who were not part of the ENCODE project. For example, researchers studying the genetic basis of human diseases use genome-wide association studies to identify disease-associated variants, or markers, in the genome, and they are using the ENCODE resource in an effort to determine which of the many specific variants identified in a study actually contribute to disease. These disease-associated variants map not only to protein-coding regions of the genome, but more often to the non-coding regions of the genome, the vast tracts of sequence between genes where ENCODE has identified many regulatory sites.

"As much as nine out of 10 times, disease-linked genetic variants are not in protein-coding regions," said Mike Pazin, an ENCODE program director at NHGRI. "Far from being 'junk' DNA, this regulatory DNA clearly makes important contributions to human disease."

The coordinated publication set includes one main integrative paper and five other papers in the journal Nature; 18 papers in Genome Research; and six papers in Genome Biology. The ENCODE data are so complex that the three journals have developed a pioneering way to present the information in an integrated form that they call "threads." Since the same topics were addressed in different ways in different papers, a new website will allow anyone to follow a topic through all of the papers in the ENCODE publication set in which it appears. In addition to the "threaded papers," six review articles are being published in the Journal of Biological Chemistry, and other affiliated papers in Science, Cell, and other journals.

Despite the enormity of the data set described in this historic set of publications, it does not comprehensively describe all of the functional elements in all of the different types of cells in the human body. Much additional work needs to be done, and ENCODE is about to be renewed for an additional four years. During the next phase, ENCODE will increase the depth of the catalog with respect to the types of functional elements and cell types studied. It will also develop new tools for more sophisticated analyses of the data.

Story Source:

The above story is reprinted from materials provided by University of California - Santa Cruz. The original article was written by Tim Stephens.
