The DNA in a human cell is 2 yards long and wraps around millions of bead-like histone proteins to fit inside the cell's nucleus. Researchers at Rice University and Baylor College of Medicine showed that examining the chemical state of these proteins makes it possible to predict how an entire chromosome will fold.
Researchers based at Rice's Center for Theoretical Biological Physics (CTBP) have constructed computer models to analyze epigenetic marks, which include proteins bound to DNA as well as chemical modifications to histone proteins. They harvested the information encoded in these markings to predict how the chromosomes fold in three dimensions.
Their findings move the field of genetics closer to the ability to predict the folded structure of entire genomes, which could someday help identify misfolding-related genetic diseases.
The work appears this week in the Proceedings of the National Academy of Sciences.
Packed into the nucleus, DNA folds into a functional form that differs from one cell type to another. Because every cell in an organism contains the same DNA, epigenetic marks help the DNA find the right form for the type of cell it inhabits.
"Something on top of the genetic code tells the cell what it's supposed to be and determines which parts of the chromosome are going to be read at any given time," said biophysicist Peter Wolynes, a co-author of the paper. "These are the so-called epigenetic marks."
Collectively, epigenetic marks help package the genome into the loose but highly organized compartments it adopts during interphase, the working "middle age" in the life of a cell. These compartments bring transcription-related genes into close proximity and allow them to communicate and function.
Epigenetic marks can be revealed by an established technique called ChIP-sequencing (chromatin immunoprecipitation followed by sequencing), which maps protein-binding sites along DNA.
"We don't understand exactly how the genome gets marked, but we can measure it through ChIP-sequencing, which has become a fairly straightforward experiment," Wolynes said. "In the same way that we can view genetic code (the DNA), we can also measure these marks directly in many different cells. They've become the next layer of sequence on the genome."
"It's another tier of information," said co-author and biophysicist José Onuchic. "Every one of your cells' DNA is the same. However, different kinds of cells have different epigenetics, so their expression patterns are different."
Co-lead authors and Rice postdoctoral fellows Michele Di Pierro and Ryan Cheng used ChIP-sequencing data from human lymphoblast cells covering 84 different DNA-binding proteins and 11 chemical modifications of histones. Histone proteins help organize the genome by acting as spools around which DNA wraps.
Using data from only some of the chromosomes, they trained a custom neural network called MEGABASE (Maximum Entropy Genomic Annotation from Biomarkers Associated with Structural Ensembles) to translate the pattern of epigenetic marks along a chromosome into a sequence of chromatin types. That revealed how the epigenetic marks were related to the compartments, they said. Once MEGABASE was trained, they validated it by feeding it data from the remaining chromosomes. That produced a fresh set of chromatin types for analysis by the Rice team's MiChroM program, a cousin of the lab's AWSEM energy landscape algorithm, which predicts the structures of proteins. MiChroM then predicted the 3-D structures of the chromosomes.
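For readers who want a concrete picture of the first step in this pipeline, here is a minimal sketch of the general idea: a classifier learns to map the marks present at each stretch of DNA to a chromatin type using some chromosomes, then is checked on loci from chromosomes it never saw. It is a hypothetical stand-in, not MEGABASE. It uses a plain multinomial logistic-regression (maximum-entropy) classifier, and the three marks, the two chromatin types and the toy data are all invented for illustration; the real model works with many more marks and types.

```python
import math

# Hypothetical stand-in for the "marks -> chromatin type" step.
# This is a generic maximum-entropy (multinomial logistic-regression)
# classifier, NOT the MEGABASE network described in the article.

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_maxent(features, labels, n_types, epochs=200, lr=0.1):
    """Fit one weight vector per chromatin type by stochastic gradient
    ascent on the log-likelihood. The last weight in each row is a bias."""
    n_feat = len(features[0])
    weights = [[0.0] * (n_feat + 1) for _ in range(n_types)]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            xb = list(x) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(row, xb)) for row in weights])
            for k in range(n_types):
                step = lr * ((1.0 if k == y else 0.0) - probs[k])
                for j, v in enumerate(xb):
                    weights[k][j] += step * v
    return weights

def predict(weights, x):
    """Return the index of the most probable chromatin type for one locus."""
    xb = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in weights]
    return max(range(len(scores)), key=lambda k: scores[k])

def accuracy(weights, features, labels):
    """Fraction of loci whose predicted type matches the reference type."""
    hits = sum(predict(weights, x) == y for x, y in zip(features, labels))
    return hits / len(labels)

# Hypothetical binary profiles (mark_1, mark_2, mark_3) for each locus;
# type 0 and type 1 stand in for two chromatin types.
train_X = [(1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1)]  # "training chromosomes"
train_y = [0, 0, 1, 1]
held_X = [(1, 0, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1)]   # "held-out chromosomes"
held_y = [0, 1, 0, 1]

model = train_maxent(train_X, train_y, n_types=2)
print([predict(model, x) for x in held_X])  # predicted type for each held-out locus
print(accuracy(model, held_X, held_y))
```

Holding out whole chromosomes, rather than random loci, mirrors the validation described above: the classifier is judged only on sequence it never saw during training.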
"Our findings support the idea that compartmentalization in chromosomes arises from the phase separation of different chromatin types in the nucleus, like the separation of oil and water," Cheng said.
When the researchers reduced the original dataset to just the 11 histone markings and ran the calculations again, the results were only marginally different. Ultimately, they determined histone data alone are sufficient to predict a chromosome's form. "There's a well-defined code that connects the histone markings to the structure," Di Pierro said. "It's well-conserved, so it's likely that it has a function."
To validate their theory, the researchers compared their results with contact maps of lymphoblast cells generated by Hi-C. This experimental technique, which uses high-throughput sequencing to identify folding patterns in DNA, was developed by co-author Erez Lieberman Aiden, director of Baylor's Center for Genome Architecture and a senior investigator at the CTBP.
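The article does not say exactly how agreement with the Hi-C maps was scored. As an illustration only, one simple and widely used check is the Pearson correlation between corresponding entries of a predicted contact map and a measured one, ignoring the trivial diagonal. The function names and the 4x4 maps here are made up; real comparisons often also correct for the way contact frequency falls off with genomic distance.

```python
import math

def upper_triangle(matrix, offset=1):
    """Flatten entries (i, j) with j >= i + offset; offset=1 skips the diagonal."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + offset, n)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def contact_map_agreement(predicted, measured):
    """Correlate two symmetric contact maps over the same genomic bins."""
    if len(predicted) != len(measured):
        raise ValueError("contact maps must cover the same genomic bins")
    return pearson(upper_triangle(predicted), upper_triangle(measured))

# Hypothetical contact probabilities between four genomic bins.
measured = [
    [1.0, 0.5, 0.2, 0.1],
    [0.5, 1.0, 0.4, 0.2],
    [0.2, 0.4, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
]
predicted = [
    [1.0, 0.6, 0.25, 0.1],
    [0.6, 1.0, 0.35, 0.15],
    [0.25, 0.35, 1.0, 0.5],
    [0.1, 0.15, 0.5, 1.0],
]

print(round(contact_map_agreement(predicted, measured), 3))
```

A value near 1 means the predicted map reproduces the pattern of strong and weak contacts seen in the experiment, while a value near 0 means little agreement.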
"This paper says we can take one-dimensional information about histones and use it with our big-data tools to predict three-dimensional structure," Wolynes said.
Their success brings the team closer to the ultimate goal of a theory that predicts the architecture of an entire genome. However, a chicken-and-egg problem remains: Does chromatin fold because of the marks, or do the marks appear because of the folding?
"It's all part of our fascination with how life works," Di Pierro said. "It's a beautiful problem."
Materials provided by Rice University. Note: Content may be edited for style and length.