Computer scientists at two research centers affiliated with the University of California have teamed with biologists from Perlegen Sciences, Inc., to map key genetic signposts across three human populations. Their study – published in the Feb. 18 issue of Science – could make widely accessible the analysis of human variation based on whole-genome data, and speed efforts to pinpoint DNA variations that are associated with disease or with how patients respond differently to drugs.
“This project sets a new milestone in the search for genetic elements linked to complex genetic diseases such as Alzheimer's, cancer and multiple sclerosis,” said co-author David R. Cox, Chief Scientific Officer at Mountain View, CA-based Perlegen. “Genome-wide analysis may soon become a standard methodology in the search for more effective, individualized treatments.”
Researchers at Perlegen sequenced the single-letter variations (called single-nucleotide polymorphisms, or SNPs) in the DNA of 71 individuals of European American, African American, and Han Chinese American ancestry. Subsequently, scientists at the California Institute for Telecommunications and Information Technology (Calit2) at the University of California, San Diego, and the UC Berkeley-affiliated International Computer Science Institute (ICSI) helped analyze the set of over 100 million genotypes from the over 1.5 million SNPs sequenced in each sample by Perlegen.
“This is the first time that a SNP data set of that scale is being sequenced,” said Eran Halperin, a research scientist at Berkeley-based ICSI. “For each of the 23 pairs of chromosomes in human DNA, the resulting data set consisted of 71 genotypes, which mix together the information from both copies of the chromosome. To see a clearer picture of a variation, we really want to know the variation on each chromosome, and we can do that by inferring haplotypes – the sequences of nucleotide bases in each copy of the chromosome.”
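The ambiguity Halperin describes can be illustrated with a small sketch. It is not the HAP algorithm, which this article does not describe; it only shows why an unphased genotype does not determine the two underlying haplotypes. The 0/1/2 genotype coding and the function name `compatible_haplotype_pairs` are assumptions made for illustration.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0, 1 or 2, the number of copies of the alternate
    allele at each SNP (an illustrative coding, not taken from the study).
    """
    # Heterozygous sites (one copy) are the only ambiguous positions.
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed on both copies: 0 -> allele 0, 2 -> allele 1.
    base = [g // 2 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het_sites, bits):
            h1[site] = b        # one copy carries allele b ...
            h2[site] = 1 - b    # ... the other copy carries the opposite allele
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs((1, 0, 1, 2)):
        print(pair)
```

A genotype with k heterozygous sites is compatible with 2^(k-1) distinct haplotype pairs for k of at least 1, which is why haplotypes must be inferred statistically from many individuals rather than read directly from a single genotype.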
Halperin and Calit2 researcher Eleazar Eskin, who co-authored the study with Perlegen scientists, have pioneered a method for translating genotypes into haplotypes, using the HAP software tool they co-developed. For this study, the bioinformatics researchers had to process more than 190 million data points. “Using other programs, haplotyping would require at least a few months of CPU time,” said Eskin, an assistant professor in Computer Science and Engineering at UC San Diego’s Jacobs School of Engineering. “Using HAP on a regular laptop, this work would take only 200 CPU hours. But we were able to use a cluster of computers from Calit2’s OptIPuter project, and that allowed us to perform our final entire analysis in less than 12 hours.”
Until now, because of the high cost of sequencing technology, disease association studies have typically been limited to short genomic regions. The Science study indicates that genome-wide association studies will now be possible on a considerably smaller budget, as scientists build on the data and tools made publicly available by Perlegen, ICSI and Calit2.
The researchers in San Diego and Berkeley also used the HAP tool to partition the human genome into ‘blocks’, or regions, of limited diversity. These are regions where only a few common patterns account for the majority of the variation in the population. The resulting haplotype ‘maps’ across the three populations appeared qualitatively similar to the maps compiled by Perlegen using a different approach based on ‘linkage disequilibrium’ (LD). LD refers to correlations between DNA variants in physical proximity along a chromosome, and results from a combination of processes including mutation, natural selection, and genetic drift. Linkage disequilibrium is complex and varies from one region of the genome to another, as well as between different populations. According to the study, “LD maps and haplotype maps represent somewhat different aspects of the local structure of genetic variation.”
“The partitioning of genomes into highly correlated regions may be extremely useful for geneticists worldwide,” added ICSI’s Halperin. “They could choose to sequence a small subset of SNPs in each region, and use the high correlations between the different SNPs in order to predict the SNPs that were not sequenced.”
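The correlation Halperin refers to is commonly quantified with the r² statistic between two SNPs. The sketch below computes it from phased haplotypes; the 0/1 allele coding, the toy data, and the function name `r_squared` are illustrative assumptions, not details from the study.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between SNP columns i and j.

    haplotypes: list of equal-length tuples of 0/1 alleles, one per chromosome copy.
    Returns 0.0 when either SNP shows no variation in the sample.
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n           # frequency of allele 1 at SNP i
    p_j = sum(h[j] for h in haplotypes) / n           # frequency of allele 1 at SNP j
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n   # frequency of carrying both
    d = p_ij - p_i * p_j                              # linkage disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Toy sample: SNPs 0 and 1 always travel together; SNP 2 is independent of both.
    sample = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0)]
    print(r_squared(sample, 0, 1))
    print(r_squared(sample, 0, 2))
```

When r² between two SNPs is close to 1, genotyping one of them is nearly as informative as genotyping both, which is the basis for typing only a subset of SNPs in each region.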
The HAP study found substantially more blocks in the African American map than in the European American and Han Chinese maps, indicating that the greatest genetic diversity was in samples of African American descent (a finding consistent with previous studies).
Other findings in the Science paper, titled "Whole Genome Patterns of Common DNA Variation in Three Diverse Human Populations," include:
* Most functional human genetic variation is not population-specific;
* The majority of the 1.58 million SNPs with high-quality genotypes were common in all three populations; and
* “Private SNPs” – those SNPs segregating in only one population sample – were only 18% of the total.
Maps of the haplotype structure and the variants that are common in each region can be downloaded from the Calit2 HAP site, which is hosted by the National Biomedical Computational Resource at UCSD (see Related Links below). “We hope that researchers interested in specific regions of the genome will use this site to obtain information on the human variation in those regions,” said Calit2 director Larry Smarr. “This is a great example of the revolution in computational biology and its potential benefits to society in the study of cardiovascular disease, mental illness and other conditions thought to result from a complex interplay of multiple genetic and environmental factors.”
The SNPs analyzed in the Science study represent only a fraction of the more than 10 million common SNPs expected to exist in the human genome. But researchers at Perlegen developed a mathematical algorithm to identify so-called ‘tag SNPs’ that provide guideposts for finding common variants in the human genome. “This study and software tools mean that you no longer have to wait to do whole-genome association studies,” said Perlegen scientist David A. Hinds, lead author on the study. “We've effectively figured out how to reduce the genotyping burden by identifying a reduced set of tag SNPs, thus decreasing the difficulty and cost of association studies. That said, even when reducing to tag SNPs, we still need to be able to genotype at least several hundred thousand SNPs to have a comprehensive whole-genome association study.”
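The idea of a reduced set of tag SNPs can be sketched with a simple greedy selection. This is a generic illustration, not Perlegen's algorithm, which the article does not detail. It assumes a precomputed matrix of pairwise r² values, and the 0.8 threshold and the function name `greedy_tag_snps` are hypothetical choices made for the example.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedily pick tag SNPs so every SNP is correlated with some tag.

    r2: square matrix (list of lists) of pairwise r^2 values, with r2[s][s] == 1.
    threshold: minimum r^2 for a tag to stand in for another SNP (assumed value).
    Returns the chosen tag SNP indices in selection order.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Choose the untagged SNP that covers the most still-untagged SNPs;
        # ties go to the lowest index because candidates are scanned in order.
        best = max(
            sorted(untagged),
            key=lambda s: sum(1 for t in untagged if r2[s][t] >= threshold),
        )
        tags.append(best)
        untagged -= {t for t in untagged if r2[best][t] >= threshold}
        untagged.discard(best)
    return tags


if __name__ == "__main__":
    # SNPs 0-2 are mutually well correlated; SNP 3 is independent of the rest.
    r2 = [
        [1.0, 0.9, 0.9, 0.0],
        [0.9, 1.0, 0.9, 0.0],
        [0.9, 0.9, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    print(greedy_tag_snps(r2))
```

In this toy case two tags stand in for four SNPs. Applied genome-wide, the same kind of reduction is what brings the genotyping burden down, although, as Hinds notes, several hundred thousand SNPs are still required.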
“This research provides a tool for exploring many questions remaining regarding the causal role of common human DNA variation in complex human traits and for investigating the nature of genetic variation within and between human populations,” the Science paper concludes.
Perlegen is also cooperating with the public-sector International HapMap Project, which is expected to release more detailed descriptions of genetic variations later this year. “We see these two efforts as complementary,” said Perlegen’s Hinds. “The HapMap project will yield a denser map, with more SNPs across a deeper set of individuals.” HapMap will describe variation across individuals of Japanese, Chinese, Nigerian and European ancestry.