Mathematicians at Michigan Technological University have developed powerful new tools for winnowing out the genes behind some of humanity’s most intractable diseases.
With one, they can cast back through generations to pinpoint the genes behind inherited illness. With another, they have isolated 11 variations within genes—called single nucleotide polymorphisms, SNPs or "snips"—associated with type 2 diabetes.
"With chronic, complex diseases like Parkinson's, diabetes and ALS [Lou Gehrig's disease], multiple genes are involved," said Qiuying Sha, an assistant professor of mathematical sciences. "You need a powerful test."
That test is the Ensemble Learning Approach (ELA), software that can detect a set of SNPs that jointly have a significant effect on a disease.
In complex inherited conditions, including type 2 diabetes, some genes may precipitate the disease on their own, while others cause disease only when they act together. Finding these gene-gene combinations has been especially unwieldy, because the calculations needed to match up suspect variants among the 500,000 or so SNPs examined in a genome-wide scan have been virtually impossible.
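To give a rough sense of why exhaustive matching is so costly, the short Python sketch below counts how many distinct pairs can be drawn from 500,000 candidates, the figure quoted above. The shortlist size of 1,000 is a made-up number for comparison only and does not come from the researchers.

```python
from math import comb

n_candidates = 500_000   # figure quoted in the article
shortlist_size = 1_000   # hypothetical shortlist, for comparison only

# Number of unordered pairs: n * (n - 1) / 2
all_pairs = comb(n_candidates, 2)
shortlist_pairs = comb(shortlist_size, 2)

print(f"pairs among all candidates: {all_pairs:,}")        # 124,999,750,000
print(f"pairs among the shortlist:  {shortlist_pairs:,}")  # 499,500
```

Every pair would need its own statistical test, so the workload grows roughly with the square of the number of candidates.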
ELA sidesteps this problem, first by drastically narrowing the field of potentially dangerous genes, and second, by applying statistical methods to determine which SNPs act on their own and which act in combination. "We thought it was pretty cool," Sha said.
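The two-step idea of narrowing first and testing combinations second can be illustrated with a toy Python sketch. This is not the ELA software or its statistics. The scoring rule (difference in average minor-allele count between cases and controls), the function names, the cutoff, and the tiny data set are all assumptions invented for illustration.

```python
from itertools import combinations

def marginal_score(genotypes, labels):
    """Absolute difference in mean minor-allele count, cases vs. controls."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def screen_then_pair(snps, labels, keep):
    """Step 1: keep the top-scoring SNPs. Step 2: list every pair among them."""
    ranked = sorted(snps, key=lambda name: marginal_score(snps[name], labels),
                    reverse=True)
    shortlist = ranked[:keep]
    return shortlist, list(combinations(shortlist, 2))

# Made-up data: 1 = case, 0 = control; genotype = minor-allele count (0, 1, 2).
labels = [1, 1, 1, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 1, 0, 0, 1],
    "snp_b": [1, 0, 1, 1, 0, 1],
    "snp_c": [2, 1, 1, 0, 1, 0],
    "snp_d": [0, 1, 0, 0, 1, 0],
}

shortlist, pairs = screen_then_pair(snps, labels, keep=2)
print(shortlist)  # SNPs that survive the screen
print(pairs)      # candidate pairs left to test for joint effects
```

A screen this simple would miss SNPs that matter only in combination, which is one reason real methods rely on more careful selection than a single-SNP score.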
To test their model on real data, Sha’s team analyzed genes from over 1,000 people in the United Kingdom, half with type 2 diabetes and half without. They identified 11 SNPs that, singly or in pairs, are linked to the disease with a high degree of probability. Their work was published in the journal Genetic Epidemiology.
ELA compares the genetic makeup of unrelated individuals to sort out disease-related genes. The team has also developed a second approach, a two-stage association test that incorporates founders' phenotypes, called TTFP, which can examine the genomes of family members going back generations.
"In the past, researchers have dealt with the nuclear family, parents and children, but this could go back to grandparents, great-grandparents . . . as far back as you want."
The team has published their findings in the European Journal of Human Genetics.
Now that they’ve developed the software, the analysis is relatively simple, says Sha. But getting the genetic data to work on is not. "We don’t have the data sets yet to work with," she says, clearly frustrated. "That’s the problem with having no medical school."
Those who do have data sets, however, can use the team’s software to help find the genes associated with a panoply of illnesses. ELA is available in Windows and Linux versions at http://www.math.mtu.edu/~shuzhang/software.html, and TTFP is available by request.
- Zhang et al. An ensemble learning approach jointly modeling main and interaction effects in genetic association studies. Genetic Epidemiology, 2008; 32 (4): 285 DOI: 10.1002/gepi.20304
- Tao Feng, Shuanglin Zhang and Qiuying Sha. Two-stage association tests for genome-wide association studies based on family data with arbitrary family structure. European Journal of Human Genetics, 15, 1169–1175; DOI: 10.1038/sj.ejhg.5201902