Using a sophisticated computer algorithm, a team of scientists at the Whitehead Institute has designed a new technique to analyze the massive amounts of data generated by DNA microarrays, also known as DNA chips. This technique will help scientists decipher how our estimated 100,000 genes work together to keep us healthy and how diseases result when they fail.
"DNA arrays have revolutionized DNA analysis by allowing us to observe the activities of thousands of genes simultaneously," says Todd Golub, research scientist at the Whitehead/MIT Center for Genome Research. "But until now, it's been really difficult to interpret this extraordinarily complex raw data. Our technique is among the first in a new generation of tools that will speed up the analysis of the enormous amounts of genetic data emerging from laboratories worldwide."
Dr. Golub and his colleagues at the Whitehead Institute, Dana-Farber Cancer Institute, Dartmouth Medical School, and the Massachusetts Institute of Technology report their technique in the March 16 issue of the Proceedings of the National Academy of Sciences. The Whitehead/MIT Center for Genome Research is one of the flagship centers of the U.S. Human Genome Project, the effort to determine the 3 billion letters that make up the human blueprint.
"The core of the technique is an algorithm, called a self-organizing map (SOM), that takes advantage of the fact that many genes in a cell behave similarly," explains Pablo Tamayo, the lead author of the paper and research scientist at the Whitehead Institute. "Instead of having 2,000 individual genes, all doing different things, you might have 25 groups of genes doing similar things."
Tamayo compares the final product of the SOM to an executive summary for CEOs. Rather than having to read every page of a 1,000-page report, CEOs can get an overview of the report by simply reading the summary. "It's impossible to visually inspect every gene," he says. "This method produces a quick scan of what's going on with thousands of genes."
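The grouping idea Tamayo describes can be illustrated with a small sketch of a self-organizing map. This is not the GENECLUSTER implementation: the function names, the toy gene names, the expression values, and the learning-rate and neighborhood schedules below are all assumptions chosen for illustration, and the sketch assumes there are at least as many genes as grid nodes.

```python
import math
import random

def nearest_node(nodes, profile):
    """Return the grid position whose prototype is closest to the profile."""
    return min(nodes, key=lambda pos: math.dist(nodes[pos], profile))

def train_som(profiles, rows, cols, epochs=200, seed=0):
    """Fit a rows x cols grid of prototype profiles to the data.

    Assumes len(profiles) >= rows * cols (hypothetical simplification).
    """
    rng = random.Random(seed)
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    # Start each node at a distinct, randomly chosen expression profile.
    starts = rng.sample(profiles, len(positions))
    nodes = {pos: list(start) for pos, start in zip(positions, starts)}

    total_steps = epochs * len(profiles)
    step = 0
    for _ in range(epochs):
        for profile in rng.sample(profiles, len(profiles)):  # shuffled pass
            frac = step / total_steps
            rate = 0.5 * (1.0 - frac)                           # decays toward 0
            radius = max(rows, cols) / 2 * (1.0 - frac) + 0.01  # shrinks over time
            best = nearest_node(nodes, profile)
            for pos, weights in nodes.items():
                # Nodes near the winner on the grid are pulled along with it.
                grid_dist = math.dist(pos, best)
                influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                for i, value in enumerate(profile):
                    weights[i] += rate * influence * (value - weights[i])
            step += 1
    return nodes

def group_genes(expression, nodes):
    """Assign each gene to the node with the most similar prototype."""
    groups = {}
    for gene, profile in expression.items():
        groups.setdefault(nearest_node(nodes, profile), []).append(gene)
    return groups

# Made-up expression levels at three time points (not data from the study).
toy_expression = {
    "rise_1": [0.0, 0.5, 1.0], "rise_2": [0.1, 0.5, 0.9], "rise_3": [0.0, 0.6, 1.0],
    "fall_1": [1.0, 0.5, 0.0], "fall_2": [0.9, 0.5, 0.1], "fall_3": [1.0, 0.4, 0.0],
}
nodes = train_som(list(toy_expression.values()), rows=1, cols=2)
groups = group_genes(toy_expression, nodes)
for position, genes in sorted(groups.items()):
    print(position, sorted(genes))
```

In this toy case, a two-node map separates the genes whose expression rises over time from those whose expression falls, which is the kind of summary the quote describes: a handful of groups standing in for many individual genes. A real analysis would typically filter and normalize the expression data first and use a larger grid.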
The researchers created a computer package called GENECLUSTER, which organizes the activities of thousands of genes in only minutes. To test GENECLUSTER, they analyzed the genes expressed in several models of leukemia cell growth. In many cases, the algorithm identified genes known to be important in this process, but occasionally it also identified unexpected genes. This finding suggests that the method might be useful in helping to identify the function of unknown genes. "Because genes that have similar functions are generally expressed in the same basic pattern, knowing the expression pattern of a gene could help identify its function," explains Tamayo.
SOMs have been used widely in data mining, particularly for large or messy datasets like stock market data, but this study is the first to apply them to gene analysis.
The study was supported in part by a consortium of three companies -- Bristol-Myers Squibb Company; Affymetrix, Inc.; and Millennium Pharmaceuticals Inc. -- that formed a unique corporate partnership to fund a five-year research program in functional genomics at the Whitehead/MIT Genome Center. It was also supported by grants from the National Institutes of Health to the Lander and Dmitrovsky labs.
The paper is titled "Interpreting patterns of gene expression with self-organizing maps: Methods and applications to hematopoietic differentiation." The authors are: Pablo Tamayo, Donna Slonim, and Jill Mesirov, of the Whitehead Institute; Qing Zhu, of the Dana-Farber Cancer Institute; Sutisak Kitareewan and Ethan Dmitrovsky, of the Department of Pharmacology and Toxicology at Dartmouth Medical School; Eric Lander, of the Whitehead Institute and the Massachusetts Institute of Technology; and Todd Golub, of the Whitehead Institute and the Dana-Farber Cancer Institute.
The above story is based on materials provided by Whitehead Institute For Biomedical Research. Note: Materials may be edited for content and length.