Gene expression profiling is among the most commonly used analytical tools in biomedical research and is applied to predict preclinical and clinical endpoints, e.g. diagnosis of disease, risk assessment and response to treatment. However, the reliability of these predictions has not yet been established.
Johan Trygg and Max Bylesjö, researchers at Umeå University, have participated in a large international project (MAQC-II) that aimed to examine and generate "best practice" protocols in data analysis for predicting clinical endpoints based on gene expression data. The project was coordinated by the United States Food and Drug Administration (FDA) and is part of its recently launched "Critical Path Initiative" for medical product development. The Umeå University researchers contributed their expertise in the multivariate data analysis technique known as chemometrics.
The results have been published in the latest issue of the journal Nature Biotechnology.
Gene expression data can be used for diagnosis, early detection (screening) and prediction of response to treatment. However, the reliability of the predicted clinical endpoint can profoundly influence the results. In this project, gene expression profiles for 13 different endpoints from more than 3,100 samples, including breast and lung cancer, were analyzed by 36 independent analysis teams, which generated more than 30,000 prediction models for these 13 endpoints. This provides a unique resource for regulatory agencies and scientists.
"Even though the primary goal was not to evaluate individual contributions, I was very happy to see that our OPLS prediction models did so well, and ranked highest for one of the 13 endpoints," says Johan Trygg, associate professor, Computational Life Science Cluster (CLiC) at Umeå University, coordinator of the Swedish effort.
A large effort was put into the structure and review of the data analysis protocol, generation of 36 candidate models and the statistical validation, including blinded validation sets. Three observations were particularly highlighted: (1) the performance of the prediction models depends largely on the quality and relevance of the data; (2) the experience and proficiency of the data analysis team are crucial factors for success; and (3) different prediction methods yield similar prediction results.
Understanding the limitations of using gene expression data to predict clinical endpoints is critical to formulating general guidelines and procedures for safe and effective use, e.g. in the development of diagnostic tests. The "best practice" guidelines from this unprecedented collaboration provide a solid foundation for applying other types of high-dimensional biological data, such as protein and metabolite data, to personalized medicine.
- Leming Shi, Gregory Campbell, Wendell D Jones, Fabien Campagne, Zhining Wen, Stephen J Walker, Zhenqiang Su, Tzu-Ming Chu, Federico M Goodsaid, Lajos Pusztai, John D Shaughnessy, André Oberthuer, Russell S Thomas, Richard S Paules, Mark Fielden, Bart Barlogie, Weijie Chen, Pan Du, Matthias Fischer, Cesare Furlanello, Brandon D Gallas, Xijin Ge, Dalila B Megherbi, W Fraser Symmans, May D Wang, John Zhang, Hans Bitter, Benedikt Brors, Pierre R Bushel, Max Bylesjo, Minjun Chen, Jie Cheng, Jing Cheng, Jeff Chou, Timothy S Davison, Mauro Delorenzi, Youping Deng, Viswanath Devanarayan, David J Dix, Joaquin Dopazo, Kevin C Dorff, Fathi Elloumi, Jianqing Fan, Shicai Fan, Xiaohui Fan, Hong Fang, Nina Gonzaludo, Kenneth R Hess, Huixiao Hong, Jun Huan, Rafael A Irizarry, Richard Judson, Dilafruz Juraeva, Samir Lababidi, Christophe G Lambert, Li Li, Yanen Li, Zhen Li, Simon M Lin, Guozhen Liu, Edward K Lobenhofer, Jun Luo, Wen Luo, Matthew N McCall, Yuri Nikolsky, Gene A Pennello, Roger G Perkins, Reena Philip, Vlad Popovici, Nathan D Price, Feng Qian, Andreas Scherer, Tieliu Shi, Weiwei Shi, Jaeyun Sung, Danielle Thierry-Mieg, Jean Thierry-Mieg, Venkata Thodima, Johan Trygg, Lakshmi Vishnuvajjala, Sue Jane Wang, Jianping Wu, Yichao Wu, Qian Xie, Waleed A Yousef, Liang Zhang, Xuegong Zhang, Sheng Zhong, Yiming Zhou, Sheng Zhu, Dhivya Arasappan, Wenjun Bao, Anne Bergstrom Lucas, Frank Berthold, Richard J Brennan, Andreas Buness, Jennifer G Catalano, Chang Chang, Rong Chen, Yiyu Cheng, Jian Cui, Wendy Czika, Francesca Demichelis, Xutao Deng, Damir Dosymbekov, Roland Eils, Yang Feng, Jennifer Fostel, Stephanie Fulmer-Smentek, James C Fuscoe, Laurent Gatto, Weigong Ge, Darlene R Goldstein, Li Guo, Donald N Halbert, Jing Han, Stephen C Harris, Christos Hatzis, Damir Herman, Jianping Huang, Roderick V Jensen, Rui Jiang, Charles D Johnson, Giuseppe Jurman, Yvonne Kahlert, Sadik A Khuder, Matthias Kohl, Jianying Li, Li Li, Menglong Li, Quan-Zhen Li, Shao Li, Zhiguang Li, Jie Liu, 
Ying Liu, Zhichao Liu, Lu Meng, Manuel Madera, Francisco Martinez-Murillo, Ignacio Medina, Joseph Meehan, Kelci Miclaus, Richard A Moffitt, David Montaner, Piali Mukherjee, George J Mulligan, Padraic Neville, Tatiana Nikolskaya, Baitang Ning, Grier P Page, Joel Parker, R Mitchell Parry, Xuejun Peng, Ron L Peterson, John H Phan, Brian Quanz, Yi Ren, Samantha Riccadonna, Alan H Roter, Frank W Samuelson, Martin M Schumacher, Joseph D Shambaugh, Qiang Shi, Richard Shippy, Shengzhu Si, Aaron Smalter, Christos Sotiriou, Mat Soukup, Frank Staedtler, Guido Steiner, Todd H Stokes, Qinglan Sun, Pei-Yi Tan, Rong Tang, Zivana Tezak, Brett Thorn, Marina Tsyganova, Yaron Turpaz, Silvia C Vega, Roberto Visintainer, Juergen von Frese, Charles Wang, Eric Wang, Junwei Wang, Wei Wang, Frank Westermann, James C Willey, Matthew Woods, Shujian Wu, Nianqing Xiao, Joshua Xu, Lei Xu, Lun Yang, Xiao Zeng, Jialu Zhang, Li Zhang, Min Zhang, Chen Zhao, Raj K Puri, Uwe Scherf, Weida Tong, Russell D Wolfinger. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nature Biotechnology, 2010; 28 (8): 827 DOI: 10.1038/nbt.1665