A new website, http://biogeowarehouse.cse.psu.edu, offers a prototype for online access to an analytical toolbox that enables biomedical researchers to integrate dissimilar data from a variety of sources and extract the most useful information from it by posing queries.
Dr. Raj Acharya, professor of computer science who headed the site development project, says, "Right now, the prototype focuses on prostate cancer data but our online toolbox could be used for dissimilar data sets for any disease."
For example, using the prostate cancer data sets, researchers can pose questions such as "What percentage of the patients recorded have a family history of prostate cancer?" and "How many patients have been categorized with different pathologic T stages?" or make requests such as "Give me the average expression vector for patients with Gleason sum score of 4."
To come up with answers, the toolbox applies information fusion techniques to integrate multiple and dissimilar data sets so that all of the relevant data can be used simultaneously in advanced analysis.
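As a rough illustration of how integrating two dissimilar data sets lets a single query draw on both, the following Python sketch joins a clinical table and a gene expression table on a shared patient ID and then answers the three example questions. It is not the toolbox's actual implementation: the records, field names, and function names are all hypothetical, and a plain key-based join stands in for whatever fusion technique the toolbox uses.

```python
from collections import Counter

# Hypothetical toy data; patient IDs, field names and values are invented.
clinical = {
    "P1": {"family_history": True, "pathologic_t": "T2", "gleason_sum": 4},
    "P2": {"family_history": False, "pathologic_t": "T3", "gleason_sum": 7},
    "P3": {"family_history": False, "pathologic_t": "T2", "gleason_sum": 4},
    "P4": {"family_history": True, "pathologic_t": "T2", "gleason_sum": 6},
}
expression = {
    "P1": [1.0, 2.0, 3.0],
    "P2": [0.5, 0.5, 0.5],
    "P3": [3.0, 4.0, 5.0],
    # P4 has no expression profile in this toy data set.
}


def fuse(clinical, expression):
    """Join both sources on patient ID; a missing profile becomes None."""
    return {
        pid: {**record, "expression": expression.get(pid)}
        for pid, record in clinical.items()
    }


def percent_with_family_history(fused):
    """Percentage of recorded patients with a family history."""
    positives = sum(1 for r in fused.values() if r["family_history"])
    return 100.0 * positives / len(fused)


def count_by_t_stage(fused):
    """Number of patients in each pathologic T stage."""
    return Counter(r["pathologic_t"] for r in fused.values())


def average_expression(fused, gleason_sum):
    """Element-wise mean expression vector for one Gleason sum score."""
    vectors = [
        r["expression"]
        for r in fused.values()
        if r["gleason_sum"] == gleason_sum and r["expression"] is not None
    ]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


fused = fuse(clinical, expression)
family_history_pct = percent_with_family_history(fused)
stage_counts = count_by_t_stage(fused)
gleason4_mean = average_expression(fused, 4)
```

The third query is the one that needs the integrated view: the Gleason sum score comes from the clinical records, while the vectors being averaged come from the expression data.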
Acharya says information fusion is new to the biological sciences, as are some of the other tools in the online toolbox, including software he and his research group developed to combine gene information with gene sequence information.
The toolbox is detailed in a paper, "An Online Analysis and Information Fusion Platform for Heterogeneous Biomedical Informatics Data," presented Thursday, June 23, at the IEEE Conference for Computer Based Medical Systems in Dublin, Ireland. The software will also be demonstrated during the International Symposium on Intelligent Systems for Molecular Biology on Wednesday, June 29, in Detroit, MI. The authors are Srivatsava Ranjit Ganta, doctoral candidate in computer science; Jyotsna Kasturi, doctoral candidate in computer science; Dr. John Gilbertson, M.D., assistant professor of cellular and molecular pathology, University of Pittsburgh, School of Medicine; and Acharya, who is also head of Penn State's Department of Computer Science and Engineering.
The online toolbox uses data fusion techniques originally developed by the military to fuse laser radar, heat sensor and TV images as well as other information. The fusion software combines the data so that everything relevant to a particular question can be considered together.
Current biomedical research requires analysis of patient demographics, clinical and pathology data, treatment history, and patient outcomes as well as gene expression, sequence and gene ontologies. Acharya says the extent of knowledge that can be extracted from any one of the individual data sets is limited. However, using the online toolbox, researchers can perform analyses in an integrated manner that could lead to better disease diagnosis, prognosis, treatment and drug discovery.
The toolbox performs information fusion using multidimensional analysis and clustering techniques. For example, to answer the request "Give me the average expression vector for patients with Gleason sum score of 4," the software classifies the data sets into categories from which the user chooses the facts and dimensions. Based on this selection, the system presents the user with an initial view of the information subset. The user can then explore this subset and focus further on the knowledge of interest by using two operations, Summarize and Detail.
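The article does not define the Summarize and Detail operations. One plausible reading, offered here only as an assumption, is that they resemble the roll-up and drill-down operations of multidimensional (OLAP-style) analysis. The minimal Python sketch below follows that reading: the user picks dimensions and a fact, Summarize averages the fact over each combination of dimension values, and Detail returns the underlying records for one combination. The records, the "psa" fact, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records; fields and values are invented for illustration.
records = [
    {"patient": "P1", "gleason_sum": 4, "pathologic_t": "T2", "psa": 4.0},
    {"patient": "P2", "gleason_sum": 4, "pathologic_t": "T3", "psa": 6.0},
    {"patient": "P3", "gleason_sum": 7, "pathologic_t": "T3", "psa": 10.0},
    {"patient": "P4", "gleason_sum": 7, "pathologic_t": "T3", "psa": 14.0},
]


def summarize(records, dimensions, fact):
    """Average the chosen fact over each combination of dimension values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        groups[key].append(record[fact])
    return {key: sum(values) / len(values) for key, values in groups.items()}


def detail(records, dimensions, key):
    """Return the individual records behind one summarized cell."""
    return [r for r in records if tuple(r[d] for d in dimensions) == key]


# Coarse initial view: one dimension.
coarse_view = summarize(records, ["gleason_sum"], "psa")
# Finer view: add a second dimension to split each group further.
fine_view = summarize(records, ["gleason_sum", "pathologic_t"], "psa")
# Inspect the patients behind the Gleason sum 7 cell of the coarse view.
gleason7_patients = detail(records, ["gleason_sum"], (7,))
```

Moving from the coarse view to the fine view, or from a summarized cell to its records, mirrors the kind of step-by-step narrowing the article describes.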
The toolbox was developed for the Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC), a collaboration including the Penn State Cancer Institute at Penn State Hershey Medical Center, the University of Pittsburgh Cancer Institute, the Wistar Institute, Fox Chase Cancer Center and the Thomas Jefferson University Kimmel Cancer Center. The Consortium and the website project are supported by grants funded by Pennsylvania's share of the national tobacco settlement fund.