Understanding health data: Counterintuitive approach yields big benefits

Date:: May 15, 2015
Source:: The Agency for Science, Technology and Research (A*STAR)
Summary:: Extracting meaningful information out of clinical datasets can mean the difference between a successful diagnosis and a protracted illness. However datasets can vary widely both in terms of the number of 'features' measured and the number of independent observations taken. Now, researchers have developed an approach for targeted feature selection from datasets with small sample sizes, which tackles the so-called class imbalance problem.
Share:: Facebook Twitter Pinterest LinkedIN Email

FULL STORY

Extracting meaningful information out of clinical datasets can mean the difference between a successful diagnosis and a protracted illness. However datasets can vary widely both in terms of the number of 'features' measured and the number of independent observations taken. Now, A*STAR researchers have developed an approach for targeted feature selection from datasets with small sample sizes, which tackles the so-called class imbalance problem [1].

The class imbalance problem -- when the common, or 'majority class' data, overwhelm the rare, or 'minority class' data -- is a significant hurdle in data mining. This is particularly evident for datasets that have lots of features, known as high-dimensional data, or have few samples -- both of which are common to gene expression analysis and clinical data.

Feng Yang and colleagues from the A*STAR Institute of High Performance Computing took an unconventional approach to this problem. They began with a common pattern classification method called linear discriminant analysis (LDA). But to make feature selection tractable, the dataset had to be 'regularized'.

"After we analyzed the different forms of regularization," Yang recalls, "we found that one intrinsic difference of the existing forms of regularization is the class emphasis."

Existing regularization methods favored the majority class: "intuitively, the majority class should be given more emphasis weight since it has more samples," acknowledges Yang, "however, our study proved that this is not true in the high-dimensional, small-sized situation with class imbalance."

Indeed, their study showed that when the minority class was more heavily emphasized, that both the classification accuracy and the robustness performance improved.

"From the view of sample distribution in the subspace, minority class emphasis will actually 'squeeze' the samples in the minority class to form a compact 'nucleus' in the subspace of selected features, which would be easier to be classified," Yang explains.

The approach was tested experimentally on five gene microarray datasets, which suffered class imbalance -- with the number of samples ranging from 60 to 136 and the number of features from 2,000 to 12,600. By using an incremental approach, Yang and his team were able to significantly reduce the computational load related to feature selection from 4,215 seconds to 49 seconds.

"Due to some practical limitations, such as the very specific case of a rare disease in clinic data, many practical problems will be of high dimensionality, small sample size and class imbalance," Yang notes. "There are still issues that need to be addressed to deal with these kinds of problems."

The A*STAR-affiliated researchers contributing to this research are from the Institute of High Performance Computing

Story Source:

Materials provided by The Agency for Science, Technology and Research (A*STAR). Note: Content may be edited for style and length.

Journal Reference:

Feng Yang, K.Z. Mao, Gary Kee Khoon Lee, Wenyin Tang. Emphasizing Minority Class in LDA for Feature Subset Selection on High-Dimensional Small-Sized Problems. IEEE Transactions on Knowledge and Data Engineering, 2015; 27 (1): 88 DOI: 10.1109/TKDE.2014.2320732

Cite This Page:

The Agency for Science, Technology and Research (A*STAR). "Understanding health data: Counterintuitive approach yields big benefits." ScienceDaily. ScienceDaily, 15 May 2015. <www.sciencedaily.com/releases/2015/05/150514132813.htm>.

The Agency for Science, Technology and Research (A*STAR). (2015, May 15). Understanding health data: Counterintuitive approach yields big benefits. ScienceDaily. Retrieved July 24, 2026 from www.sciencedaily.com/releases/2015/05/150514132813.htm

The Agency for Science, Technology and Research (A*STAR). "Understanding health data: Counterintuitive approach yields big benefits." ScienceDaily. www.sciencedaily.com/releases/2015/05/150514132813.htm (accessed July 24, 2026).

Explore More

from ScienceDaily

Understanding health data: Counterintuitive approach yields big benefits

Explore More

Breaking

Trending Topics

Strange & Offbeat