The Cancer Genome Atlas (TCGA) dataset provides us more opportunities to systematically and comprehensively learn some biological mechanism of cancers formation, growth and metastasis. Since TCGA dataset includes heterogeneous data, it is one of the bioinformatics bottlenecks to mine some meaningful information from them. In this paper, to improve the performance of Robust Principal Component Analysis (RPCA) analyzing these heterogeneous data, a modified RPCA-based method, Block-Constraint Robust Principal Component Analysis (BCRPCA), is proposed. Since different categories data have different peculiarities, BCRPCA enforces different constraint intensities on different categories to improve the performance of RPCA. Firstly, the observation matrix of TCGA data is decomposed into two adding matrices A and S by using BCRPCA. Secondly, we use a ranking scheme to evaluate every feature and project these features to the genes. Then, the genes with high scores will be identified as differentially expressed ones. The main contributions of this paper are as following: firstly, it proposes, for the first time, the idea and method of BCRPCA to model TCGA data; secondly, it provides a BCRPCA-based framework for integrated analysis of TCGA data. The results show that our method is effective and suitable to analyze these data.
Sign-in or become an IEEE member to discover the full contents of the paper.