S. Brdar, V. Crnojević, B. Zupan
Recent developments in molecular biology and techniques for genome-wide data acquisition have resulted in an abundance of data for profiling genes and predicting their function. These data sets may come from diverse sources, and it is an open question how to address them jointly and fuse them into a single prediction model. A prevailing technique for identifying groups of related genes that exhibit similar profiles is profile-based clustering. In this paper we propose a technique that develops separate gene clusters and fuses them by means of non-negative matrix factorization. Gene clusters are inferred from gene networks that are built from each of the available data sources by applying various estimates of gene profile similarity. We use gene profile data on the budding yeast S. cerevisiae to demonstrate that this approach can successfully integrate heterogeneous data sets and yield high-quality clusters that could not otherwise be inferred by simply merging the gene profiles prior to clustering. The main contributions of our work include the proposed algorithm for extracting final clusters after non-negative factorization of the ensemble gene-cluster membership matrix, an in-depth view of different integration scenarios, and an evaluation of the proposed algorithm within the scope of functional genomics.
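The general idea of fusing per-source clusterings through an ensemble gene-cluster membership matrix can be illustrated with a minimal sketch. This is not the extraction algorithm proposed in the paper, which the abstract does not spell out. The sketch assumes that each data source has already been clustered, factorizes the stacked membership matrix with generic Lee-Seung multiplicative updates, and assigns each gene to its largest factor (a simple argmax rule chosen here for illustration). The function names `membership_matrix`, `nmf` and `consensus_clusters`, and the toy partitions, are hypothetical.

```python
import numpy as np


def membership_matrix(partitions):
    """Stack one-hot cluster indicators from several clusterings.

    partitions: list of label sequences, one per data source,
    each of length n_genes. Returns a genes x (total clusters) matrix.
    """
    blocks = []
    for labels in partitions:
        labels = np.asarray(labels)
        ids = np.unique(labels)
        blocks.append((labels[:, None] == ids[None, :]).astype(float))
    return np.hstack(blocks)


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Generic NMF (V ~ W @ H) via multiplicative updates, Frobenius loss."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def consensus_clusters(partitions, k, n_restarts=5):
    """Fuse clusterings: factorize the membership matrix, then take argmax.

    Assumption: the argmax over rows of W stands in for the paper's
    cluster-extraction step, which is not described in the abstract.
    """
    M = membership_matrix(partitions)
    best_W, best_err = None, np.inf
    for seed in range(n_restarts):  # keep the restart with the lowest error
        W, H = nmf(M, k, seed=seed)
        err = np.linalg.norm(M - W @ H)
        if err < best_err:
            best_W, best_err = W, err
    return best_W.argmax(axis=1)


# Made-up toy input: three clusterings of six genes from three "sources".
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 2, 2, 2],  # this source splits the first group differently
]
labels = consensus_clusters(partitions, k=2)
print(labels)
```

In this toy input the first two sources agree on two groups of three genes and the third source partly disagrees; the factorization is expected to recover the two groups shared across sources. The numeric label assigned to each group is arbitrary.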
Read full article at IEEE Xplore.
Tags: Bioinformatics, Clustering algorithms, DH-HEMTs, Data integration, Gene expression, Informatics, Matrix decomposition