Identification of Functional Modules by Integration of Multiple Data Sources Using a Bayesian Network Classifier
Background—Prediction of functional modules is indispensable for detecting protein deregulation in complex human diseases such as cancer. The Bayesian network is one of the most commonly used models for integrating heterogeneous data from multiple sources, such as protein domains, the interactome, functional annotation, genome-wide gene expression, and the literature.
Methods and Results—In this article, we present a Bayesian network classifier that is customized to (1) increase the ability to integrate diverse information from different sources, (2) effectively predict protein–protein interactions, (3) infer aberrant networks with scale-free and small-world properties, and (4) group molecules into functional modules or pathways based on their primary function and biological features. Applying this model to the discovery of protein biomarkers of hepatocellular carcinoma identified functional modules that provide insight into the mechanisms underlying the development and progression of the disease. These functional modules include cell cycle deregulation, increased angiogenesis (eg, vascular endothelial growth factor, blood vessel morphogenesis), oxidative metabolic alterations, and aberrant activation of signaling pathways involved in cellular proliferation, survival, and differentiation.
Conclusions—The discoveries and conclusions derived from our customized Bayesian network classifier are consistent with previously published results. The proposed approach for determining Bayesian network structure facilitates the integration of heterogeneous data from multiple sources to elucidate the mechanisms of complex diseases.
- computational biology
- gene expression
- models, statistical
- protein interaction domains and motifs
- systems biology

Received March 3, 2013.
Accepted February 6, 2014.
© 2014 American Heart Association, Inc.