# Network Biology in Medicine and Beyond

- computational biology
- gene regulatory networks
- models, statistical
- protein interaction networks
- systems biology

## Introduction

In the past decade, we have witnessed great advances in high-throughput genomic and proteomic profiling technologies, such as DNA microarrays, next-generation sequencing, and mass spectrometry–based proteomics and metabolomics. These technologies, capable of generating massive amounts of genomic, transcriptomic, proteomic, and metabolomic data, provide new opportunities to understand human diseases, identify potential biomarkers, and develop new treatments.^{1} This data-intensive paradigm has fundamentally transformed biomedical science and holds great promise for the betterment of human health.^{2}

These high-throughput biotechnologies have also heightened the challenge of distilling biological knowledge and novel insights from the sea of data. New computational approaches and statistical models are needed to effectively model and interpret these high-dimensional, multiplatform data.^{1} Reductionism has led to tremendous success in molecular biology: we can now zoom in on individual genes or proteins, study their composition and aberrations at the resolution of a single nucleotide or amino acid, and inquire into their structural and chemical properties. However, genes and proteins in cells do not work in isolation: they communicate and coordinate with each other to carry out various biological functions. Systems approaches have become an important and promising alternative for unraveling the mechanisms that orchestrate the activities of genes and proteins in cells. They are particularly valuable for studying complex diseases such as cancers and cardiovascular diseases. Unlike many Mendelian diseases, where the genetic culprits can often be pinpointed to a single gene or a few genes, complex diseases have multifaceted causes involving various molecular aberrations and environmental factors.^{3} This complexity is further amplified by the interconnected nature of biomolecules in cells, which propagates these aberrations or erroneous signals throughout the system and makes it very challenging to elucidate the true causes and underlying mechanisms.

Biological networks provide a conceptual and intuitive framework to investigate, model, characterize, and understand complex interactions among the components of a biological system. Taking a holistic approach, network biology studies the interactome, the set of direct and indirect molecular interactions, of a biological system. A biological network hence represents the molecular wiring diagram of a cell's information-processing system. Biological networks are useful representations for visualizing and understanding the functions and interactions of biomolecules. It is challenging to discern patterns and distill knowledge from massive amounts of data in a high-dimensional space. An informative network model and its graphical representation reveal the relationships among cellular components and help detect subtle patterns by connecting the dots. Biological networks reveal high-level relationships, enrichment patterns, and system-wide properties that univariate analyses miss. Network theory (and graph theory), a subfield of computer science and mathematics, has a rich theoretical foundation and many applications to the World-Wide Web, social networks, particle physics, etc., some of which are readily applicable to biological networks and disease networks.^{4} In addition, network motifs, recurrent and statistically significant subgraphs in networks, have been shown to be conserved in evolution and associated with certain biological functions.^{5}

Some biological network models are capable of simulating a biological system's dynamical behaviors and properties. The complexity of a biological system is reflected in part by its large number of interacting variables, whose dynamics are governed by numerous linear or nonlinear relationships, chemical kinetics, feedback loops, and stochasticity. As in many physical and engineering disciplines, biological scientists now have mathematical and computer models at their disposal to perform numerical simulations and understand these systems' temporal behaviors and properties.

Biological networks are a natural and versatile framework to incorporate different sources of data and prior knowledge. The advances in molecular profiling technologies enable data collection at different levels of the biological system and empower scientists with new tools to probe the system. However, because of differences in these technology platforms, data sets are often heterogeneous in nature. Further, significant efforts have also been made to manually curate and document molecular interactions in cells, such as protein–protein interactions (PPIs) and biological pathways, providing rich domain knowledge. The computational challenge we are facing is how to optimally fuse these heterogeneous data and prior knowledge to answer meaningful biological questions, and biological networks present a flexible framework for multilevel data representation and integration.

Because of the complexity of cardiovascular systems, network-based approaches are particularly promising and applicable in cardiovascular research. Successful applications of network biology in cardiovascular literature have been recently reviewed.^{6–9} In this review, we focus our discussion on computational approaches for biological network modeling and analysis. We first summarize some publicly available biological databases representing a variety of biological networks. Then we introduce statistical and dynamical network models widely used for biological network modeling and inference from experimental data. We also discuss how integrative analysis may bridge the gap between prior knowledge on physical interactions in biological databases and mathematical models inferred from experimental data. Finally, we demonstrate the applicability and effectiveness of network biology approaches with examples of network visualization and analysis algorithms and software tools.

## Physical Interactions and Biological Network Databases

Tremendous efforts have been made to curate biological interactions identified in wet-lab experiments and reported in literature and to organize them in biological databases. These databases represent current knowledge on molecular interactions and are indispensable resources for biological data analysis. Biological networks are a broad and inclusive concept. They can refer to gene regulatory networks, PPI networks, signaling networks, metabolic networks, etc., each of which has its own specialized biological network database resources.

## Gene Regulatory Networks

A gene regulatory network is a collection of interactions among genes, usually mediated through their products (mRNAs and proteins), that govern the expression levels of those mRNAs and proteins (Figure 1A). Transcription factors (proteins that bind to specific DNA sequences) are typical regulators in a gene regulatory network. In humans, only a small fraction of genes encode DNA-binding proteins (2000 to 3000 sequence-specific DNA-binding transcription factors), and 200 to 300 of them define the basic transcriptional machinery.^{10} By binding to enhancer, promoter, or silencer regions of DNA, alone or with other proteins, transcription factors either activate or suppress the expression of nearby genes. Other mechanisms also modulate gene expression (eg, DNA methylation,^{11} histone acetylation,^{12} and microRNAs^{13}).

### The Encyclopedia of DNA Elements Project

The Encyclopedia of DNA Elements (ENCODE) was an international research effort to identify functional elements in the human genome; one of its aims was to understand DNA regulatory elements and their regulatory relationships.^{14} The ENCODE project generated chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data sets for 119 distinct transcription factors in ≥5 main cell lines, revealing complex regulatory mechanisms in the human genome and providing a wiring diagram of the regulatory network in a cell.^{15} These connections are curated and downloadable at http://encodenets.gersteinlab.org. In addition, information on the ChIP-seq peaks, discovered motifs, and associated histone modification patterns is deposited and publicly accessible at the Web portal Factorbook (http://www.factorbook.org).^{16}

### Databases for Transcription Factors and Their Binding Sites

TRANSFAC, JASPAR, and the Universal Protein Binding Microarray Resource for Oligonucleotide Binding Evaluation are all manually curated databases of eukaryotic transcription factors and their binding sites. The TRANSFAC database focuses on single factor–site interactions, with information on experimentally proven binding sites, consensus binding sequences (position weight matrices), and regulated genes.^{17} The TRANSCompel database characterizes composite elements, where 2 (or more) transcription factors bind to 2 (or more) neighboring binding sites and jointly regulate gene expression in either a synergistic or an antagonistic manner. TRANSFAC and TRANSCompel are available in a public version (http://www.gene-regulation.com/pub/databases.html), which is older and free of charge, and in a Professional version, which contains the most up-to-date data and requires a license. Similar to TRANSFAC, the JASPAR database features annotated, high-quality, matrix-based transcription factor binding site profiles derived from published collections of experimentally defined transcription factor binding sites for eukaryotes.^{18} The JASPAR CORE database is freely accessible at http://jaspar.genereg.net/.
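To illustrate how a matrix-based binding-site profile is used, the sketch below builds a log-odds position weight matrix from a few aligned sites and scans a sequence for the best-scoring window. The binding sites, the uniform background, the pseudocount of 0.5, and the helper names `build_pwm` and `score_windows` are hypothetical choices for illustration; they do not reproduce the scoring scheme of TRANSFAC or JASPAR.

```python
import numpy as np

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PWM: rows are motif positions, columns are A, C, G, T."""
    if background is None:
        background = np.full(4, 0.25)  # assumed uniform background base frequencies
    length = len(sites[0])
    counts = np.zeros((length, 4))
    for site in sites:
        for pos, base in enumerate(site):
            counts[pos, BASES.index(base)] += 1
    # Pseudocounts keep bases never seen at a position from producing log(0).
    probs = (counts + pseudocount) / (counts.sum(axis=1, keepdims=True) + 4 * pseudocount)
    return np.log2(probs / background)


def score_windows(pwm, seq):
    """Sum the PWM entries for every window of seq that is as long as the motif."""
    width = pwm.shape[0]
    return np.array([
        sum(pwm[k, BASES.index(seq[start + k])] for k in range(width))
        for start in range(len(seq) - width + 1)
    ])


# Hypothetical aligned binding sites, for illustration only.
sites = ["TATAAT", "TATAAT", "TAAAAT", "TATATT"]
pwm = build_pwm(sites)
scores = score_windows(pwm, "GCGCGCTATAATGCGC")
best = int(np.argmax(scores))
print(best, scores[best])
```

The highest-scoring window is the one that matches the consensus TATAAT, which starts at index 6 of the example sequence.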

The Universal Protein Binding Microarray Resource for Oligonucleotide Binding Evaluation database hosts data generated with universal protein-binding microarray technology. Protein-binding microarray technology is an efficient way to interrogate DNA-binding preferences, complementary to technologies such as ChIP-chip and ChIP-seq. The Universal Protein Binding Microarray Resource for Oligonucleotide Binding Evaluation database is available at http://thebrain.bwh.harvard.edu/uniprobe/, allowing both online viewing and data downloading.

## Signaling Pathways and Metabolic Networks

A signaling network (or signaling pathway) models the information flow and communication that govern and coordinate basic cellular activities (Figure 1B). A metabolic network, consisting of metabolites and their interactions, depicts the chemical reactions of metabolism, the metabolic pathways, and the regulatory interactions that guide these reactions.

### Databases for Signaling and Metabolic Pathways

The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database provides a reference knowledge base of wiring diagrams of interaction and reaction networks for metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, human diseases, and drug development.^{19} The KEGG pathway database can be accessed via the Web (http://www.genome.jp/kegg/pathway.html). Reactome is an open-source, open-access, manually curated, and peer-reviewed pathway database that ranges from metabolic processes to hormonal signaling and includes 2975 human proteins, 2907 reactions, and 4455 literature citations^{20} (http://www.reactome.org). BioCarta is a similar pathway database that uses an open-source, community-supported approach to catalog and summarize important resources, providing information on both classical pathways and current suggestions for new pathways (http://www.biocarta.com/genes/index.asp). The NCI-Nature Pathway Interaction Database provides biomolecular interactions and cellular processes assembled into authoritative human signaling pathways (http://pid.nci.nih.gov/). At the time of writing, this database had 137 human pathways with 9248 interactions curated by NCI-Nature and 322 human pathways with 7575 interactions imported from BioCarta/Reactome.

There are also ongoing efforts to better integrate these signaling and metabolic pathway databases and to make them easier to access, both interactively and programmatically. Pathway Commons is a collection of publicly available pathways aggregated from sources such as Reactome and the NCI-Nature Pathway Interaction Database^{21} (http://www.pathwaycommons.org). WikiPathways is a community-based, collaboratively edited Website for contributing and maintaining content dedicated to biological pathways^{22} (http://wikipathways.org). Each pathway is represented by an online editable diagram and includes a description, bibliography, pathway version history, and a list of component genes and proteins with links to public resources.

Ingenuity IPA (Ingenuity Systems; http://www.ingenuity.com) is a commercial software tool for modeling, analyzing, and understanding complex omics data in the context of biological networks and pathways, which are built on the Ingenuity Knowledge Base, a repository of curated biological pathways and functional annotations. Typical analysis tasks include network enrichment analysis, data visualization, pathway comparison, upstream regulator analysis, and causal network analysis.^{23}

## Protein–Protein Interactions

A PPI network consists of proteins and their interactions (Figure 1C). A PPI occurs when 2 or more proteins bind together, either transiently, to modify one another or to trigger signal transduction, or stably, to form a protein complex that carries out certain biological functions.

### PPI Databases

PPIs play essential roles in almost all cellular functions; hence, significant efforts have been made to assemble protein interaction maps of cells. The Human Protein Reference Database (http://www.hprd.org/),^{24} the Database of Interacting Proteins (http://dip.doe-mbi.ucla.edu/),^{25} the Molecular INTeraction database (http://mint.bio.uniroma2.it),^{26} IntAct (http://www.ebi.ac.uk/intact/),^{27} and the Biological General Repository for Interaction Datasets (http://thebiogrid.org)^{28} are examples of publicly accessible resources that catalog experimentally determined and literature-reported interactions between proteins.

Besides experimental means of interrogating PPIs, computational approaches have also proven to be powerful tools for predicting PPIs. For example, STRING (http://string-db.org/) is a database that includes both experimentally determined and predicted PPIs, covering direct (physical) and indirect (functional) associations derived from genomic context, high-throughput experiments, conserved co-expression, and previous knowledge.^{29} Recently, a new algorithm, PrePPI, was proposed that uses 3-dimensional structural information to predict PPIs with good accuracy and coverage.^{30} The PPIs predicted by PrePPI are available at http://bhapp.c2b2.columbia.edu/PrePPI/.

## Biological Network Models

Generally speaking, there are 2 ways to obtain biological network structures: one is to retrieve network interactions from databases, as described above; the other is to infer the network topology and model the interactions directly from experimental data.

### Network Representation and Notations

Mathematically, a network (also called a graph), denoted by *G*, is a representation of a set of objects connected by links. The objects in a graph are called vertices or nodes, and the links are termed edges. We define the vertex set *V* and edge set *E*, and the graph is then represented by the ordered pair *G*=(*V*,*E*).

One mathematical way to represent a graph structure is an adjacency matrix *A*. In a graph of *p* vertices, the adjacency matrix is a *p*×*p* matrix, where an entry *a*_{ij}=1 indicates an edge from node *i* to node *j* and *a*_{ij}=0 otherwise. In most biological networks, adjacency matrices are sparse: most entries are zeros, and each node is connected to only a small number of other nodes. If the edges in *E* have directions, the graph *G* is called a directed graph. By contrast, if an edge from node *i* to node *j* is equivalent to an edge from node *j* to node *i*, the graph *G* is undirected, and its adjacency matrix *A* is symmetric.
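A minimal sketch of the adjacency-matrix representation follows, assuming a hypothetical 4-node edge list; the helper name `adjacency_matrix` is ours. It builds the same edges once as a directed graph and once as an undirected graph and checks the symmetry property described above.

```python
import numpy as np


def adjacency_matrix(p, edges, directed=True):
    """Return the p x p adjacency matrix A with a_ij = 1 for each edge i -> j."""
    A = np.zeros((p, p), dtype=int)
    for i, j in edges:
        A[i, j] = 1
        if not directed:
            A[j, i] = 1  # undirected: edge i-j is the same as edge j-i
    return A


edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # hypothetical 4-node network
A_dir = adjacency_matrix(4, edges, directed=True)
A_und = adjacency_matrix(4, edges, directed=False)

print(np.array_equal(A_dir, A_dir.T))  # directed adjacency need not be symmetric
print(np.array_equal(A_und, A_und.T))  # undirected adjacency is always symmetric
print(A_und.sum() / A_und.size)        # fraction of nonzero entries (density)
```

For genome-scale networks, a sparse storage format (for example, SciPy's sparse matrices) would normally replace the dense array, since almost all entries are zero.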

### Statistical and Dynamical Network Models

In this section, we describe several statistical and mathematical network models widely used in biological network inference. By and large, network models fall into 2 major categories: statistical models and dynamical models. Statistical models characterize the interdependence of nodes in the graph using some statistical dependence measures or their joint probability distributions. These models reflect statistical associations independent of time. Some examples of statistical models include correlation networks, mutual-information networks, and probabilistic graphical models. On the other hand, dynamical models characterize mathematical rules that describe the time-dependent transition of one cellular state to the next. Ordinary differential equations (ODEs), stochastic differential equations (SDEs), state space models, and probabilistic Boolean networks (PBNs) belong to this category.

#### Correlation Networks and Mutual Information-Based Networks

Correlation networks and mutual information-based networks are inferred from biological data by evaluating the expression patterns of every pair or every triple of nodes using similarity criteria such as correlation coefficients, partial correlation, and mutual information (Figure 2A). Both correlation coefficients and mutual information are symmetric; therefore, the resulting networks are undirected. The connections inferred in these types of networks indicate only statistical associations and do not imply causal relationships among the nodes.

A correlation network, also called a gene co-expression network, is a weighted network in which each edge is associated with a value: the correlation coefficient between the 2 random variables. Pearson correlation is the most widely used similarity measure in correlation networks, although robust alternatives such as Spearman correlation are used as well.

A simple and straightforward way to construct a correlation network is to first compute the correlation coefficients of all pairs of nodes and then set up an appropriate threshold to determine the presence or absence of an edge. Weighted correlation network analysis generalizes such an approach.^{31} Instead of using a hard threshold, weighted correlation network analysis employs an adjacency function that assigns a weight to each edge to soft-threshold the correlation coefficients. Dewey et al^{32} constructed a co-expression network using myocardial transcript data, analyzed the network topological features, and identified gene co-expression modules related to cardiac development, hypertrophy, and failure.
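The two construction strategies above, a hard threshold and a soft threshold, can be sketched as follows. The soft version uses the unsigned power adjacency *a*_{ij}=|*r*_{ij}|^{β} associated with weighted correlation network analysis; the data are simulated, and the threshold τ=0.8 and power β=6 are illustrative assumptions rather than recommended values.

```python
import numpy as np


def hard_threshold_network(expr, tau):
    """Unweighted network: edge i-j if |Pearson r_ij| >= tau (expr is samples x genes)."""
    corr = np.corrcoef(expr, rowvar=False)
    adj = (np.abs(corr) >= tau).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj


def soft_threshold_network(expr, beta=6):
    """Weighted network with unsigned power adjacency a_ij = |r_ij| ** beta."""
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


rng = np.random.default_rng(0)
g0 = rng.normal(size=100)
g1 = g0 + 0.2 * rng.normal(size=100)  # strongly co-expressed with g0
g2 = rng.normal(size=100)             # unrelated gene
expr = np.column_stack([g0, g1, g2])  # simulated data: 100 samples x 3 genes

hard = hard_threshold_network(expr, tau=0.8)
soft = soft_threshold_network(expr, beta=6)
print(hard)
print(soft.round(3))
```

Raising |*r*| to a power keeps every edge but shrinks weak correlations far more than strong ones, which is why the soft version avoids an all-or-nothing cutoff.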

De la Fuente et al^{33} proposed using partial correlation to construct gene regulatory networks. Partial correlation measures the degree of association between 2 random variables after removing the effect of a third variable. Therefore, this approach can remove indirect connections that show relatively strong correlation only because of a third, intermediate variable. Qiu et al^{34} used dependence models and their eigenvalue patterns to explore the dependence relationships among every 3 proteins (or genes), from which a dependence network is constructed.
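The first-order partial correlation has a closed form in terms of the 3 pairwise correlations, $r_{xy\cdot z}=(r_{xy}-r_{xz}r_{yz})/\sqrt{(1-r_{xz}^2)(1-r_{yz}^2)}$. The sketch below applies it to a simulated chain *x*→*z*→*y* (a hypothetical example, not data from the cited study): *x* and *y* are clearly correlated, but the association vanishes once *z* is accounted for.

```python
import numpy as np


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    r = np.corrcoef(np.vstack([x, y, z]))
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
z = x + rng.normal(size=n)  # z is driven by x
y = z + rng.normal(size=n)  # y is driven by z only: x -> z -> y

r_xy = np.corrcoef(x, y)[0, 1]
pc_xy_given_z = partial_corr(x, y, z)
print(f"corr(x, y) = {r_xy:.2f}, partial corr(x, y | z) = {pc_xy_given_z:.2f}")
```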

Because Pearson correlation captures only linear relationships between genes, some researchers have proposed information-theoretic approaches, such as mutual information, to measure similarity between nodes. Mutual information can capture nonlinear relationships. The Algorithm for the Reconstruction of Accurate Cellular Networks exemplifies such an information-theoretic approach.^{35} Given a set of observations, {*x*_{i},*y*_{i}}, of random variables *X,Y*, their joint and marginal probability distributions are approximated by a Gaussian kernel estimator, and the mutual information of *X,Y* is then estimated. The threshold is determined by randomly shuffling the samples to create a null distribution of the mutual information and selecting the threshold value at a given significance level *α*. A final step uses the data-processing inequality to trim indirect edges: in every triplet of connected nodes, the edge with the weakest mutual information is removed.
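The final pruning step can be sketched as follows, assuming the mutual-information matrix has already been estimated and that the significance threshold is given (here it is set by hand rather than from a permutation null). The function name `dpi_prune` and the optional `tolerance` parameter, which spares the weakest edge when it is nearly as strong as the next one, are our own illustrative choices.

```python
import itertools

import numpy as np


def dpi_prune(mi, threshold=0.0, tolerance=0.0):
    """Keep edges with MI above threshold, then apply the data-processing inequality.

    In every fully connected triplet, the edge with the smallest MI is treated
    as indirect and removed, unless it lies within `tolerance` of the next one.
    """
    p = mi.shape[0]
    keep = mi > threshold
    np.fill_diagonal(keep, False)
    to_remove = set()
    for i, j, k in itertools.combinations(range(p), 3):
        pairs = [(i, j), (i, k), (j, k)]
        if not all(keep[a, b] for a, b in pairs):
            continue  # the inequality only applies to closed triangles
        pairs.sort(key=lambda e: mi[e])
        weakest, second = pairs[0], pairs[1]
        if mi[weakest] < (1 - tolerance) * mi[second]:
            to_remove.add(weakest)
    for a, b in to_remove:  # removals are decided on the original MI values
        keep[a, b] = keep[b, a] = False
    return keep


# Hypothetical MI matrix: 0-1-2 form a triangle, node 3 links only to node 2.
mi = np.array([
    [0.0, 0.8, 0.3, 0.0],
    [0.8, 0.0, 0.7, 0.0],
    [0.3, 0.7, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
])
net = dpi_prune(mi, threshold=0.1)
print(net.astype(int))
```

In the triangle 0–1–2, the edge 0–2 has the lowest mutual information and is removed as a likely indirect connection through node 1.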

Further, Reverter and Chan^{36} proposed the PCIT algorithm, which combines partial correlation and information theory (the data-processing inequality) to infer co-expression networks. Instead of the stringent global threshold used in the Algorithm for the Reconstruction of Accurate Cellular Networks, the PCIT algorithm uses local thresholds for every trio of genes, enabling it to capture some significant associations even when their correlation or mutual information is not strong.

Several recent developments in statistics offer promising new association measures for inferring linear and nonlinear relationships in biological networks from noisy data sets. A Pearson correlation of zero between 2 random variables does not imply independence, because Pearson correlation captures only linear relationships. Distance correlation was introduced to address this deficiency; it is zero if and only if the random variables are independent.^{37} In addition, Reshef et al^{38} recently proposed a new measure of dependence for pairwise relationships, the maximal information coefficient. The maximal information coefficient is rooted in maximal information-based nonparametric exploration statistics and can capture linear, nonlinear, and even nonfunctional associations in the data. Another useful property of the maximal information coefficient and distance correlation is that their values fall in the range [0,1], similar to the coefficient of determination (*R*^{2}) in regression, whereas mutual information itself does not have a standard range.
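The contrast between Pearson and distance correlation can be shown with a short sketch of the sample (V-statistic) distance correlation computed from double-centered pairwise distance matrices. The helper names are hypothetical, and the example uses a deterministic parabola *y*=*x*^{2} on a symmetric grid, for which Pearson correlation is essentially zero even though *y* is a function of *x*.

```python
import numpy as np


def _double_centered_distances(v):
    """Pairwise |v_j - v_k| with row, column, and grand means removed."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (V-statistic version)."""
    a = _double_centered_distances(np.asarray(x, dtype=float))
    b = _double_centered_distances(np.asarray(y, dtype=float))
    dcov2_xy = max((a * b).mean(), 0.0)  # guard against tiny negative round-off
    dcov2_xx = (a * a).mean()
    dcov2_yy = (b * b).mean()
    return np.sqrt(dcov2_xy / np.sqrt(dcov2_xx * dcov2_yy))


x = np.linspace(-1, 1, 201)
y = x**2  # deterministic but non-monotonic dependence
pearson = np.corrcoef(x, y)[0, 1]
dcor = distance_correlation(x, y)
print(f"Pearson r = {pearson:.3f}, distance correlation = {dcor:.3f}")
```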

#### Probabilistic Graphical Models

Probabilistic graphical models are diagrammatic representations of probability distributions for a set of random variables (Figure 2B). In a probabilistic graphical model, each node represents a random variable (or a group of random variables), and edges (either directed or undirected) express dependent relationships between these variables.

Two features of probabilistic graphical models make them suitable for modeling biological networks. First, because biological data are noisy, the probabilistic nature of graphical models naturally accounts for noise in the data and intrinsic uncertainty in the models. Second, the diagrammatic representation of a graphical model naturally visualizes the relationships among genes, which can facilitate new insights and motivate new biological hypotheses.

Typical examples of probabilistic graphical models are Bayesian networks (directed graphs) and Markov networks (undirected graphs). Probabilistic graphical models are widely used in many fields, such as computer vision, natural language processing, and artificial intelligence. The structure of a graphical model encodes how the joint probability distribution factorizes according to the conditional independence relationships among the variables. In a Bayesian network, the joint distribution is decomposed into

$$P(X_1, X_2, \ldots, X_p) = \prod_{i=1}^{p} P\left(X_i \mid pa^{i}\right) \tag{1}$$

where *pa*^{i} is the set of parent nodes of *X*_{i}.
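To make the Bayesian network factorization concrete, the sketch below encodes a hypothetical 3-node binary network (A→B, A→C, B→C) with made-up conditional probability tables, computes the joint probability as a product of local conditionals, and checks that it sums to 1 over all 2^{3} states.

```python
import itertools

# Hypothetical network A -> B, A -> C, B -> C; all probabilities are made up.
parents = {"A": (), "B": ("A",), "C": ("A", "B")}
# Each table maps parent values to P(node = 1 | parents).
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95},
}


def joint_probability(assignment):
    """P(A, B, C) as the product over nodes of P(node | its parents)."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[q] for q in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


total = sum(
    joint_probability(dict(zip("ABC", values)))
    for values in itertools.product([0, 1], repeat=3)
)
print(total)  # the factorized joint distribution sums to 1
print(joint_probability({"A": 1, "B": 1, "C": 1}))  # 0.3 * 0.9 * 0.95
```

The full joint table would need 2^{3}−1 free parameters, whereas the factorized form needs only 1+2+4; the savings grow quickly when each node has few parents.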

In a Markov network, also referred to as a Markov random field, the joint distribution is expressed as

$$P(X_1, X_2, \ldots, X_p) = \frac{1}{Z} \prod_{C} f_C(X_C) \tag{2}$$

where the product runs over the cliques of the graph, *X*_{C} is a clique in the graph (a subset of vertices in which every 2 vertices are connected), *f*_{C}(*X*_{C}) is a potential function over the clique *X*_{C}, and *Z* is a normalizing factor. A special case of a Markov network is a Gaussian graphical model, where the distribution of the variables in the graph is assumed to be multivariate Gaussian.^{39}
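The role of the normalizing factor *Z* can be seen in a small sketch: a hypothetical Markov network on the chain 0–1–2 of binary nodes, whose cliques are its 2 edges, with a made-up pairwise potential that favors neighbors taking the same value (coupling `J`). *Z* is obtained by brute-force enumeration, which is only feasible for tiny networks.

```python
import itertools
import math

J = 1.0  # hypothetical coupling strength


def pair_potential(a, b):
    """Potential f_C over a 2-node clique: larger when the two nodes agree."""
    return math.exp(J) if a == b else math.exp(-J)


cliques = [(0, 1), (1, 2)]  # chain 0 - 1 - 2: its cliques are the two edges
states = list(itertools.product([0, 1], repeat=3))


def unnormalized(x):
    """Product of clique potentials for a full assignment x."""
    return math.prod(pair_potential(x[i], x[j]) for i, j in cliques)


Z = sum(unnormalized(x) for x in states)  # normalizing factor
prob = {x: unnormalized(x) / Z for x in states}
print(Z, prob[(0, 0, 0)], prob[(0, 1, 0)])
```

Unlike the Bayesian network case, the clique potentials are not probabilities themselves; only after dividing by *Z* does the product define a distribution, and computing *Z* is what makes exact inference expensive in large Markov networks.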

Learning graphical models from data involves 2 major (and sometimes iterative) steps: structure learning and parameter estimation. Structure learning infers the most likely dependence and independence relationships among the variables, whereas parameter learning estimates the parameters that specify the distribution, *P*(*X*_{1}, *X*_{2}, …, *X*_{p}), of the random variables encoded by the graphical model. In biological networks, learning network structures is of particular interest, yet it is a challenging task.

Unfortunately, learning the exact structure of a Bayesian network is NP-complete.^{40} Therefore, it is computationally infeasible to infer an exact Bayesian network over a large number of variables. One approach, proposed by Segal et al,^{41} is the module network. To alleviate the computational burden, a module network clusters highly correlated variables into modules, and the variables in each module share the same parents and the same conditional probability distribution. There are also heuristic algorithms for approximate Bayesian network structure learning, for example, using dynamic programming and Markov chain Monte Carlo methods.^{42,43}

Because each node in a biological network connects to only a small fraction of the other nodes, sparsity constraints are often applied in network structure learning. Recently, *l*_{1}-regularization has attracted great interest in the statistics and machine-learning communities. The sparsity of *l*_{1}-regularization refers to the fact that the *l*_{1}-norm constraint tends to shrink some coefficients to exactly zero, leading to a parsimonious solution and naturally performing variable selection. Representative examples of this class of algorithms include the lasso,^{44} the elastic net,^{45} and the Dantzig selector.^{46} For example, Lee et al^{47} converted the Markov network-learning problem into a convex optimization problem that can be solved with efficient gradient methods. Schmidt et al^{48} extended *l*_{1}-regularization-based algorithms to Bayesian network learning. On the algorithmic side, several efficient algorithms have been proposed to solve optimization problems with an *l*_{1} constraint.^{49} On the theoretical side, it has been shown that lasso-type algorithms asymptotically recover the true sparsity pattern,^{50} and that, in Gaussian graphical models, *l*_{1}-regularization-based algorithms asymptotically select the correct neighboring nodes.^{51}
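The neighborhood-selection idea for Gaussian graphical models can be sketched as follows: regress each standardized variable on all the others with the lasso and keep an edge when the coefficient is nonzero in both directions (an "AND" rule). This is a minimal illustration, assuming plain cyclic coordinate descent and a hand-picked penalty `lam=0.3`; the helper names are ours, and the data are simulated from a hypothetical 4-node chain, so the true edges are 0–1, 1–2, and 2–3.

```python
import numpy as np


def soft_threshold(value, lam):
    """l1 proximal step: shrink toward zero, returning exactly 0 when |value| <= lam."""
    return np.sign(value) * max(abs(value) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Minimize ||y - X b||^2 / (2n) + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    resid = y.astype(float)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            rho = X[:, j] @ resid / n + col_sq[j] * old  # fit to the partial residual
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def neighborhood_selection(data, lam):
    """Lasso-regress each standardized variable on the rest; AND-combine the selections."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    p = z.shape[1]
    selected = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        beta = lasso_cd(z[:, others], z[:, i], lam)
        selected[i, others] = beta != 0
    return selected & selected.T


rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)
x4 = 0.8 * x3 + rng.normal(size=n)  # chain x1 - x2 - x3 - x4
adj = neighborhood_selection(np.column_stack([x1, x2, x3, x4]), lam=0.3)
print(adj.astype(int))
```

Although *x*_{1} and *x*_{3} are marginally correlated, the *l*_{1} penalty sets their coefficients exactly to zero once the intermediate node is in the model, so the recovered graph is the chain. In practice, the penalty would be chosen by cross-validation or a similar criterion rather than by hand.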

#### ODEs and SDEs

In physics, chemistry, and engineering, ODEs and state-space models are fundamental modeling tools (Figure 2C). These models differ in nature from the statistical models, such as correlation networks and Bayesian networks, discussed previously. They characterize the levels of biomolecules in biological systems as functions of time, that is, how their concentrations change over the time course. Mathematically, in a network of *p* nodes, the expression levels (or concentrations) of these *p* nodes at time *t* are represented by *X*_{1}(t), *X*_{2}(t),…, *X*_{p}(t). As *t* increases, the dynamics of *X*_{1}(t), *X*_{2}(t),…, *X*_{p}(t) in the system are modeled by

$$\frac{dX_i(t)}{dt} = f_i\left(X_1(t), X_2(t), \ldots, X_p(t), u(t)\right)$$

where *i*=1,2,…,*p* and *u*(t) is an optional external input to the network. The argument variables of *f*_{i}(·) determine the parent nodes of *X*_{i} in the network, and the form and parameters of *f*_{i}(·) dictate the temporal evolution of the system.
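As a minimal illustration of this ODE form, the sketch below simulates a hypothetical two-gene cascade: gene 1 is produced at a constant input rate *u* and degraded, and gene 1 activates gene 2 through a Hill function. The model, its parameter values, and the hand-written fourth-order Runge–Kutta integrator are illustrative assumptions; in practice, a library integrator such as SciPy's `solve_ivp` would usually be used.

```python
import numpy as np

# Hypothetical parameters: input rate, degradation rates, Hill activation of gene 2.
U, D1, K_MAX, K_HALF, HILL_N, D2 = 1.0, 0.5, 2.0, 1.0, 2, 1.0


def f(x):
    """Right-hand side dX/dt = f(X(t), u(t)) with a constant input u = U."""
    x1, x2 = x
    dx1 = U - D1 * x1
    dx2 = K_MAX * x1**HILL_N / (K_HALF**HILL_N + x1**HILL_N) - D2 * x2
    return np.array([dx1, dx2])


def simulate_rk4(rhs, x0, t_end, dt):
    """Integrate dx/dt = rhs(x) with the classical fourth-order Runge-Kutta scheme."""
    steps = int(round(t_end / dt))
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * dt * k1)
        k3 = rhs(x + 0.5 * dt * k2)
        k4 = rhs(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        trajectory.append(x.copy())
    return np.array(trajectory)


traj = simulate_rk4(f, x0=[0.0, 0.0], t_end=40.0, dt=0.01)
# Steady state: X1* = U / D1 = 2 and X2* = K_MAX * (4 / 5) / D2 = 1.6.
print(traj[-1])
```

Here *f*_{2} takes *X*_{1} as an argument, so gene 1 is the parent of gene 2 in the network, exactly as described above.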

In simple organisms such as yeast, ODEs have proven to be a useful tool for modeling and predicting the dynamical properties and interactions of various biomolecules.^{52} Recently, ODEs have been applied to characterize estrogen signaling in breast cancer, with promising results.^{53} There are also alternative ODE-based representations of dynamical systems. For example, a state-space model, widely used in control systems, characterizes a dynamical system by a set of first-order differential equations on state, input, and output variables.^{54} Additionally, in biochemical systems theory, the S-system, an alternative mathematical model for dynamical biochemical systems, represents the biological network as a set of differential equations in power-law form.^{55}
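For reference, the general forms of these 2 alternative representations can be written out as follows. The notation (rate constants *α*_{i}, *β*_{i}≥0, kinetic orders *g*_{ij}, *h*_{ij}, and matrices *A*, *B*, *C*, *D*) is ours rather than that of the cited works, and the state-space model is shown only in its linear, time-invariant special case.

$$\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{p} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{p} X_j^{h_{ij}}, \qquad i = 1, 2, \ldots, p \qquad \text{(S-system)}$$

$$\frac{d\mathbf{x}(t)}{dt} = A\,\mathbf{x}(t) + B\,\mathbf{u}(t), \qquad \mathbf{y}(t) = C\,\mathbf{x}(t) + D\,\mathbf{u}(t) \qquad \text{(linear state-space model)}$$

In the S-system, the first product aggregates all fluxes producing *X*_{i} and the second all fluxes degrading it, each as a power law; a nonzero kinetic order *g*_{ij} or *h*_{ij} means that *X*_{j} influences *X*_{i}, which again defines the network edges.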

Biological systems are inherently stochastic, with randomness originating from diverse sources, including biological noise in transcription, translation, and network signaling; measurement inaccuracies; and heterogeneity of cell populations.^{56} SDEs have been studied and applied extensively as an important mathematical tool for modeling biological systems. For example, Chen et al^{57} used SDEs to characterize the dynamics of mRNA transcription and degradation, where the noise in the biological system and other sources of uncertainty were jointly modeled as standard Brownian motion. When this SDE model was applied to *Saccharomyces cerevisiae* cell cycle data, its predictions agreed well with the observed expression patterns, illustrating the potential of this approach.
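A minimal sketch of how such an SDE can be simulated uses the Euler–Maruyama scheme on a hypothetical transcription–degradation model, *dX*=(*k*−*dX*)*dt*+*σdW*, with constant synthesis rate *k*, first-order degradation rate *d*, and additive Brownian noise. This model and its parameters are illustrative assumptions and are not the specific model of Chen et al.

```python
import numpy as np


def euler_maruyama(k, d, sigma, x0, t_end, dt, n_paths, rng):
    """Simulate dX = (k - d X) dt + sigma dW for many independent paths."""
    steps = int(round(t_end / dt))
    x = np.full(n_paths, float(x0))
    for _ in range(steps):
        dw = rng.normal(scale=np.sqrt(dt), size=n_paths)  # Brownian increments
        x = x + (k - d * x) * dt + sigma * dw
    return x


rng = np.random.default_rng(42)
final = euler_maruyama(k=2.0, d=1.0, sigma=0.3, x0=0.0, t_end=10.0, dt=0.01,
                       n_paths=2000, rng=rng)
# Compare with the stationary mean k/d = 2 and variance sigma**2 / (2 d) = 0.045.
print(final.mean(), final.var())
```

Each path fluctuates around the deterministic steady state *k*/*d*; the spread across paths is the model's prediction of cell-to-cell (or measurement-to-measurement) variability, which a purely deterministic ODE cannot express.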

#### Probabilistic Boolean Networks

PBNs are another class of dynamical models for characterizing biological networks (Figure 2D). The first Boolean network model of genetic networks was proposed by Kauffman.^{58} At that time, Boolean networks were used as a theoretical model to investigate the complex dynamical behavior of biological systems. Recent developments in high-throughput molecular profiling technologies have reignited interest in this model.

The value of each node, *X*_{1}(t), *X*_{2}(t), …, *X*_{p}(t), in a Boolean network is discrete, usually binary {0,1} (in some cases ternary {−1,0,1}). Therefore, the acquired biological data, which are continuous in their original measurement space, often need to be discretized. For node *i*, the Boolean network defines the state transition, that is, the state at time *t*+1, based on the states *X*_{1}(t), *X*_{2}(t), …, *X*_{p}(t) at time *t*:

$$X_i(t+1) = f_i\left(X_1(t), X_2(t), \ldots, X_p(t), u(t)\right)$$

where *f*_{i}(·) is a Boolean function, defined by a combination of simple Boolean operations such as AND, OR, and NOT or by a truth table (a rule-based table used in Boolean logic), and *u*(*t*) is an optional input to the network.
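The synchronous update rule can be sketched with a hypothetical 3-gene ring in which gene 1 is repressed by gene 3, gene 2 is activated by gene 1, and gene 3 is activated by gene 2. Because the state space is finite (2^{3} states), iterating the deterministic update must eventually revisit a state, and the repeating portion of the trajectory is an attractor; the helper names below are ours.

```python
def step(state):
    """Synchronous update: X1(t+1) = NOT X3(t), X2(t+1) = X1(t), X3(t+1) = X2(t)."""
    x1, x2, x3 = state
    return (int(not x3), x1, x2)


def find_attractor(state):
    """Iterate the update until a state repeats; return the repeating cycle."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[seen[state]:]


cycle = find_attractor((0, 0, 0))
print(len(cycle), cycle)
```

This negative-feedback ring has no fixed point; starting from all genes off, it settles into an oscillating attractor that cycles through 6 of the 8 possible states.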

Shmulevich et al^{59} extended Boolean networks to PBNs, which share the rule-based properties of Boolean networks and incorporate probabilistic characteristics into their transition functions. The major difference between PBNs and conventional Boolean networks is that a PBN allows multiple candidate transition functions for each node, and each candidate is selected with an associated probability. Therefore, the dynamics of a PBN are equivalent to those of a discrete-time, discrete-state Markov chain.
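The Markov-chain view can be made explicit by enumerating all network states and computing the transition matrix. The sketch assumes a hypothetical 2-node PBN in which each node selects one of its candidate Boolean functions independently at every step; the functions and selection probabilities are made up for illustration.

```python
import itertools

import numpy as np

# Hypothetical 2-node PBN: each node lists (Boolean function, selection probability).
predictors = [
    [(lambda x: x[1], 0.7), (lambda x: 1 - x[1], 0.3)],  # node 0: copy or negate node 1
    [(lambda x: x[0], 1.0)],                             # node 1: copy node 0
]
states = list(itertools.product([0, 1], repeat=len(predictors)))


def transition_matrix(predictors, states):
    """T[s, s'] = P(X(t+1) = s' | X(t) = s), with functions chosen independently per node."""
    index = {s: k for k, s in enumerate(states)}
    T = np.zeros((len(states), len(states)))
    for s in states:
        # Probability that each node's next value is 1, summed over its candidate functions.
        p_one = [sum(prob for fn, prob in node if fn(s) == 1) for node in predictors]
        for s_next in states:
            T[index[s], index[s_next]] = np.prod(
                [p if v == 1 else 1 - p for p, v in zip(p_one, s_next)]
            )
    return T


T = transition_matrix(predictors, states)
print(T)
```

Each row of `T` is a probability distribution over next states, so standard Markov chain tools (eg, computing a stationary distribution) can be applied directly; the matrix has 2^{p} rows, which limits exact enumeration to small networks.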

Like ODEs and SDEs, PBNs model the temporal dynamics of biological systems and require time-course data for model inference. For population-based data, correlation networks, mutual information-based networks, and probabilistic graphical models are often better choices for characterizing the statistical associations among biological variables. Because learning the many parameters of dynamical models from limited time-series samples is difficult, current applications of ODEs, SDEs, and PBNs mostly deal with focused subnetworks rather than global, genome-wide networks.

### Integrated Network Inference: Bridging the Gap Between Prior Knowledge and Experimental Data

Above, we briefly reviewed 2 general approaches to obtaining and constructing biological networks for data analysis: (1) retrieving prior knowledge of physical interactions from biological databases and (2) inferring network models directly from high-throughput experimental data. These 2 approaches are complementary in nature. The experiments that identify biological interactions are often designed to detect true physical interactions and causal relationships. However, biological databases are neither disease specific nor condition specific; they accumulate evidence from diverse experimental settings. Conversely, mathematical network models and interactions inferred from data are specific to the experiment that generated the data, but they are often limited by small sample sizes and do not guarantee physical interactions.

Werhli and Husmeier^{60} used a Bayesian approach to reconstruct gene regulatory networks by integrating expression data with multiple sources of prior knowledge. Each source of prior knowledge is encoded via a separate energy function, from which a prior distribution over network structures is constructed. Similarly, Mukherjee and Speed^{61} proposed Bayesian network inference using informative priors. Their prior distributions on graphs can capture different types of information about network structure, including edges, classes of edges, degree distributions, and sparsity.
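The sketch below is a rough illustration of the energy-based prior. It assumes the following formulation, based on our reading of the approach:

- For each source k, a belief matrix B_k has entries in [0, 1], where 0.5 means no prior information.
- The energy of a graph is E_k(G) = Σ_ij |B_k,ij − G_ij|.
- Sources are combined as P(G) ∝ exp(−Σ_k β_k E_k(G)).

The belief matrices, the hyperparameters β_k, and the candidate graphs are invented for illustration. The partition function is omitted, so only relative log-prior values are meaningful.

```python
def energy(G, B):
    """Energy of graph G (0/1 adjacency matrix) with respect to belief matrix B.

    B[i][j] in [0, 1] is the belief that edge i -> j exists; 0.5 means no information.
    """
    return sum(abs(b - g) for brow, grow in zip(B, G) for b, g in zip(brow, grow))


def log_prior_unnormalized(G, sources):
    """-sum_k beta_k * E_k(G); the normalizing constant Z is omitted."""
    return -sum(beta * energy(G, B) for B, beta in sources)


# Two hypothetical prior-knowledge sources over 3 genes, with hypothetical beta_k.
B_pathway = [[0.5, 0.9, 0.5],
             [0.5, 0.5, 0.8],
             [0.5, 0.5, 0.5]]
B_binding = [[0.5, 0.7, 0.2],
             [0.5, 0.5, 0.5],
             [0.5, 0.5, 0.5]]
sources = [(B_pathway, 2.0), (B_binding, 1.0)]

G_chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # 0 -> 1 -> 2
G_other = [[0, 0, 1], [0, 0, 0], [0, 1, 0]]  # 0 -> 2 -> 1
print(log_prior_unnormalized(G_chain, sources), log_prior_unnormalized(G_other, sources))
```

A graph that agrees with the high-belief entries (here `G_chain`) receives a higher prior. In a full Bayesian sampler, this prior would be combined with the likelihood of the expression data.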

CNORfeeder is a computational approach to integrate literature-constrained and data-driven methods to infer signaling networks.^{62} This method extends a data-driven network model as discussed in Section 3, Statistical and Dynamical Network Models, and uses information on physical interactions of proteins to guide and validate the integration of links.

Recently, Tian et al^{63} proposed an effective approach that incorporates biological prior knowledge into the network-learning algorithm by reweighting the penalties on potential connections in the network. Directly incorporating imperfect, nonspecific prior knowledge into a specific problem can introduce false-positive edges. To minimize this effect, the scheme carefully evaluates and controls the impact of false positives in the prior knowledge on the inferred network. It also automatically selects the optimal degree of fusion between the evidence in the prior knowledge and the evidence in the data. At the same time, the algorithm can still identify novel connections between genes that lack prior knowledge if the data provide strong evidence for them. This makes it capable of gaining new biological knowledge and insights from experimental data.
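The core idea of reweighting penalties can be illustrated with a weighted lasso. In this setup, one target gene is regressed on candidate regulators, and the L1 penalty is lowered on edges supported by prior knowledge. This is a generic coordinate-descent sketch, not Tian et al's algorithm. It does not estimate false positives in the prior or select the degree of fusion automatically. The data, penalty weights, and λ are hypothetical.

```python
def soft_threshold(a, t):
    """Soft-thresholding operator S(a, t) = sign(a) * max(|a| - t, 0)."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|.

    A small weights[j] (edge supported by prior knowledge) means a weaker penalty.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of predictor j with the partial residual that excludes j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam * weights[j]) / col_sq[j]
    return b


# Hypothetical, noise-free data: rows are samples, columns are 3 candidate regulators.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2 * r[0] + 0.3 * r[1] for r in X]  # target: strong regulator 0, weak regulator 1

b_uniform = weighted_lasso(X, y, lam=0.5, weights=[1.0, 1.0, 1.0])
b_prior = weighted_lasso(X, y, lam=0.5, weights=[1.0, 0.2, 1.0])  # prior supports edge 1
print(b_uniform, b_prior)
```

With uniform penalties, the weak regulator (true coefficient 0.3) is shrunk to zero. When prior knowledge lowers its penalty weight to 0.2, the edge survives with a shrunken coefficient of about 0.2. An edge without prior support can still enter the model if its data evidence (`rho` in the code) exceeds the full penalty.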

## Network Visualization and Network Analysis

Biological networks, whether built from interactions in biological databases, inferred from data, or constructed by integrating different sources, enable novel and informative network-based analyses and provide new insights into biological systems. In this section, we review a few examples of these computational approaches to demonstrate the effectiveness and usefulness of network-based methods. Network-based analysis is still an active research area in bioinformatics, and the methods we discuss here represent only a small fraction of the vast literature in this area.

### Network Visualization

Biological networks are useful representations for visualizing and understanding the functions and interactions of biomolecules. Many software tools have been developed to make network visualization easy, intuitive, and interactive. Cytoscape is an excellent example of a versatile network visualization platform and has gained tremendous popularity in the network biology research community^{64} (http://www.cytoscape.org/). Cytoscape is a cross-platform, open-source Java application that supports the major network file formats. It can flexibly integrate and visualize different layers of genomic and proteomic information on a network through customized node and edge properties and network layouts. Cytoscape also provides an application programming interface that lets bioinformaticians and software developers implement their own network analysis algorithms and integrate them with Cytoscape. The Cytoscape App Store is a central repository for all publicly available plugins known to the Cytoscape development team.^{65}
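One file format Cytoscape can import is the simple interaction format (SIF), in which each line lists a source node, an interaction type, and a target node. The sketch below writes such lines from a Python edge list. The example edges are for illustration only. The interaction labels ("pp" for protein–protein, "pd" for protein–DNA) follow a convention commonly used in Cytoscape examples, but any string is accepted. Tabs are used as delimiters so that node names may contain spaces.

```python
def to_sif(edges):
    """Format (source, interaction_type, target) tuples as tab-delimited SIF lines."""
    return "".join(f"{src}\t{rel}\t{dst}\n" for src, rel, dst in edges)


# Illustrative edges; "pp"/"pd" are conventional labels, but any string works.
edges = [("MDM2", "pp", "TP53"), ("TP53", "pd", "CDKN1A")]
sif_text = to_sif(edges)
print(sif_text, end="")
```

Saving `sif_text` to a file with a `.sif` extension (for example, with `pathlib.Path.write_text`) produces a file that can be imported as a network in Cytoscape.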

VisANT is an online visualization tool based on Java applets that integrates, mines, and displays hierarchical information in biological networks and pathways^{66} (http://visant.bu.edu/). Two prominent features of VisANT are that (1) it is a Web-based application with a user-friendly interface and (2) it provides data integration services driven by the Predictome database. NetGestalt (http://www.netgestalt.org) is another network visualization Web application. It exploits the hierarchical architecture of a biological network and provides an easy-to-use online tool for integrating large-scale, multiplatform data on a 1-dimensional layout of the genes.^{67} NetGestalt aligns the nodes of a network along the horizontal dimension according to the network's hierarchical structure, making it possible to visualize additional data sources and prior knowledge as custom data/annotation tracks.

Hive plots provide an alternative way to visualize large networks, which often look like hairballs when drawn with traditional network layouts. In a hive plot, nodes are placed on radially oriented linear axes. The axis (or axis segment) assigned to each node, and the node's coordinate along it, can be determined by various network structural parameters (eg, connectivity, clustering coefficient) and by user-defined rules. After the axis assignments and node coordinates are computed, the edges are drawn as curves connecting the corresponding nodes. Several software packages create hive plots, such as a Perl script (http://www.hiveplot.net/), an R package HiveR (http://academic.depauw.edu/~hanson/HiveR/HiveR.html), and a Java application JHIVE (http://www.bcgsc.ca/wiki/display/jhive/home).
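The sketch below shows how node placement can work. It computes hive-plot coordinates for a directed network under one hypothetical rule set:

- Axis 0 holds source-only nodes.
- Axis 1 holds sink-only nodes.
- Axis 2 holds nodes with both incoming and outgoing edges.
- Each node's distance from the center is its total degree divided by the maximum degree.

The axis angles (evenly spaced, 120 degrees apart) and the toy edge list are assumptions. The sketch does not reproduce the layout rules of any of the packages listed above.

```python
import math


def hive_layout(edges):
    """Hypothetical hive-plot rules for a directed network.

    Returns node -> (axis, radius, x, y); axes are evenly spaced around the origin.
    """
    indeg, outdeg = {}, {}
    for u, v in edges:
        outdeg[u] = outdeg.get(u, 0) + 1
        indeg[v] = indeg.get(v, 0) + 1
    nodes = set(indeg) | set(outdeg)
    max_deg = max(indeg.get(n, 0) + outdeg.get(n, 0) for n in nodes)
    layout = {}
    for n in nodes:
        i, o = indeg.get(n, 0), outdeg.get(n, 0)
        axis = 0 if i == 0 else (1 if o == 0 else 2)  # source-only, sink-only, both
        radius = (i + o) / max_deg                     # normalized total degree
        angle = 2 * math.pi * axis / 3
        layout[n] = (axis, radius, radius * math.cos(angle), radius * math.sin(angle))
    return layout


# Hypothetical directed edges (regulator -> target).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("E", "D")]
layout = hive_layout(edges)
for node in sorted(layout):
    print(node, layout[node])
```

A plotting library would then draw each edge as a curve (for example, a Bézier curve) between the (x, y) positions of its endpoints.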

### Analysis of Topological Features in Biological Networks

Network theory studies nontrivial topological features in networks, and local and global structural features of biological networks reveal properties of the biological systems beyond single genes and single interactions.

Network motifs refer to recurring circuits of interactions in biological networks, exhibiting patterns such as negative autoregulation, positive autoregulation, feedforward loops, single-input modules, and dense overlapping regulons.^{5} Recent network analysis of ENCODE data also found similar network motifs in human transcriptional networks.^{15}
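A feedforward loop consists of 3 distinct nodes X, Y, and Z with edges X → Y, Y → Z, and X → Z. The sketch below uses brute force to enumerate such loops, along with self-loops (autoregulation), in a directed edge list. The gene names are hypothetical. Real motif analyses also compare motif counts against randomized networks to assess significance; that step is omitted here.

```python
def feedforward_loops(edges):
    """Return sorted (X, Y, Z) triples with X->Y, Y->Z and X->Z (distinct nodes)."""
    succ = {}
    for u, v in edges:
        if u != v:  # self-loops are autoregulation, not part of a feedforward loop
            succ.setdefault(u, set()).add(v)
    loops = []
    for x, ys in succ.items():
        for y in ys:
            for z in succ.get(y, ()):
                if z != x and z in ys:
                    loops.append((x, y, z))
    return sorted(loops)


def autoregulated(edges):
    """Nodes that regulate themselves (positive or negative autoregulation)."""
    return sorted({u for u, v in edges if u == v})


# Hypothetical transcriptional edges (regulator -> target).
edges = [("TF1", "TF2"), ("TF2", "G1"), ("TF1", "G1"), ("TF2", "G2"), ("TF1", "TF1")]
print(feedforward_loops(edges), autoregulated(edges))
```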

In network theory, the importance of a node is often characterized by measures of its centrality. For example, the degree of a node is the number of links connected to and from the node. Betweenness centrality measures how often a node acts as a bridge along the shortest paths between 2 other nodes. Formally, it is the sum, over all pairs of other nodes, of the fraction of shortest paths between them that pass through the node. Dysfunction of a gene with high degree or high betweenness may therefore have more drastic effects on neighboring nodes and on the network as a whole.
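The sketch below computes degree and betweenness centrality for an undirected, unweighted network. For betweenness it uses Brandes' algorithm, which is a standard choice but not one prescribed by the text. The toy star-shaped network is hypothetical. Betweenness is reported without normalization, and each unordered pair of nodes is counted once.

```python
from collections import deque


def degree_and_betweenness(adj):
    """adj: dict node -> set of neighbors (undirected, unweighted, symmetric)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies, visiting nodes in order of decreasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each unordered pair is counted from both endpoints.
    return degree, {v: b / 2 for v, b in bc.items()}


# Hypothetical network: B is a hub linking A, C and D.
adj = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
degree, betweenness = degree_and_betweenness(adj)
print(degree, betweenness)
```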

Global analysis of a biological network deals with network-wide properties, such as the distribution of connections. For example, the node degrees in a scale-free network follow a power-law distribution. In a scale-free biological network, a few highly connected hub genes link otherwise dense subgraphs, which suggests that these hubs are functionally important.
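Whether a degree distribution is consistent with a power law is often assessed by fitting the exponent. The minimal sketch below uses the approximate discrete maximum-likelihood estimator of Clauset, Shalizi, and Newman:

α ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)), computed over the n degrees k_i ≥ k_min.

The degree list is hypothetical. In practice, k_min should be chosen from the data and the fit checked with a goodness-of-fit test, because a fitted exponent alone does not establish that a network is scale-free.

```python
import math


def powerlaw_alpha(degrees, k_min=1):
    """Approximate discrete MLE of alpha for P(k) ~ k^(-alpha), k >= k_min.

    Uses the approximation from Clauset, Shalizi & Newman (2009); assumes k_min >= 1.
    """
    tail = [k for k in degrees if k >= k_min]
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


# Hypothetical node degrees from a network with a few hubs.
degrees = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 15]
print(round(powerlaw_alpha(degrees), 2))
```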

The structural features of biological networks provide another dimension, and a biologically meaningful context, for interrogating genes that would otherwise be studied in isolation. For example, Jin et al^{68} used PPIs from the Human Protein Reference Database, signaling pathways from the Kyoto Encyclopedia of Genes and Genomes, and protein annotations to build a cardiovascular-related network, and they identified network biomarkers for major adverse cardiac events. Zhang et al^{69} defined 6 network features: degree, neighbor count of disease genes, ratio of disease genes among neighbors, betweenness centrality, clustering coefficient, and mean shortest path length to disease genes. They then trained a support vector machine classifier on these features to predict candidate genes for coronary artery disease.
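The sketch below computes 5 of the 6 features listed above for a candidate gene in an undirected, PPI-style network. Betweenness, the sixth, can be computed with Brandes' algorithm. The feature definitions reflect our reading of their names rather than Zhang et al's exact implementation, and the toy network and disease-gene set are hypothetical. The resulting feature vectors would then be passed to a classifier such as a support vector machine, which is not shown.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))


def gene_features(adj, gene, disease_genes):
    """Hypothetical feature definitions for one candidate gene."""
    nbrs = adj[gene]
    n_disease_nbrs = len(nbrs & disease_genes)
    dist = bfs_distances(adj, gene)
    reachable = [dist[d] for d in disease_genes if d in dist and d != gene]
    return {
        "degree": len(nbrs),
        "disease_neighbors": n_disease_nbrs,
        "disease_neighbor_ratio": n_disease_nbrs / len(nbrs) if nbrs else 0.0,
        "clustering": clustering_coefficient(adj, gene),
        "mean_dist_to_disease": sum(reachable) / len(reachable) if reachable else float("inf"),
    }


# Hypothetical undirected network and known disease genes.
adj = {"G1": {"D1", "D2", "G2"}, "D1": {"G1", "D2"}, "D2": {"G1", "D1"},
       "G2": {"G1", "D3"}, "D3": {"G2"}}
disease = {"D1", "D2", "D3"}
print(gene_features(adj, "G1", disease))
```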

### Differential Network Analysis

Biological networks are context-specific and dynamic in nature. Under different conditions, different regulatory components and mechanisms are activated, and the topology of the underlying biological network changes accordingly (Figure 3). For example, in yeast, transcription factors alter their interactions and rewire the signaling networks in response to diverse conditions.^{70}

It is important to focus on the topological changes in biological networks between disease and normal conditions, or across different stages of cell development. For example, a deviation from normal regulatory network topology may reveal the mechanism of pathogenesis, and the genes that undergo the most network topological changes may serve as biomarkers for the disease state or as targets for drug discovery or therapeutic intervention. Differential network analysis can also help identify key genetic players or disease markers. Differential network biology has become an active research area in recent years.^{71}
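One simple way to flag candidate network rewiring is to test whether the correlation between 2 genes differs between conditions. The sketch below uses Fisher's z-transformation for this test. It is a generic differential co-expression check, not the differential dependency network analysis of Zhang et al, and the expression values are hypothetical. When the test is applied to all gene pairs, the resulting P values would need multiple-testing correction.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def differential_correlation(r1, n1, r2, n2):
    """Two-sided Fisher z-test of H0: rho_1 == rho_2 (requires |r| < 1 and n > 3)."""
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Hypothetical expression of genes A and B in normal and disease samples.
normal_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
normal_b = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.2, 7.9]
disease_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
disease_b = [4.1, 2.0, 5.2, 1.1, 3.9, 2.2, 4.8, 1.5]

r_normal = pearson(normal_a, normal_b)
r_disease = pearson(disease_a, disease_b)
z, p = differential_correlation(r_normal, len(normal_a), r_disease, len(disease_a))
print(round(r_normal, 3), round(r_disease, 3), round(z, 2), p)
```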

Bandyopadhyay et al^{72} used epistatic miniarray profiles and differential epistasis mapping to show widespread changes in the genetic interactions among yeast kinases, phosphatases, and transcription factors as the cell responds to DNA damage. In a breast cancer study, time course data on signaling networks, gene expression, and cell phenotypic responses demonstrated that sequential application of anticancer drugs rewired apoptotic signaling networks and enhanced cell death.^{73}

Some computational methods have been proposed to learn condition-specific biological networks. Zhang et al^{74,75} proposed a differential dependency network analysis to detect statistically significant network rewiring in biological networks and pinpoint the key genes involved in network topological changes. Further, an efficient learning algorithm was proposed to jointly infer condition-specific network topology between 2 conditions using *l*_{1}-regularization–based convex optimization and the block coordinate descent algorithm.^{76}

### Network Enrichment and Pathway Impact Analysis

Next-generation sequencing technologies and high-throughput genotyping arrays are capable of detecting many novel germline and somatic mutations. An important and yet challenging task is to understand the functional implications of these genomic alterations. Network enrichment analysis and pathway impact analysis provide network and functional contexts to these aberrations and help discern random passenger mutations from (sometimes low-prevalence) driver mutations.

HotNet is an effective computational approach for identifying statistically significant subnetworks (connected subnetworks whose genes have more mutations than expected by chance) in disease studies.^{77} The network structures used in the analysis can be PPI networks or signaling pathways retrieved from databases. The method defines a local neighborhood of influence for each mutated gene in the network. It then uses a 2-stage multiple hypothesis test to estimate the false discovery rate associated with the identified subnetworks.
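The idea of spreading mutation signal over a network neighborhood can be sketched with a random walk with restart, similar in spirit to the diffusion used by HotNet-style methods. The sketch works in 3 steps:

1. Each gene's heat (here, a hypothetical mutation score) is diffused to its neighbors.
2. Genes whose diffused heat is at least a threshold δ are kept.
3. Connected components of the kept genes become candidate subnetworks.

This simplification is not HotNet's exact influence measure, and it omits the 2-stage statistical test entirely. The restart probability β, the threshold δ, and the toy network are assumptions.

```python
def diffuse(adj, heat, beta=0.4, n_iter=200):
    """Random walk with restart: p <- beta * h + (1 - beta) * W p.

    W is the column-normalized adjacency matrix (a walker at u moves to each
    neighbor with probability 1 / deg(u)); beta is the restart probability.
    """
    p = {v: heat.get(v, 0.0) for v in adj}
    for _ in range(n_iter):
        p = {
            v: beta * heat.get(v, 0.0)
            + (1 - beta) * sum(p[u] / len(adj[u]) for u in adj[v])
            for v in adj
        }
    return p


def hot_subnetworks(adj, heat, delta, beta=0.4):
    """Keep genes with diffused heat >= delta; return their connected components."""
    p = diffuse(adj, heat, beta)
    hot = {v for v, value in p.items() if value >= delta}
    seen, components = set(), []
    for start in sorted(hot):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if w in hot and w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(comp))
    return p, components


# Hypothetical undirected network (adjacency sets) and mutation scores.
adj = {
    "M1": {"M2"}, "M2": {"M1", "X"}, "X": {"M2", "Y"}, "Y": {"X", "Z"}, "Z": {"Y"},
    "M3": {"W"}, "W": {"M3"},
}
heat = {"M1": 1.0, "M2": 1.0, "M3": 1.0}
p, components = hot_subnetworks(adj, heat, delta=0.25)
print(components)
```

Because the walk restarts at mutated genes, total heat is conserved. Unmutated neighbors such as `X` and `W` pick up enough heat to join the subnetworks, whereas more distant genes do not.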

PAthway Recognition Algorithm using Data Integration on Genomic Models (PARADIGM) is a method for inferring patient-specific genetic activities incorporating curated pathway interactions.^{78} Each pathway is converted into a distinct probabilistic model, which is a factor graph with variable nodes describing the states of entities in a cell (DNA, mRNA, or proteins). The method predicts the degree to which a pathway’s activities are altered using genomic data. An extension of this approach is PARADIGM-SHIFT, which predicts whether a mutational event is neutral, gain- or loss-of-function in a tumor sample.^{79}

### Network Simulations and Attractors in the Biological Networks

Dynamical models, such as ODEs, state-space models, and PBNs, are amenable to numerical simulation of the temporal patterns of a biological system. Biological networks that drive biological decision processes are highly dynamic. When the biological system is perturbed, or certain nodes are knocked down in silico, these networks may attract cells to new signaling and phenotypic states, which are termed network attractors.^{80}

Barik et al^{81} used a dynamical model consisting of 60 differential equations and 71 kinetic parameters to characterize yeast cell-cycle regulation. To understand the effects of molecular fluctuations on cell-cycle progression in budding yeast cells, they simulated this ODE model both deterministically and stochastically. The simulation results revealed the bistable switching behavior on which proper cell-cycle progression depends. This line of research has since been extended to more complex systems. Tyson et al^{53} proposed using ODE network models to study the decision circuits in breast cancer cells. Network simulations can help reveal estrogen-signaling mechanisms and shed light on breast cancer susceptibility and resistance to endocrine therapy.
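The 60-equation model of Barik et al is far too large to reproduce here. Bistability itself, however, can be illustrated with a 2-gene mutual-repression switch:

- du/dt = α/(1 + v^n) − u
- dv/dt = α/(1 + u^n) − v

The sketch below integrates this system deterministically with a fourth-order Runge–Kutta scheme from 2 initial conditions. The model and its parameter values (α = 10, n = 2) are illustrative assumptions. A stochastic simulation would instead add noise terms or use a Gillespie-type algorithm.

```python
def toggle_rhs(u, v, alpha=10.0, n=2.0):
    """Hypothetical mutual-repression switch: u represses v and v represses u."""
    du = alpha / (1 + v ** n) - u
    dv = alpha / (1 + u ** n) - v
    return du, dv


def simulate(u, v, dt=0.01, t_end=50.0):
    """Integrate the switch with classic fourth-order Runge-Kutta steps."""
    for _ in range(int(t_end / dt)):
        k1 = toggle_rhs(u, v)
        k2 = toggle_rhs(u + dt / 2 * k1[0], v + dt / 2 * k1[1])
        k3 = toggle_rhs(u + dt / 2 * k2[0], v + dt / 2 * k2[1])
        k4 = toggle_rhs(u + dt * k3[0], v + dt * k3[1])
        u += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return u, v


state_a = simulate(3.0, 1.0)  # start with u above v
state_b = simulate(1.0, 3.0)  # mirror-image start
print([round(x, 3) for x in state_a], [round(x, 3) for x in state_b])
```

Starting with u above v drives the system to the state with u high and v low, and the mirror-image start reaches the opposite state. The same equations therefore support 2 stable steady states.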

PBNs are also useful for identifying attractors in disease networks. Simulating a PBN is equivalent to simulating a discrete Markov chain, which, when run for a long time, settles into one of a collection of state cycles. The steady-state probabilities of the attractors reveal how perturbations or interventions alter the long-term network states.^{82}
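The Markov-chain view can be extended with random gene perturbation, in which each gene flips independently with a small probability p at each step. This assumption makes the chain ergodic. The sketch below adds such perturbation to a PBN and computes the steady-state distribution by power iteration. The 2-gene network, its function probabilities, and p are hypothetical. Without perturbation, state (1, 1) would be absorbing. With perturbation, the chain occasionally leaves it, and the steady-state probability mass shows how strongly the network is drawn to that attractor.

```python
from itertools import product

# Hypothetical 2-gene independent PBN: per-node (function, selection probability).
PBN = [
    [(lambda s: s[1], 1.0)],                      # gene 0 copies gene 1
    [(lambda s: s[0], 0.9), (lambda s: 1, 0.1)],  # gene 1 copies gene 0, or is forced ON
]


def transition_matrix(pbn, p):
    """Markov chain of a PBN in which each gene is flipped with probability p per step."""
    n = len(pbn)
    states = list(product((0, 1), repeat=n))
    T = []
    for s in states:
        p_on = [sum(c for f, c in funcs if f(s)) for funcs in pbn]
        row = []
        for nxt in states:
            # No gene perturbed: apply the randomly selected Boolean functions.
            prob = (1 - p) ** n
            for i, v in enumerate(nxt):
                prob *= p_on[i] if v else 1 - p_on[i]
            # At least one gene perturbed: exactly the h differing genes flip.
            h = sum(a != b for a, b in zip(s, nxt))
            if h > 0:
                prob += p ** h * (1 - p) ** (n - h)
            row.append(prob)
        T.append(row)
    return states, T


def steady_state(T, n_iter=5000):
    """Power iteration pi <- pi T from the uniform distribution."""
    m = len(T)
    pi = [1.0 / m] * m
    for _ in range(n_iter):
        pi = [sum(pi[i] * T[i][j] for i in range(m)) for j in range(m)]
    return pi


states, T = transition_matrix(PBN, p=0.001)
pi = steady_state(T)
for s, prob in zip(states, pi):
    print(s, round(prob, 4))
```

An in silico intervention can be explored the same way. For example, change a selection probability, recompute `T` and `pi`, and compare how much probability mass each attractor receives.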

## Conclusions and Future Perspectives

In this review, we briefly discussed biological databases, statistical and dynamical network models, and computational approaches for biological network modeling and analysis. Network biology is a new paradigm to understand the complex interactions of the molecules in the cells using an integrative and systems approach. The rapid developments in data acquisition, network models, and computational tools make network biology a promising approach in biology and medicine.

Despite rapid advances in genomic and proteomic technologies and the increasing availability of large biological data sets, network inference from experimental data remains challenging. Effective network inference algorithms need to address the so-called large *p*, small *n* problems, in which the number of variables and parameters in a biological system is orders of magnitude larger than the number of samples collected. To constrain the solution space and avoid overfitting, biological prior knowledge, parsimonious models (Occam’s razor), and statistical methods like cross-validation and regularization may be applicable.

Biological networks are dynamic. The dynamics of biological networks are 2-fold: (1) the dynamic structures of the networks, and (2) the dynamics of the temporal patterns produced by the networks. The dynamic nature of the biological networks ensures the adaptability, robustness, and flexibility of the biological systems. But this dynamic nature often makes the network model unidentifiable. We are often exposed to only a single snapshot of the biological network, and because of limited experimental conditions, the biological systems are not perturbed enough to exhibit all possible dynamic behaviors. How to derive a personalized biological network and infer the temporal activation patterns of the network remains an open problem.

Finally, mathematical modeling and biological experiments in network biology form an iterative process. The mathematical models learned from data are used to make predictions and generate new hypotheses. These models and their predictions must then be rigorously tested and evaluated by carefully designed biological experiments.

## Sources of Funding

This work was supported by grant U24CA160036 from the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC), National Cancer Institute Early Detection and Research Network grant U24CA115102, National Institutes of Health (NIH)/National Institute of Neurological Disorders and Stroke grant R01NS29525, NIH/National Cancer Institute grant U54CA149147, and NIH/National Heart, Lung, and Blood Institute grant R01HL111362.

## Disclosures

None.

## Footnotes

Guest Editors for this series are David M. Herrington, MD, MHS & Yue (Joseph) Wang, PhD.

The Data Supplement is available at http://circgenetics.ahajournals.org/lookup/suppl/doi:10.1161/CIRCGENETICS.113.000123/-/DC1.

- © 2014 American Heart Association, Inc.

## References

- 1.
- 2. Bell G, Hey T, Szalay A.
- 3.
- 4.
- 5.
- 6. Lusis AJ, Weiss JN.
- 7. Arrell DK, Terzic A.
- 8.
- 9. Mayr M.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17. Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, et al.
- 18. Sandelin A, Alkema W, Engström P, Wasserman WW, Lenhard B.
- 19. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, et al.
- 20. Matthews L, Gopinath G, Gillespie M, Caudy M, Croft D, de Bono B, et al.
- 21. Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N, et al.
- 22. Kelder T, van Iersel MP, Hanspers K, Kutmon M, Conklin BR, Evelo CT, et al.
- 23. Krämer A, Green J, Pollard J, Tugendreich S.
- 24. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, et al.
- 25. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D.
- 26. Ceol A, Chatr Aryamontri A, Licata L, Peluso D, Briganti L, Perfetto L, et al.
- 27. Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, et al.
- 28. Chatr-Aryamontri A, Breitkreutz B-J, Heinicke S, Boucher L, Winter A, Stark C, et al.
- 29. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al.
- 30.
- 31.
- 32. Dewey FE, Perez MV, Wheeler MT, Watt C, Spin J, Langfelder P, et al.
- 33. De la Fuente A, Bing N, Hoeschele I, Mendes P.
- 34. Qiu P, Wang ZJ, Liu KJR, Hu Z-Z, Wu CH.
- 35.
- 36. Reverter A, Chan EKF.
- 37.
- 38. Reshef D, Reshef Y, Finucane H.
- 39.
- 40. Fisher D, Lenz H-J. Chickering DM.
- 41.
- 42. Eaton D, Murphy K.
- 43. Grzegorczyk M, Husmeier D, Edwards KD, Ghazal P, Millar AJ.
- 44. Tibshirani R.
- 45.
- 46.
- 47. Lee S, Ganapathi V, Koller D.
- 48. Schmidt M, Niculescu-Mizil A, Murphy K.
- 49.
- 50. Zhao P, Yu B.
- 51.
- 52. Kar S, Baumann W, Paul MR, Tyson JJ.
- 53.
- 54. Rangel C, Angus J, Ghahramani Z.
- 55. Voit E.
- 56.
- 57. Chen K-C, Wang T-Y, Tseng H-H, Huang C-YF, Kao C-Y.
- 58.
- 59. Shmulevich I, Dougherty ER, Kim S, Zhang W.
- 60.
- 61. Mukherjee S, Speed TP.
- 62. Eduati F, De Las Rivas J, Di Camillo B, Toffolo G, Saez-Rodriguez J.
- 63. Tian Y, Zhang B, Shih I-M, Wang Y.
- 64.
- 65.
- 66. Hu Z, Snitkin ES, DeLisi C.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72. Bandyopadhyay S, Mehta M, Kuo D, Sung M-K, Chuang R, Jaehnig EJ, et al.
- 73.
- 74. Zhang B, Li H, Riggins RB, Zhan M, Xuan J, Zhang Z, et al.
- 75. Zhang B, Tian Y, Jin L, Li H, Shih I-M, Madhavan S, et al.
- 76. Zhang B, Wang Y.
- 77.
- 78. Vaske C, Benz S, Sanborn J, Earl D.
- 79. Ng S, Collisson E, Sokolov A.
- 80.
- 81.
- 82.

Bai Zhang, Ye Tian, and Zhen Zhang. Network Biology in Medicine and Beyond. Circulation: Cardiovascular Genetics. 2014;7:536-547. Originally published August 19, 2014. https://doi.org/10.1161/CIRCGENETICS.113.000123