# Toward Breaking the Histone Code

## Bayesian Graphical Models for Histone Modifications

## Jump to

## Abstract

**Background—**Histones are proteins that wrap DNA around in small spherical structures called nucleosomes. Histone modifications (HMs) refer to the post-translational modifications to the histone tails. At a particular genomic locus, each of these HMs can either be present or absent, and the combinatory patterns of the presence or absence of multiple HMs, or the histone codes, are believed to coregulate important biological processes. We aim to use raw data on HM markers at different genomic loci to (1) decode the complex biological network of HMs in a single region, and (2) demonstrate how the HM networks differ in different regulatory regions. We suggest that these differences in network attributes form a significant link between histones and genomic functions.

**Methods and Results—**We develop a powerful graphical model under the Bayesian paradigm. Posterior inference is fully probabilistic, allowing us to compute the probabilities of distinct dependence patterns of the HMs using graphs. Furthermore, our model-based framework allows for easy but important extensions for inference on differential networks under various conditions, such as the different annotations of the genomic locations (eg, promoters versus insulators). We applied these models to ChIP-Seq data based on CD4+ T lymphocytes. The results confirmed many existing findings and provided a unified tool to generate various promising hypotheses. Differential network analyses revealed new insights on coregulation of HMs of transcriptional activities in different genomic regions.

**Conclusions—**The use of Bayesian graphical models and borrowing strength across different conditions provide high power to infer histone networks and their differences.

## Introduction

We use model-based Bayesian inference for an unknown dependence structure as a step toward breaking the histone code.^{1} Histones are small proteins that package DNA around dense structures called nucleosomes, like a thread around a spool. The thread is 147 base pairs of DNA, and the spool consists of an octamer of the 4 core histones (2 sets of H2A, H2B, H3, and H4; Figure 1). The tails of the histone proteins undergo posttranslational modifications, which play an influential role in transcription,^{2,3} nucleosome assembly, higher-order chromatin packing, DNA damage repair, and chromosomal segregation.^{4}

Recent research has been able to assign certain transcriptional roles to some well-known histone modifications (HMs). Methylation on H3K9, for example, promotes a cascade of sequential events that leads to transcriptional repression. Trimethylation of H3H3K4 is found in abundance at the promoter sites of active genes. H3K36 trimethylation is associated with RNA polymerase II and is also indicative of transcriptional activation. Gene–environment interactions are mediated by HMs as well. It has been experimentally demonstrated that shear stress can regulate cardiovascular markers in mouse embryos by inducing some specific HMs, such as H3K14 acetylations and H3K79 methylations.^{5} A genome-wide HM profile revealed a differential abundance of H3K4me3 and H3K9me3 in cardiomyocytes during heart failure in animals and humans.^{6} The effect of reducing H3K4me3 on cardiomyocytes through knockout experiments revealed significantly altered downstream signaling cascades.^{7} It was recently shown that an epigenetic silencing of the eNOS-promotor controlled angiogenesis, and the regulation was through H3K27me3.^{8} Specifically, during hypoxia, eNOS expression was increased through the reduction of H3K27me3 markers. Certain enzymes, such as lysine methyltransferase, which are associated with H3K4 methylations, have also been found to influence endothelial cell functions. Patterns of HMs have been found to be important predictors of cancer prognosis,^{9} and subsets of HMs are used as potential informants of clinical decisions.^{10}

Despite the accumulating knowledge about HMs and their functions, much remains unknown, especially the mechanism of colocalization of multiple HMs and its relation to transcription. HM colocalization is caused by concurrent activities of several enzymes on different parts of the histone tails. The hypothesized histone code^{1} suggests that the presence or absence of these HM colocalizations regulates gene transcription in a combinatory fashion. The exact pattern is still unknown, and cracking this code partially depends on unraveling the complex network of HMs in different regions of the genome. For example, it is of great interest to study the network pattern of HMs for different types of genomic regions, such as promoters and insulators.

Because of the emergence of the histone code hypothesis, several experimental results have pointed to strong evidence of a crosstalk mechanism between different HMs.^{11,12} It has been shown that H3K4 methylation can promote acetylation of H3 by p300 enzymes.^{13} It has also been conclusively demonstrated that disruption of the ubiquitin-conjugating enzyme Rad6/Ubc2 in yeast can produce a loss of methylation at H3K4 and H3K79.^{14,15} Thus, ubiquitination of H2B almost determines the methylation of H3 on K4 and K79. Similarly, the enzymes that acetylate H3K14 have been found to work together to mediate gene activation.^{16}

Our work aims to unravel the crosstalk mechanism between different HMs by analyzing data on HMs at different genomic loci. We address the problem in 2 steps: (1) a base model for a single HM network, and (2) multiple networks. The base model provides a formal method of extracting a dependence structure from a cross-section of the HM data. A straightforward approach here would compute pair-wise correlations among the markers.^{17} However, the peculiar attributes of the HM data make such inferences inappropriate. Some major challenges include a nonconventional distribution of the marker counts and the problem of mapping the marker counts to the presence or absence of these modifications. The second step builds multiple networks in different genomic regions. Here, we attempt to decipher the histone code. Specifically, we achieve this by demonstrating that functionally diverse genomic regions (like promoters, insulators, and enhancers) are characterized by different crosstalk mechanisms. Our model-based approach even allows us to formalize this dependence through difference graphs between 2 networks. Studying the network pattern of HMs for different types of genomic regions is itself of much importance. Moreover, the small differences between them could shed light on important regulatory functions of HM colocalizations and lead us toward the histone code.

We summarize the single network model^{18} and extend the inference procedures described therein. In the Methods Section, we briefly describe and motivate this approach involving a Bayesian graphical model that estimates dependence among latent binary indicators. However, the extent of functional genetic heterogeneity makes the assumption of such a single global network unrealistic. It also provides less insight into how the histone code actually operates. Here, we modify the base framework suitably to jointly model multiple HM networks in distinct types of genomic regions, borrow strength across different regions, and, thus, achieve high power in the detection of small differences between networks.

## Methods

### Bayesian Graphical Model

#### Single Graph

A graph *G* consists of a pair *G*=(*V, E*), where *V* is a set of vertices and *E* is a set of undirected edges. For future reference, we define a clique as a set of vertices, of which all pairs in the set are connected through edges, that is, for all *i*_{1}, *i*_{2} in the set. The vertices correspond to the variables, which are HMs in our case. The absence or presence of edges in the graph denotes the conditional independence (CI) or dependence between the variables. We formalize the relationship between variables by the notion of CI. CI between 2 nodes *i* and *j* implies that the random variables *i* and *j* are independent conditional on the remaining variables (nodes). Figure 2 illustrates these concepts of CI and cliques.

CI contrasts the notion of marginal independence where we average out over the other variables. For example, in the case of multivariate normal distributions, the covariance matrix codes for marginal dependence, whereas the inverse covariance matrix denotes CI. There are 2 main reasons that CI structure is used to depict HM relationships. The first reason is that we get a more biologically meaningful interpretation of the relationships. To see this, consider an HM *i* that simultaneously influences HMs *j* and *k* and is the sole cause for inducing a statistical association between them. Although the 3 variables seem to be marginally associated, the CI structure would be aptly lacking an edge between *j* and *k*. Secondly, a CI structure has a one-to-one relationship with the joint distribution of the random variables, which accounts for the problem of identifiability.

The data set to be analyzed is arranged in a count matrix ** y**=[

*y*]. Each row,

_{it}*i*=1,…,

*m*, represents an HM, and each column,

*t*=1,…,

*n*, represents a genomic location. A genomic location refers to a segment of DNA of ≈2000 base pairs. That is, each

*y*reports the count of observed HMs of type

_{it}*i*in location

*t*. Our goal is to estimate this unknown graph from the raw data that describe the dependence structure of the HMs.

We initiate the model-building process by formulating a data-generating mechanism that traverses from graph to the data. Figure 3 outlines this mechanism as a flow chart. At the top lies the graph, our object of interest that implies a CI structure among HMs. At the bottom is the data matrix. Just above the data matrix, we introduce a layer of latent indicators ** e**, which has the same dimension as

**. Let**

*y**e*

_{it}denote an indicator for the presence of HM

*i*at location

*t*. The use of latent indicators

*e*

_{it}formalizes the biologically meaningful notion of the presence versus absence of HMs. The reduction to a latent binary signal is similar to the probability of expression model

^{19}for microarray data. It makes more sense biologically to model the dependence structure among these latent indicators than the columns of

**, given that the latter is the actual signal corrupted with noise.**

*y*An easy alternative to this complex model building would be to remove the additional layer of ** e**. Conditioned on a graph

*G*, we could directly model the joint distribution of the columns of

**as a multivariate Gaussian distribution. The nonzero entries of**

*y**G*would then correspond to the nonzero entries of the inverse covariance matrix of the Gaussian distribution. This would be the more common and popular approach based on Gaussian Graphical Models.

^{20–23}Yet as previously mentioned, it would misguide us about the form of actual dependence we are interested in. We describe how such model misspecifications could lead to erroneous results using a set of simulations.

Finally, we emphasize the fact that we do inference in a Bayesian framework. As a result, all the unknown model parameters {*β, θ, G*} are random variables having prior and posterior distributions. For example, we assume *p*(*G*) to be a uniform distribution over all possible graphs with vertex set *V*. Earlier, we referred to *G* as the unknown graph to be estimated. This interpretation still holds to the extent that we present a single posterior estimate of the unknown graph at the end of inference. However, technically, we shall have more than a point estimate, a full posterior distribution . Posterior inference is done through Gibbs sampling, which only requires working out the conditional distributions from the data-generating model. Apart from implementation, Bayesian inference also has an inherent advantage over other frequentist network inferences like Bayesian Networks and Directed Acyclic Graphs, which rely on greedy search algorithms. We performed a comparison with 1 such method available in R package. A primary disadvantage of the latter approaches is that the quantification of model uncertainty is almost impossible. By contrast, in the Bayesian approach, a summary of Markov chain Monte Carlo (MCMC) samples readily yields a full posterior distribution.

We define a joint prior probability model for following the conditional dependence structure in *G*. The Hammersley Clifford Theorem assures us that any given CI structure can be represented by a joint distribution. The autologistic model^{24} precisely specifies such a distribution when the random variables are discrete. This is applicable to our case as the ** e** are binary. We write the joint probability model indexed by a set of parameters

**as follows:**

*β*where is a deterministic function of *β*_{i}. The centering of *e _{it}* with

*ν*improves the convergence of posterior simulation.

_{i}^{25}The edges in

*G*impose a restriction on

*β*: for any pair of vertices not connected by an edge (which is not a size-2 clique),

*β*

_{ij}is 0. The autologistic model described here lends itself to a nice conditional interpretation. Conditional on the other variables, the distribution of a node

*i*is a logistic regression with 2-way interaction coefficients. This is a desirable property of the joint distribution and is exploited in the Gibbs sampling of

**. A natural extension of the model is addressed in the Discussion section.**

*e*We complete the model construction with a sampling distribution for the observed counts *y*_{it}. The sampling model assumes CI of ** y** given

**across both HMs**

*e**i*and loci

*t*. This simplifying assumption reflects a preference for parsimony. As a simple residual diagnostic check, we evaluated the empirical variance-covariance matrix of imputed residuals and found no evidence of serious violations of independence.

Figure 4 shows the empirical distributions of the counts for an arbitrarily selected HM. This led us to propose a mixture model with a Poisson distribution (Poi) for the low counts, say , and a mixture of 2 lognormal (LN) distributions for moderate-to-high counts. We assume

(3.2)The curve in Figure 4 is a toy fit using the mixture model.

Denoting , the parameter vector for the sampling model, the complete hierarchical model for a single graph can be summarized as

(3.3)This unified hierarchical structure facilitates an easy extension from the single graph to multiple dependent graphs, shown next.

#### Multiple Graphs

The flexible and coherent nature of the hierarchical Bayesian inference becomes critical when we extend the network inference to multiple graphs. In the case of HMs, it is reasonable to expect that the HM networks in different regulatory regions share an underlying global mechanism but differ in small but important aspects. The formal representation of such prior information requires a joint model over multiple graphs. Let *K* denote the number of networks under consideration. For our application, we considered HM graphs, corresponding to *K*=3 different types of genomic regions: promoters, insulators, and enhancers. Now, we add a common latent graph *G*^{0} to the model and construct a hierarchical prior for *G*^{k}. Let denote the event that an edge connects the nodes *i* and *j* in graph *G*^{k}. We assume

The probabilities *ρ*_{0}, *ρ*_{1}, and *ρ*_{2} follow independent uniform priors.

The model implements the desired dependence and borrows strength across the 3 conditions by assuming a hierarchical prior across *G*^{k} and a common latent graph *G*^{0}. The terms {*G*, ** β**,

**,**

*e***,**

*y**θ*} in the single graph model are now replaced by {

*G*

^{k},

*β*^{k},

*e*^{k},

*y*^{k}

*, θ*

^{k}}.

For reference, we state the overall model structure as

(3.5)The first factor is the prior on *G*^{0}. The next term denotes the prior on the individual graphs conditional on the common network, as in Equation 3.4. The dependence is induced at the level of the graphical models. Each of the subsequent terms can be written as product over *k* factors.

The factor * defines a prior on latent binary indicators **e*_{it}* ^{k}* for presence of HMs. The next term is the sampling model likelihood for the count data. The last 2 terms are the prior distributions of the sampling model parameters and the autologistic coefficients.

We combined Gibbs and Metropolis-Hastings steps to implement posterior Monte Carlo simulation. The MCMC simulations are computationally feasible for a number of nodes, ≈15, and a number of observations up to 40 000. As the number of nodes decreases to <10, the computational feasibility still holds for larger sample sizes. However, larger nodes and smaller sample sizes pose a problem in producing consistent and accurate estimation. In such cases, we might have to rely on informative priors^{18} and sparsity considerations. For Gibbs sampling, we require full conditional distribution. The complete conditional posterior distribution for *ρ*_{1} and *ρ*_{2} are beta distributions. For sampling *e*^{k}, we exploited the fact that the autologistic density corresponds to a set of logistic distributions as its full conditionals. We updated by sampling from for *t*=1, …, *n*. Here *e*_{−it} denotes the set of latent indicators at locus *t* for all HMs excluding *i*. The conditional posterior distribution for *e _{itk}* is conditionally independent across

*t*given all other parameters and

*y*^{k}. The parameters of the lognormal mixture are efficiently sampled by introducing additional binary variables

*z*that act as indicators of the 2 mixture components. This technique of using latent indicators is common in Bayesian mixture models. MCMC simulations for the other sampling model parameters are straightforward.

Updating *β*^{k} and *G*^{k} is a critical challenge. The full posterior conditionals for these parameters require the evaluation of the normalization constants in Equation 3.1, which in turn is a sum over all possible *m*-dimensional binary vectors . We implement an importance sampling approximation.^{18} Using this approximation, *G*^{k} is sampled through a Reversible-Jump Metropolis Hastings scheme using birth and death steps.

#### Simulations

We assessed the model performance on synthetic data sets generated from the proposed autologistic model. We used 2 inference techniques: (1) our approach, and (2) the Gaussian Graphical Model. We generated 10 data sets in total, each containing 10 HMs and 6000 observations. For each of the 10 simulations, we simulated a true network *G*_{1} by setting up vertices for *m*=10 variables. For each pair of vertices {*i, j*}, we included an edge between them with probability 0.3. Then for each imputed edge {*i, j*}, we generated values of *β*_{ij} in Equation 3.1 using a discrete uniform prior over 3 possible values, *β*_{ij} ~ Unif({log(2), log(3), −log(1.5)}. The autologistic intercepts *β _{i}* in Equation 3.1 were generated as

*β*

_{i}~

*N*(0, 0.3). Then, we generated

**from the autologistic prior model Equation 3.1 using a Gibbs sampler simulation with 8000 iterations. We kept the last draw as the simulation truth for**

*e***. Conditional on**

*e***we imputed hypothetical HM counts**

*e***from Equation 3.2 using the following HM-specific parameters**

*Y***. We fixed and generated and . The thresholds**

*θ**c*

_{i}were fixed at 3. The Poisson rates were generated as . For each simulated data set, we implemented the described posterior MCMC simulation to compute posterior summaries. The posterior estimates were computed using MCMC samples with an initial burn-in of 3500 iterations and a total of 8000 iterations for all 10 simulations. We obtained posterior estimates for all the parameters by averaging over

*k*=4500 post–burn-in MCMC samples. Specifically, we computed the posterior inclusion probability for each possible edge {

*i, j*} in the graph, defined as

substituting the edge set *E* of the imputed graph for each iteration of the MCMC. Then, we estimated the graph *G* by including all edges with 0.5. For five of the simulations, the posterior estimated graph matched the simulation truth exactly. The average mismatch rate between the simulation truth and the estimated graphs was merely 0.04.

Next, we used the R package g-lasso to perform inference under the Gaussian Graphical Model for the same data sets. Because the inference is sensitive to the penalization parameter in glasso, we set it such that the estimated number of nonzero entries is closest to the number of nonzero entries in the simulation truth, noting that this choice favors the glasso. The error rate for the glasso estimate is substantially higher (0.15) than under the proposed model. For the HM data, we believe that the autologistic model better reflects the underlying biology because modifications of histones are truly binary.

## Results

### ChIP-Seq Data Analysis

We analyzed data from a ChIP-Seq experiment for CD4+ T lymphocytes.^{17,26} ChIP-Seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing (seq) to identify genome-wide binding patterns of DNA-associated proteins, such as histone modifications (HMs). Counts of HM-binding DNA fragments are measured and mapped to specific loci on the genome, indicating the binding affinity of the specific HM at these loci. A large DNA fragment count *y _{it}* indicates a high propensity of the presence of the corresponding HM

*i*at location

*t*. Exploratory analysis has revealed a backbone consisting of 17 HMs that influence gene expression in the promoter regions.

^{17}We implemented the proposed graphical model for these selected HMs in

*k*=3 different types of genomic regions: promoters, enhancers, and insulators. For each edge {

*i, j*} in HM graph

*G*

^{k}, we obtained the posterior inclusion probability . Next, we obtained posterior estimates of the unknown graphs

*G*

^{k}by thresholding the posterior probabilities to achieve a posterior expected false discovery rate (FDR) close to 0.05 in all 3 networks. Specifically, we obtained a graph

*G*by including all edges with . The posterior expected FDR corresponding to any given threshold

*c*is calculated by

Figure 5 shows the estimated HM networks. The following are some common edges that are observed with high probabilities (>0.7) in all 3 networks: (H3K4me3, H2A.Z), (H2BK5ac, H3K27ac), (H2BK120ac, H4K91ac), and (H3K9ac, H3K4me3).

Most of our findings corroborate existing knowledge on HMs. For example, H3K4me3 modification is found to be strongly correlated with active transcription and is often colocalized with H2A.Z enrichment.^{26} An almost perfect colocalization of H2A.Z, H3K9ac, and H3K4me3 in the plasmodium genome has been reported, and it is suggested that these marks are preferentially deposited on H2A.Z-containing nucleosomes.^{27} H2BK120ac and H4K91ac have been found to be the most correlated pair of HMs,^{17} which also feature in our list. Furthermore, H2BK5ac and H3K27ac were found to be the most important predictors for gene expression levels,^{28} colocalizing according to our prediction. Some of the edges in the promoter network that do not appear in the enhancers include (H4K8ac, H3K4me3), (H3K36ac, H2BK20ac), and (H2BK12ac, H2BK5ac). The network for the insulators is slightly sparser than the first 2. Some edges that differ between the enhancer and insulator networks are (H3K9ac, H4K91ac) and (H3K36ac, H4K8ac).

Perhaps more important are our findings on the relationships of the inferred HM networks to human diseases and drug development. Regions with H3K4 methylations and H3K27ac are associated with enhancers,^{29} and these 2 marks correlate strongly with the timing of gene expression. Our inference refines the already available information on these marginal associations by estimating an edge between these 2 HMs in the posterior network for enhancers.

Knowledge of the interdependence of HMs could facilitate therapies for some major diseases. HIV-I, the retrovirus causing AIDS, has been a popular target for drug development. Retroviruses generate copies of their RNA genome and integrate themselves into the host chromosomal DNA. The target-binding sites for this virus in the genome have been found to be associated with the simultaneous abundance of the markers H3K18ac, H2BK5ac, H3K27ac, and H2A.Z.^{30} Our inference adds another important piece of information to these marginal associations. We find that in all 3 estimated networks, H2BK5ac, H3K27ac, and H2A.Z are connected by edges to H3k18ac.

Estrogen receptor signaling is a key player in the progression of hormonal cancer. Recent research indicates that the enzyme KDM1, which simultaneously demethylates H3K4 and H3K9, is associated with a substantial portion of ER-alpha target genes and also demethylates neighboring histones to promote ER-mediated transcription.^{31} H3K4 and H3K9 methylations are believed to play a key role in important epigenetic processes, especially in cancer epigenetics.^{32} Again, our inference confirms these known marginal associations and adds to existing research with the finding that these 2 HMs are included among the strongest edges in all 3 posterior estimated networks.

H2A.Z is now being recognized as an important target for breast cancer therapy.^{33–35} The domains of H2A.Z abundance have been found to be enriched with colocalization of K4 methylations.^{35} Once more, we find high posterior probability for edges between H2A.Z and H3K4me.

In summary, a collection of major histone markers that are targeted for drug therapy seems to be connected in the posterior estimated networks. Although this is reassuring, we also expect that the estimated networks will provide future targets for novel epigenetic drugs. For example, acetylation on H2B Lysine 120 (H2BK120ac) is identified in our networks as a node with high degree. Its function is still largely unknown. Only recently has it been shown that H2BK120ac is a post-translational modification preceding ubiquitination of H2B, which is an important marker for expression regulation.^{36} Therefore, a hypothesis to be tested is that perturbation of H2BK120ac could affect subsequent ubiquitination of H2B and consequently affect chromatin states and gene expression. Although this hypothesis needs further validation, our analysis based on ChIP-Seq data provides a powerful exploratory tool to generate these important hypotheses. Without these tools, scientists would not be able to examine the entire HM network from a systematic view, and it would take years to experimentally explore the relationships between different HMs.

## Discussion

Epigenetics has now emerged as an exciting frontier in the science of heredity. HMs are known to overwrite the inscribed genetic code to regulate important biological processes, which has led many scientists to hail research on HMs as a major revolution in modern biology. This research has been increasingly suggestive of a fundamental association between HMs and pathology of some major diseases. Cancer, for example, is being redefined as a predominantly epigenetic disease. Both global patterns in HMs and their cellular heterogeneity have been considered in building therapeutic regimens. In this article, we used Bayesian models to formalize the coexistence pattern of HMs by a formal structure of CI. Our work is a first step in the direction of a principled model-based inference and is a significant departure from approaches that rely on ad hoc reduction of the noisy data to discrete signals. The inference algorithm works directly on the raw data and is attuned to nuances from different technical platforms. Importantly, the proposed Bayesian framework across different conditions allows scientists to address important questions under coherent probabilistic inference. We treat parameters and HM networks as random quantities and implement joint posterior inferences on both. This approach leads to substantial advantages over traditional statistical methods when the data-generating mechanism is complex and involves several latent layers of unknowns. Describing the model hierarchically via different conditional probabilities allows for a tractable MCMC scheme. Moreover, incorporating multiple graphs allows us to explore posterior distributions on comparisons. For example, posterior probabilities *Pr* reveal which HM pairs {*i, j*} are differentially colocalized between conditions 1 and 2. Sorting these posterior probabilities and controlling posterior expected FDR, we can identify colocalizations of HMs that are specific to certain types of genomic regions, such as promoter regions or insulators (Figure 5). In addition, we can extract uncertainties for any subset of graphical features (eg, subgraphs, degrees) solely from MCMC summaries. Much of the computational complexity in our model derives from running secondary MCMC chains for sampling *G*^{k} and *β*^{k}. Larger sample size *n* is desirable because it provides more information on the dependence structure; larger number of nodes *m* exponentially increases the complexity of the problem because the number of graphs is in the magnitude of *m*^{2}. However, as previously discussed, parallelization techniques in the sampling of full conditional would reduce the cost considerably.

The autologistic model can be easily extended to accommodate inference for cliques of size ≥3. The extended model would include additional terms of the form with the corresponding restrictions imposed by G. The coefficients have interpretations as many-way interaction coefficients. Cliques of size 3, for example, would model 3-way interactions between triplets of HMs that are connected to one another. This would produce differential relationships between 2 HMs depending on the value of the third HM in a clique.

In conclusion, a key feature of the proposed inference is the borrowing of strength across graphs and different conditions in the hierarchical model (Equation 3.4). As shown in Figure 5, the 3 networks seem to be quite similar, and key differences among the 3 networks, albeit small, are identified with high power. Another important feature of the hierarchical model is the implicit multiplicity control. Inference for a graphical model can be characterized as a large number of comparisons. For each edge {*i, j*}, we decide whether the edge should be included. In the comparison across conditions, a similar multiplicity problem arises in selecting the edges that should be reported for differential inclusion across conditions. A hierarchical prior on the inclusion probabilities can be argued to adjust the reported probabilities for multiplicities.^{37}

## Acknowledgments

We sincerely appreciate the editors and referees for their helpful review. We also thank Jessica Chan for her excellent work on editing the article.

## Disclosures

Drs Ji and Müller’s research is partially funded by National Institutes of Health grant R01 CA132897.

## Footnotes

Guest Editors for this series are David M. Herrington, MD, MHS, and Yue (Joseph) Wang, PhD.

- © 2013 American Heart Association, Inc.

## References

- 1.↵
- 2.↵
- Bernstein BE,
- Humphrey EL,
- Erlich RL,
- Schneider R,
- Bouman P,
- Liu JS,
- et al

- 3.↵
- Roh TY,
- Cuddapah S,
- Zhao K

- 4.↵
- Andersson R,
- Enroth S,
- Rada-Iglesias A,
- Wadelius C,
- Komorowski J

- 5.↵
- Illi B,
- Scopece A,
- Nanni S,
- Farsetti A,
- Morgante L,
- Biglioli P,
- et al

- 6.↵
- 7.↵
- 8.↵
- Ohtani K,
- Vlachojannis GJ,
- Koyanagi M,
- Boeckel JN,
- Urbich C,
- Farcas R,
- et al

- 9.↵
- 10.↵
- Kurdistani S

- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- Lo WS,
- Duggan L,
- Tolga N,
- Belotserkovskya R,
- Lane WS,
- Shiekhattar R,
- et al

- 17.↵
- 18.↵
- 19.↵
- 20.↵
- Yuan M,
- Lin Y

- 21.↵
- Friedman J,
- Hastie T,
- Tibshirani R

- 22.↵
- Scott J,
- Carvalho C

- 23.↵
- Carvalho C,
- Scott J

- 24.↵
- Besag J

- 25.↵
- 26.↵
- 27.↵
- 28.↵
- Karlić R,
- Chung HR,
- Lasserre J,
- Vlahovicek K,
- Vingron M

- 29.↵
- Bogdanovic O,
- Fernandez-Miñán A,
- Tena JJ,
- de la Calle-Mustienes E,
- Hidalgo C,
- van Kruysbergen I,
- et al

- 30.↵
- 31.↵
- 32.↵
- 33.↵
- Sharma S,
- Kelly TK,
- Jones PA

- 34.↵
- Valdés-Mora F,
- Song JZ,
- Statham AL,
- Strbenac D,
- Robinson MD,
- Nair SS,
- et al

- 35.↵
- 36.↵
- 37.↵

## This Issue

## Jump to

## Article Tools

- Toward Breaking the Histone CodeRiten Mitra, Peter Müller, Shoudan Liang, Yanxun Xu and Yuan JiCirculation: Cardiovascular Genetics. 2013;6:419-426, originally published August 20, 2013https://doi.org/10.1161/CIRCGENETICS.113.000100
## Citation Manager Formats