Epigenetic Signatures of Cigarette Smoking
Background: DNA methylation leaves a long-term signature of smoking exposure and is one potential mechanism by which tobacco exposure predisposes to adverse health outcomes, such as cancers, osteoporosis, and lung and cardiovascular disorders.
Methods and Results: To comprehensively determine the association between cigarette smoking and DNA methylation, we conducted a meta-analysis of genome-wide DNA methylation assessed using the Illumina BeadChip 450K array on 15 907 blood-derived DNA samples from participants in 16 cohorts (including 2433 current, 6518 former, and 6956 never smokers). Comparing current versus never smokers, 2623 cytosine–phosphate–guanine sites (CpGs), annotated to 1405 genes, were statistically significantly differentially methylated at a Bonferroni threshold of P<1×10−7 (18 760 CpGs at false discovery rate <0.05). Genes annotated to these CpGs were enriched for associations with several smoking-related traits in genome-wide studies including pulmonary function, cancers, inflammatory diseases, and heart disease. Comparing former versus never smokers, 185 of the CpGs that differed between current and never smokers were significant at P<1×10−7 (2623 CpGs at false discovery rate <0.05), indicating a pattern of persistent altered methylation, with attenuation, after smoking cessation. Transcriptomic integration identified effects on gene expression at many differentially methylated CpGs.
Conclusions: Cigarette smoking has a broad impact on genome-wide methylation that, at many loci, persists many years after smoking cessation. Many of the differentially methylated genes were novel genes with respect to biological effects of smoking and might represent therapeutic targets for prevention or treatment of tobacco-related diseases. Methylation at these sites could also serve as sensitive and stable biomarkers of lifetime exposure to tobacco smoke.
Cigarette smoking is a major causal risk factor for various diseases, including cancers, cardiovascular disease, chronic obstructive pulmonary disease,1 and osteoporosis.1 Worldwide cessation campaigns and legislative actions have been accompanied by a reduction in the number of cigarette smokers and corresponding increases in the number of former smokers. In the United States, there are more former smokers than current smokers.1 Despite the decline in the prevalence of smoking in many countries, it remains the leading preventable cause of death in the world, accounting for ≈6 million deaths each year.2
Even decades after cessation, cigarette smoking confers long-term risk of diseases including some cancers, chronic obstructive pulmonary disease, and stroke.1 The mechanisms for these long-term effects are not well understood. DNA methylation changes have been proposed as one possible explanation.
DNA methylation seems to reflect exposure to a variety of lifestyle factors,3 including cigarette smoking. Several studies have shown reproducible associations between tobacco smoking and altered DNA methylation at multiple cytosine–phosphate–guanine (CpG) sites (CpGs).4–15 Some DNA methylation sites associated with tobacco smoking have also localized to genes related to coronary heart disease5 and pulmonary disease.16 Some studies have found differently associated CpGs in smokers versus nonsmokers.8,11 Consortium-based meta-analyses have been extremely successful in identifying genetic variants associated with numerous phenotypes, but large-scale meta-analyses of genome-wide DNA methylation data have not yet been widely used. It is likely that additional novel loci differentially methylated in response to cigarette smoking remain to be discovered by meta-analyzing data across larger sample sizes comprising multiple cohorts. Differentially methylated loci with respect to smoking may serve as biomarkers of lifetime smoking exposure. They may also shed light on the molecular mechanisms by which tobacco exposure predisposes to multiple diseases.
A recent systematic review13 analyzed published findings across 14 epigenome-wide association studies of smoking exposure across various DNA methylation platforms of varying degrees of coverage and varying phenotypic definitions. Among these were 12 studies (comprising 4750 subjects) that used the more comprehensive Illumina Human Methylation BeadChip 450K array (Illumina 450K), which includes and greatly expands on the coverage of the earlier 27K platform. The review compares only statistically significant published results and is not a meta-analysis that can identify signals that do not reach statistical significance in individual studies.17
In the current study, we meta-analyzed association results between DNA methylation and cigarette smoking in 15 907 individuals from 16 cohorts in the CHARGE consortium (Cohorts for Heart and Aging Research in Genomic Epidemiology) using a harmonized analysis. Methylation was measured on DNA extracted from blood samples using the Illumina Human Methylation BeadChip 450K array. In separate analyses, we compared current smokers and past smokers with nonsmokers and characterized the persistence of smoking-related CpG methylation associations with the duration of smoking cessation among former smokers. We integrated information from genome-wide association studies (GWAS) and gene expression data to gain insight into potential functional relevance of our findings for human diseases. Finally, we conducted analyses to identify pathways that may explain the molecular effects of cigarette exposure on tobacco-related diseases.
Materials and Methods
This study comprised a total of 15 907 participants from 16 cohorts of the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (Table I in the Data Supplement). The 16 participating cohorts are ARIC, FHS Offspring, KORA F4, GOLDN, LBC 1921, LBC 1936, NAS, Rotterdam, InCHIANTI, GTP, CHS European Ancestry (EA), CHS African Ancestry (AA), GENOA, EPIC Norfolk, EPIC, and MESA (Multi-Ethnic Study of Atherosclerosis). Of these, 12 161 are of EA and 3746 are of AA. The study was approved by institutional review committees for each cohort, and all participants provided written informed consent for genetic research.
DNA Methylation Sample and Measurement
For most studies, methylation was measured on DNA extracted from whole blood, but some studies used CD4+ T cells or monocytes (Table I in the Data Supplement). In all studies, DNA was bisulfite converted using the Zymo EZ DNA methylation kit and assayed for methylation using the Infinium HumanMethylation 450 BeadChip, which contains 485 512 CpG sites. Details of genomic DNA preparation, bisulfite conversion, and methylation assay for each cohort can be found in the Data Supplement.
Raw methylated and total probe intensities were extracted using the Illumina Genome Studio methylation module. Preprocessing of the methylated signal (M) and unmethylated signal (U) was conducted using various software tools, primarily DASEN of wateRmelon18 and BMIQ,19 both of which are R packages. The methylation beta (β) values were defined as β=M/(M+U). Each cohort followed its own quality-control protocols, removing poor quality or outlier samples and excluding low-quality CpG sites (with detection P value >0.01). Each cohort evaluated batch effects and controlled for them in the analysis. Details of these processes can be found in the Data Supplement.
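The β definition and the detection P value filter described above can be illustrated with a minimal Python sketch. The function names and the example intensities are hypothetical, and each cohort's actual preprocessing (for example DASEN or BMIQ normalization) is not reproduced here.

```python
def beta_value(methylated: float, unmethylated: float) -> float:
    """Methylation beta as defined in the text: beta = M / (M + U)."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("M + U must be positive")
    return methylated / total


def keep_measurement(detection_p: float, threshold: float = 0.01) -> bool:
    """Keep a CpG measurement unless its detection P value exceeds the threshold."""
    return detection_p <= threshold


# Hypothetical intensities for one CpG in one sample.
m_signal, u_signal = 3000.0, 1000.0
print(beta_value(m_signal, u_signal))  # 0.75
```

Because β is a ratio of intensities, it is bounded between 0 and 1, which is why later effect sizes can be expressed as percentage methylation differences.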
Smoking Phenotype Definition
Self-reported cigarette-smoking status was divided into 3 categories. Current smokers were defined as those who had smoked at least 1 cigarette a day within 12 months before the blood draw, former smokers were defined as those who had ever smoked at least 1 cigarette a day but had stopped at least 12 months before the blood draw, and never smokers reported never having smoked. Pack years was calculated based on self-report as the average number of cigarettes smoked per day divided by 20 multiplied by the number of years of smoking, with zero assigned to never smokers. A few cohorts recorded the number of years since each former smoker had stopped smoking.
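As an illustration of these definitions, the following Python sketch encodes the 3 smoking categories and the pack-year calculation. The function and argument names are hypothetical, and the sketch assumes that a missing quit time means the participant was still smoking at the blood draw.

```python
from typing import Optional


def smoking_status(ever_smoked_daily: bool,
                   months_since_quit: Optional[float]) -> str:
    """Classify a participant as 'never', 'former', or 'current'.

    months_since_quit is None for participants who had not stopped smoking
    (an assumption made for this sketch).
    """
    if not ever_smoked_daily:
        return "never"
    if months_since_quit is not None and months_since_quit >= 12:
        return "former"
    return "current"


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack years = (average cigarettes per day / 20) * years of smoking."""
    return (cigarettes_per_day / 20.0) * years_smoked


# One pack (20 cigarettes) a day for 10 years is 10 pack years.
print(pack_years(20, 10))  # 10.0
```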
Cohort-Specific Analyses and Meta-Analysis
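Each cohort analyzed its data using at least 2 linear mixed-effect models. Each model was run separately for each CpG site. Model 1 (equation 1) regressed methylation at each CpG site on the smoking phenotype and covariates that included blood count and cohort-specific technical covariates,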
where blood count comprises the fractions of CD4+ T cells, CD8+ T cells, NK cells, monocytes, and eosinophils, either measured or estimated using the method of Houseman et al.20 The blood count adjustment was performed only in cohorts with whole-blood and leukocyte samples. Familial relationship was also accounted for in the model when applicable (eg, for FHS, see Data Supplement for details). Acknowledging that each cohort may be influenced by a unique set of technical factors, we allowed each cohort to choose its cohort-specific technical covariates. Model 2 added body mass index to model 1 because it is associated with methylation at some loci, making it a potential confounder.21 Only 3 cohorts participated in model 2 analysis: FHS, KORA, and NAS. Model 3 replaced the smoking phenotype with pack years. Only 3 cohorts participated in model 3 analysis: FHS, Rotterdam, and InCHIANTI. The pack-year analysis was performed only on 2 subsets: current versus never smokers and former versus never smokers. Combining all 3 categories would require accurate records of time of quitting, which among the 3 cohorts was available for only FHS. To investigate cell type differences, we removed blood counts from model 1 and called it model 4. Only 3 cohorts participated in this analysis: FHS, KORA, and NAS. All models were run with the lme4 package22 in R,23 except for FHS (see Data Supplement for details).
Meta-analysis was performed to combine the results from all cohorts. Because of the variability of available CpG sites after quality-control steps, we excluded CpG sites that were available in <3 cohorts. The remaining 485 381 CpG sites were then meta-analyzed with a random-effects model using the following formula:
$$E_i = \mu + s_i + e_i \qquad (2)$$
where $E_i$ is the observed effect of study i, $\mu$ is the main smoking effect, $s_i$ is the between-study error for study i, and $e_i$ is the within-study error for study i, with both $s_i$ and $e_i$ assumed to be normally distributed. The model was fitted using the restricted maximum likelihood criterion in R’s metafor24 package. Multiple-testing adjustment on the resulting P values was performed using the false discovery rate (FDR) method of Benjamini and Hochberg.25 In addition, we also report results using the Bonferroni-corrected threshold of 1×10−7 (≈0.05/485 381).
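The two multiple-testing corrections named above can be sketched in a few lines of Python. This is an illustrative implementation of the standard Benjamini and Hochberg step-up adjustment and of the Bonferroni threshold, not the code used by the cohorts; the function names and the example P values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Threshold for the 485 381 CpG sites analyzed, close to 1e-7.
print(bonferroni_threshold(0.05, 485381))

# Hypothetical P values for 4 CpG sites.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A CpG is then called significant at FDR <0.05 when its adjusted P value falls below 0.05.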
The regression coefficient β (from meta-analysis) is interpretable as the difference in mean methylation between current and never smokers. We multiplied these by 100 to represent the percentage methylation difference where methylation ranges from 0% to 100%.
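To make the pooling step concrete, the sketch below combines per-cohort effect estimates under a random-effects model and converts the pooled coefficient to a percentage difference. It uses the DerSimonian-Laird moment estimator of the between-study variance, a simpler stand-in for the restricted maximum likelihood fit described above, so its output will generally differ slightly from a REML fit. The cohort estimates and standard errors are hypothetical.

```python
import math


def random_effects_dl(effects, std_errors):
    """Pool study effects with DerSimonian-Laird random-effects weights.

    Returns (pooled effect, its standard error, between-study variance tau^2).
    """
    k = len(effects)
    w = [1.0 / se ** 2 for se in std_errors]          # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    mu = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se_mu = math.sqrt(1.0 / sum(w_star))
    return mu, se_mu, tau2


# Hypothetical per-cohort differences in beta (current minus never smokers).
effects = [-0.17, -0.19, -0.18]
std_errors = [0.010, 0.020, 0.015]
mu, se_mu, tau2 = random_effects_dl(effects, std_errors)
print(f"pooled difference: {100 * mu:.1f}% methylation")
```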
Literature Review to Identify Genes Previously Associated With Smoking and Methylation
We used the same literature search strategy as published previously.26 A broad query of NCBI's PubMed literature database using medical subject heading (MeSH) terms (“((((DNA Methylation[Mesh]) OR methylation)) AND ((Smoking[Mesh]) OR smoking))”) yielded 775 results when initially performed on January 8, 2015, and 789 results when repeated to update the results on March 1, 2015. Results were reviewed by abstract to determine whether studies met inclusion criteria: (1) performed in healthy human populations, (2) agnostically examined >1000 CpG sites at a time, (3) considered only cigarette exposure, and (4) publicly reported P values and gene annotations. A total of 25 publications met inclusion criteria, listed in the fourth supplementary table of Joubert et al.26 CpG-level results (P values and gene annotations) for sites showing genome-wide statistically significant associations (FDR <0.05) were extracted and resulted in 1185 genes previously associated with adult or maternal smoking. All CpGs annotated to these 1185 genes were marked as previously found.
Gene-Set Enrichment Analysis
Gene-set enrichment analysis27 was performed using the web tool at http://software.broadinstitute.org/gsea/msigdb/annotate.jsp on significant findings to determine putative functions of the CpG sites. We selected gene ontology biological process (C5-BP) and collected all categories with FDR <0.05 (≤100 categories).
Enrichment Analysis for Localization to Different Genomic Features
Enrichment analysis on genomic features was performed using the annotation file supplied by Illumina (version 1.2; downloaded from manufacturer’s website, http://support.illumina.com/array/array_kits/infinium_humanmethylation450_beadchip_kit/downloads.html), which contains information on CpG location relative to the gene (ie, body, first exon, 3′ UTR, 5′ UTR, within 200 base pairs of the transcriptional start site [TSS200], and within 1500 base pairs of the transcriptional start site [TSS1500]), the relation of the CpG site to a CpG island (ie, island, northern shelf, northern shore, southern shelf, and southern shore), whether the CpG site is known to be in differentially methylated regions, and whether the CpG site is known to be an enhancer or a DNase I hypersensitive site. Enrichment analysis was performed using a 1-sided Fisher exact test for each feature, using R’s fisher.test.
We intersected our results with single-nucleotide polymorphisms (SNPs) having GWAS P values ≤5×10−8 in the National Human Genome Research Institute GWAS catalog (accessed November 2, 2015).28 The catalog contained 9777 SNPs annotated to 7075 genes associated with 865 phenotypes at P≤5×10−8. To determine the genes, we looked up each significant CpG on the annotation file supplied by Illumina. Enrichment analysis was performed on a per-gene basis using 1-sided Fisher exact test.
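The per-gene enrichment test can be illustrated with a short Python sketch of the 1-sided Fisher exact test, computed as the upper tail of the hypergeometric distribution. The analyses in this study used R; the function name, the 2×2 layout, and the gene counts below are hypothetical, and the choice of background gene universe is an assumption that the text does not specify.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact P value for enrichment in a 2x2 table.

    a: genes differentially methylated and in the GWAS set
    b: genes differentially methylated, not in the GWAS set
    c: genes not differentially methylated, in the GWAS set
    d: genes in neither set
    Returns P(overlap >= a) under the hypergeometric null.
    """
    n_total = a + b + c + d
    row1 = a + b            # differentially methylated genes
    col1 = a + c            # GWAS-associated genes
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(n_total - col1, row1 - x)
               for x in range(a, upper + 1))
    return tail / comb(n_total, row1)


# Hypothetical counts: 3 of 4 differentially methylated genes are in the GWAS set.
print(fisher_exact_greater(3, 1, 1, 3))
```

Small P values indicate that the overlap between the two gene sets is larger than expected by chance given the sizes of both sets and of the background.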
For bone mineral phenotype enrichment, we included all SNPs containing terms bone mineral density or osteoporosis. For cardiovascular disease, we included all SNPs containing terms cardiovascular disease, stroke, coronary disease, cardiomyopathy, or myocardial infarction. For cardiovascular disease risk factors, we included all SNPs containing terms blood pressure, cholesterol, diabetes, obesity, or hypertension. For overall cancer enrichment, we included all SNPs containing terms cancer, carcinoma, or lymphoma, while removing those pertaining to cancer treatment effects. For overall pulmonary phenotype enrichment, we included all SNPs containing terms pulmonary disease, pulmonary function, emphysema, asthma, or airflow obstruction.
Analysis of Persistence of Methylation Signals With Time Since Quitting Smoking Among Former Smokers
We examined whether smoking methylation associations were attenuated over time in the FHS cohort, which had ascertained smoking status longitudinally over >35 years. The analysis was performed on 7 dichotomous variables, indicating cessation of smoking for 5, 10, 15, 20, 25, and 30 years versus never smokers. For example, for the 5-year cessation variable, those who had quit smoking ≥5 years earlier were coded as 1, never smokers were coded as 0, and current smokers were excluded. For this analysis, we used the pedigreemm package29 with the same set of covariates as in the primary analysis. Sites with P<0.002 across all 7 variables were deemed to be statistically significant compared with never-smoker levels.
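The coding of the cessation variables can be sketched as follows. The names are hypothetical, only the year thresholds listed in the text are used, and the sketch assumes that former smokers who quit more recently than a given threshold are excluded from that variable, which the text does not state explicitly.

```python
from typing import Optional

# Year thresholds listed in the text.
CESSATION_YEARS = (5, 10, 15, 20, 25, 30)


def cessation_indicator(status: str,
                        years_since_quit: Optional[float],
                        threshold: int) -> Optional[int]:
    """Code one dichotomous cessation variable.

    Returns 1 for former smokers who quit at least `threshold` years ago,
    0 for never smokers, and None for participants excluded from the
    comparison (current smokers and, by assumption, more recent quitters).
    """
    if status == "never":
        return 0
    if (status == "former" and years_since_quit is not None
            and years_since_quit >= threshold):
        return 1
    return None


# A hypothetical former smoker who quit 12 years before the blood draw.
codes = {t: cessation_indicator("former", 12.0, t) for t in CESSATION_YEARS}
print(codes)
```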
Methylation by Expression Analysis
To determine the transcriptomic association of each significant CpG site, we interrogated such CpG sites in the FHS gene-level methylation by expression database, at genome-wide FDR <0.05. The methylation by expression database was constructed from 2262 individuals from the FHS Offspring cohort attending examination cycle 8 (2005–2008) with both whole-blood DNA methylation and transcriptomic data based on the Affymetrix Human Exon Array ST 1.0. Enrichment analysis was performed using a 1-sided Fisher exact test. We defined a CpG site and a transcript as associated in cis if the CpG site was located within 500 kilobases of the transcript’s start location.
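The cis definition amounts to a simple distance check, sketched below in Python. The function name and coordinates are hypothetical, and the requirement that both features lie on the same chromosome is an assumption implied by the term cis rather than stated in the text.

```python
CIS_WINDOW_BP = 500_000  # 500 kilobases


def is_cis(cpg_chrom: str, cpg_pos: int,
           transcript_chrom: str, transcript_start: int,
           window: int = CIS_WINDOW_BP) -> bool:
    """True if the CpG lies within `window` base pairs of the transcript start."""
    return (cpg_chrom == transcript_chrom
            and abs(cpg_pos - transcript_start) <= window)


# Hypothetical coordinates: 300 kb apart on the same chromosome.
print(is_cis("chr5", 1_000_000, "chr5", 1_300_000))  # True
```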
Analysis of Ethnic Discrepancy Between AA and EA Cohorts
Meta-analysis of the current versus never smoker results of EA cohorts (FHS, KORA, GOLDN, LBC 1921, LBC 1936, NAS, Rotterdam, InCHIANTI, EPIC, EPIC Norfolk, MESA, and CHS-EA) was performed separately from those of AA cohorts (ARIC, GTP, GENOA, and CHS-AA).
Analysis of Sample Types for DNA Extraction
Meta-analysis was performed on the results from cohorts with whole blood/buffy coat samples (FHS, KORA, LBC 1921, LBC 1936, NAS, Rotterdam, InCHIANTI, GTP, CHS-EA, CHS-AA, ARIC, GENOA, EPIC, and EPIC Norfolk). CD4+ samples in GOLDN and CD14+ samples in MESA, because they each come from a single cohort, were not meta-analyzed. Correlations of results across different cell types were computed on CpG sites with FDR <0.05 in at least one cell type.
Results
Table 1 displays the characteristics of participants in the meta-analysis. The proportion of participants reporting current smoking ranged from 4% to 33% across the different study populations. The characteristics of the participants within each cohort are provided in Table I in the Data Supplement.
Current Versus Never Smokers
In the meta-analysis of current cigarette smokers (n=2433) versus never smokers (n=6956), 2623 CpGs annotated to 1405 genes met Bonferroni significance after correction for 485 381 tests (P<1×10−7). On the basis of genome-wide FDR <0.05, 18 760 CpGs annotated to 7201 genes were differentially methylated. There was a moderate inflation factor30 λ of 1.32 (Figure I in the Data Supplement), which is consistent with a large number of sites being impacted by smoking. Our results lend support to many previously reported loci,7,8,11,13 including CpGs annotated to AHRR, RARA, F2RL3, and LRRN3 (Table II in the Data Supplement). Not surprisingly, cg05575921 annotated to AHRR, the top CpG identified in most previous studies of smoking, was highly significant in our meta-analysis (P=4.6×10−26; ranked 36, Table II in the Data Supplement) and also had the largest effect size (−18% difference in methylation), which is comparable to effect sizes in previous studies.18 Of the 18 760 significant CpGs at FDR <0.05, 16 673 (annotated to 6720 genes) have not been previously reported to be associated with cigarette smoking; these include 1500 of the 2623 CpGs that met Bonferroni significance. The 25 CpGs with lowest P values for both overall and novel findings are shown in Table 2. Table II in the Data Supplement provides the complete list of all CpGs that were significantly differentially methylated (FDR <0.05) in analysis of current versus never smokers. Adding body mass index into the model did not appreciably alter the results (Figure II in the Data Supplement).
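For readers unfamiliar with the inflation factor, the sketch below shows one common way to compute λ from a set of P values: each P value is converted to a 1-degree-of-freedom chi-square statistic, and the median of these statistics is divided by the median expected under the null (about 0.455). This assumes the common definition; the exact computation used for the value reported above is not described in this section, and the function name and example P values are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def inflation_factor(p_values) -> float:
    """Genomic inflation factor from two-sided P values in (0, 1]."""
    # A two-sided P value p corresponds to |z| = -inv_cdf(p / 2),
    # and z squared is a 1-df chi-square statistic.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced P values mimic a null distribution, so lambda is close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(inflation_factor(null_like), 3))
```

Values well above 1 indicate that more test statistics are elevated than expected under the null, which can reflect either confounding or, as argued above, a large number of true associations.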
Methylation can be either reduced or increased at CpG sites in response to smoking. For the 53.2% of FDR-significant CpGs with increased methylation in response to current smoking, the mean percentage difference in methylation between current and never smokers was 0.5% (SD=0.37%; range, 0.06–7.3%). For the 46.8% of CpGs with decreased methylation in response to current smoking, the mean percentage difference was 0.65% (SD=0.56%; range, 0.04–18%). The volcano plot can be found in Figure III in the Data Supplement.
We did not observe correlation between the number of significant CpGs and either the size of the gene or the number of exons or the coverage of the methylation platform. We performed a formal enrichment test for each of the 7201 genes in regard to the length of the gene or number of exons and found only 3 for which associations were observed (AHRR, PRRT1, and TNF). However, given the robust findings for a specific CpG in AHRR in multiple studies in the literature4,7,9 and our own, and its key role in the AHR pathway, which is crucial in response to polyaromatic hydrocarbons, such as are produced by smoking,31 it seems unlikely that the AHRR findings are false positives. Likewise, there is strong support in the literature for PRRT132 and TNF.33 The enrichment results for methylation platform coverage also yielded the same 3 genes.
In a subset of 3 cohorts (1827 subjects), we investigated the association of the number of pack years smoked with the 18 760 CpGs that were differentially methylated (FDR <0.05) between current versus never smokers. Significant dose responses were observed for 11 267 CpGs (60.1%) at FDR <0.05 (Table III in the Data Supplement).
To investigate the pathways implicated by these genes, we performed a gene-set enrichment analysis34 on the annotated genes. The results suggested that cigarette smoking is associated with potential changes in numerous vital molecular processes, such as signal transduction (FDR=2.8×10−79), protein metabolic processes (FDR=1.2×10−43), and transcription pathways (FDR=8.4×10−31). The complete list of 99 enriched molecular processes can be found in Table IV in the Data Supplement.
Former Versus Never Smokers
Meta-analysis of former (n=6518) versus never smokers (n=6956) restricted to the 18 760 CpG sites that were differentially methylated in current versus never smokers identified 2568 CpGs annotated to 1326 genes at FDR <0.05 (Table V in the Data Supplement). There were 185 CpGs (annotated to 149 genes) that also met Bonferroni correction (P<0.05/18760≈2.67×10−6). There was no evidence of inflation30 (λ=0.98) (Figure IV in the Data Supplement). We also confirmed previously reported findings for CpGs annotated to AHRR, RARA, and LRRN3.7,8,11,13 For the 2568 CpGs that remained significantly differentially methylated in former versus never smokers, effect sizes were all weaker than those in the analysis of current versus never smokers (61.2%±15.3% weaker). Results for the top 25 CpGs are displayed in Table 3. Adding body mass index to the model did not appreciably alter the results (Figure V in the Data Supplement). A volcano plot can be found in Figure VI in the Data Supplement. In a subset of 3 cohorts (3349 subjects), analyses using pack years confirmed a significant dose response for 1804 of the 2568 CpGs (70%) annotated to 942 genes at FDR <0.05 (Table VI in the Data Supplement).
The gene-set enrichment analysis27 in the former versus never smoker analyses on all 1326 genes revealed enrichment for genes associated with protein metabolic processes (FDR=1.1×10−23), RNA metabolic processes (FDR=1.4×10−17), and transcription pathways (FDR=3.9×10−18; Table VII in the Data Supplement). The gene-set enrichment analysis on the 942 genes for which the 1804 CpGs exhibited dose responses with pack years also revealed similar pathways to those summarized in Table VII in the Data Supplement, except with weaker enrichment FDR values.
In 2648 Framingham Heart Study participants with up to 30 years of prospectively collected smoking data, we examined the 2568 CpGs that were differentially methylated in meta-analysis of former versus never smokers and explored their associations with time since smoking cessation. Methylation levels of most CpGs returned toward those of never smokers within 5 years of smoking cessation. However, 36 CpGs annotated to 19 genes, including TIAM2, PRRT1, AHRR, F2RL3, GNG12, LRRN3, APBA2, MACROD2, and PRSS23, did not return to never-smoker levels even after 30 years of smoking cessation (Figure; Table 4).
The EPIC studies included cancer cases plus noncancer controls analyzed together, adjusting for cancer status. The other studies were population-based samples not selected for disease status. To evaluate residual confounding by cancer status after adjustment, we repeated the meta-analysis without the EPIC studies. The effect estimates were highly correlated: Pearson ρ=0.99 for current versus never smoking and 0.98 for former smoking versus never.
Enrichment Analysis for Genes Identified in GWAS of Smoking-Related Phenotypes
To identify potential relevance of the differentially methylated genes to smoking-related phenotypes, we determined whether these genes had been associated with smoking-related phenotypes in the National Human Genome Research Institute-EBI GWAS Catalog28 (accessed November 2, 2015). The catalog contained 9777 SNPs annotated to 7075 genes associated with 865 phenotypes at P≤5×10−8. Of the 7201 genes (mapped by 18 760 CpG sites) significantly differentially methylated in current versus never smokers, we found overlap with 1791 genes (4187 CpGs are mapped to these) associated in GWAS with 700 phenotypes (enrichment P=2.4×10−52). We identified smoking-related traits using the 2014 US Surgeon General’s report.1 Enrichment results for a selection of smoking-related phenotypes, including coronary heart disease and its risk factors, various cancers, inflammatory diseases, osteoporosis, and pulmonary traits, are available in Table 5. We also performed the same enrichment analysis on the 2568 CpGs associated with former versus never-smoking status. We identified enrichment for coronary heart disease, pulmonary traits, and some cancers (Table 5). More detailed results are available in Tables VIII and IX in the Data Supplement. Differentially methylated genes in relation to smoking status that are associated in GWAS with coronary heart disease or coronary heart disease risk factors are available in Table X in the Data Supplement. We also performed enrichment analyses on phenotypes that have no clear relationships to smoking, such as male pattern baldness (P=0.0888), myopia (P=0.1070), thyroid cancer (P=0.2406), and testicular germ cell tumor (P=0.3602) and did not find significant enrichment.
Enrichment Analysis for Genomic Features
We examined the differentially methylated CpGs with respect to localization to different genomic regions including CpG islands, gene bodies, known differentially methylated regions, and sites identified as likely to be functionally important in the ENCODE project such as DNase I hypersensitivity sites and enhancers (refer to the Methods section for details). We performed this analysis separately for the CpGs related to current smoking and past smoking (Table XI in the Data Supplement). Trends were similar for the 2 sets of CpGs, although the power to identify enrichment was much greater for the larger set of 18 760 CpGs related to current smoking. There was no enrichment for CpG islands. In contrast, significant enrichment was observed for island shores, gene bodies, DNase I hypersensitivity sites, and enhancers.
Of the 18 760 statistically significant CpG sites associated with current smoking in the meta-analysis, 1430 were significantly associated in cis with the expression of 924 genes at FDR <0.05 (enrichment P=3.6×10−215; Table XII in the Data Supplement) using whole-blood samples from 2262 Framingham Heart Study participants. Of these, 424 CpGs associated with the expression of 285 genes were replicated at FDR <0.0001 in 1264 CD14+ samples from the MESA.35 These genes are associated with pathways similar to those described earlier (Table XIII in the Data Supplement).
Comparison Between AA and EA
Meta-analysis of the current versus never smokers in 11 cohorts with participants of EA (n=6750 subjects) yielded 10 977 CpGs annotated to 4940 genes at FDR <0.05. Meta-analysis of the results of the smaller data set of 4 cohorts with AA participants (n=2639) yielded 3945 CpGs annotated to 2088 genes at FDR <0.05. The effect estimates of the CpGs significant in at least one ancestry (12 927 CpGs) were highly correlated in the combined group of individuals of either ancestry (Spearman ρ=0.89). The results by ancestry are shown in Table XIV in the Data Supplement.
We performed the same ancestry-stratified analysis on former versus never smokers (Table XV in the Data Supplement). Meta-analysis of the results of EA participants yielded 2045 CpG sites annotated to 1081 genes at FDR <0.05. Meta-analysis of the results of AA participants yielded 329 CpG sites annotated to 178 genes at FDR <0.05. The effect estimates of the union of CpGs significant in at least one ancestry (2234 CpGs) were correlated in the combined group of individuals of either ancestry (Spearman ρ=0.75). Of note, one of the CpG sites showing differential methylation by ancestry, cg00706683, mapped to gene ECEL1P2, did not return to never-smoker levels 30 years after smoking cessation (Table 4).
To more directly compare results by ethnicity, removing the effect of better statistical power in the larger EA sample size, we performed a meta-analysis on a subset of EA cohorts: the Framingham Heart Study, Rotterdam Study, and KORA, such that the total number of smokers, the major determinant of power, would match that of AA cohorts. In this subset, similar correlations of the effect estimates were observed as in the complete analyses, suggesting that the differences in number of statistically significant CpGs are indeed because of better power in the EA cohorts (Spearman ρ=0.87 and 0.79 for current versus never smokers and former versus never smokers, respectively).
Cell Type Adjustment
We adjusted our main analyses for white blood cell fractions, in studies based on either whole blood or leukocytes from the buffy coat of whole blood, either measured or estimated using a published method.20 Reassuringly, results before and after cell type adjustment were highly comparable. The correlation of regression coefficients before and after adjustment was 0.85 for the current versus never-smoker analysis (Figure VII in the Data Supplement). Similarly, for the analysis of former versus never smokers, the effect estimates were highly correlated before and after adjustment (ρ=0.93; Figure VIII in the Data Supplement). In addition, in 2 cohorts, we had results from specific cell fractions: CD4+ cells in GOLDN and CD14+ cells in MESA. The correlations of results between buffy coat and CD4+ or CD14+ for former versus never smokers were generally high (ρ>0.74; Table XVI in the Data Supplement).
Methylation Profile Across CpG Sites
We assessed the methylation profile in the FHS cohort as a representative cohort in the study. The profile of all 485 381 analyzed CpG sites can be found in Figure IX in the Data Supplement. The profile across 18 760 CpG sites significantly associated with current versus never smoking status can be found in Figure X in the Data Supplement. These plots indicate that most CpG sites with less dynamic range are largely not statistically significant in our results.
Discussion
We performed a genome-wide meta-analysis of blood-derived DNA methylation in 15 907 individuals across 16 cohorts and identified broad epigenome-wide impact of cigarette smoking, with 18 760 statistically significant CpGs (FDR <0.05) annotated to >7000 genes, or roughly one third of known human genes. These genes in turn affect multiple molecular mechanisms and are implicated in smoking-related phenotypes and diseases. In addition to confirming previous findings from smaller studies, we detected >16 000 novel differentially methylated CpGs in response to cigarette smoking. Many of these genes have not been previously implicated in the biological effects of tobacco exposure. The large number of genes implicated in this well-powered meta-analysis might on first glance raise concerns about false positives. However, on further consideration, given the widespread impact of smoking on disease outcomes across many organ systems and across the life span,1 the identification of a large number of genes at genome-wide significance is not surprising. In addition, our findings are robust and consistent across all 16 cohorts (Tables II and V in the Data Supplement) because we accounted for interstudy variability by using random-effect meta-analyses, which is conservative when heterogeneity is present.36 The implicated genes are mainly involved in molecular machineries, such as transcription and translation. Furthermore, differential methylation of a subset of CpGs persisted, often for decades, after smoking cessation.
We found that genes differentially methylated in relation to smoking are enriched for variants associated in GWAS with smoking-related diseases,1 including osteoporosis, colorectal cancers, chronic obstructive pulmonary disease, pulmonary function, cardiovascular disease, and rheumatoid arthritis. We find it noteworthy that there is enrichment of smoking-associated CpGs for genes associated with rheumatoid arthritis because DNA methylation is one of the proposed molecular mechanisms underlying this disease.37 It is also interesting that the most significant association of smoking with methylation was for the gene HIVEP3 (a.k.a. Schnurri3), the mammalian homolog of the Drosophila zinc finger adapter protein Shn.38 This gene regulates bone formation, an important determinant of osteoporosis, which was one of the enriched GWAS phenotypes.
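As a rough illustration of what enrichment means here, the sketch below computes a one-sided hypergeometric P value for the overlap between a set of smoking-associated genes and a set of GWAS genes for one trait. This is one common way to test gene-set overlap and is not necessarily the test used in this study. The function name `hypergeom_enrichment_p` and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_trait, n_hits, n_overlap):
    """One-sided hypergeometric P value for gene-set overlap.

    n_universe: number of genes considered overall
    n_trait:    genes linked to the trait in GWAS
    n_hits:     genes annotated to smoking-associated CpGs
    n_overlap:  genes present in both sets
    Returns P(overlap >= n_overlap) if the hits were drawn at random.
    """
    upper = min(n_trait, n_hits)
    total = comb(n_universe, n_hits)
    # Sum the probability of every overlap at least as large as observed
    tail = sum(
        comb(n_trait, k) * comb(n_universe - n_trait, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts, for illustration only
p_value = hypergeom_enrichment_p(n_universe=20000, n_trait=100, n_hits=7000, n_overlap=50)
print(p_value)
```

A small P value indicates that the observed overlap is larger than expected if smoking-associated genes were sampled at random from the gene universe. The choice of universe strongly affects the result, so it should match the genes actually covered by the array.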
When we examined time since smoking cessation, we found that the majority of the differentially methylated CpG sites observed in analysis of current versus never smokers returned to the level of never smokers within 5 years of smoking cessation. This is consistent with the fact that risks of many smoking-related diseases revert to nonsmoking levels within this period of time. Our results also indicate that cigarette smoking induces long-lasting alterations in DNA methylation at some CpGs. Although speculative, it is possible that persistent methylation changes at some loci might contribute to risks of some conditions that remain elevated after smoking cessation.
In all but 2 of our 16 cohorts, DNA was extracted from the entire circulating leukocyte population. Thus, there is the possibility of confounding by the effects of smoking on differential cell counts. We attempted to adjust for cell type and found that results were generally little changed by the adjustment.
Our significant results are highly enriched for CpG sites associated with the expression of nearby genes (ie, in cis) even though a single measurement of gene expression in blood is probably subject to considerably more within-subject variability than DNA methylation,39 limiting our ability to find correlations. Differential DNA methylation at many of the CpGs we identified in relation to smoking status may have a functional impact on nearby gene expression. Our analysis of genomic regions further supports the potential functional impact of our findings on gene expression. We demonstrated enrichment for sites with greater functional impact, such as island shores, gene bodies, DNase I hypersensitivity sites, and enhancers, whereas we found no enrichment for CpG islands. These results reinforce previous findings showing that island shores, enhancers, and DNase I hypersensitivity sites are more dynamic (ie, susceptible to methylation changes) than CpG islands,40 which may be more resistant to abrupt changes in DNA methylation in response to environmental exposures.41 Thus, our results suggest that many of the smoking-associated CpG sites may have regulatory effects.
Although identification of changes in methylation patterns may suggest mechanisms by which exposure to tobacco smoke exerts its effects on several disease processes, DNA methylation profiles can also serve as biomarkers of exposure to tobacco smoke. Cotinine is a biomarker only of recent smoking; DNA methylation signals have the potential to serve as robust biomarkers of smoking history.9,42 Indeed, several studies have identified such markers.5,42,43 The large number of persistently modified CpGs may be useful to develop even more robust biomarkers to objectively quantify long-term cigarette-smoking exposure for prediction of risk for health outcomes in settings where smoking history is not available or is incomplete and to validate self-reported never-smoker status. Furthermore, our analyses of both former and current smokers show dose-dependent effects at many CpGs (Tables III and VII in the Data Supplement). Methylation-based biomarkers could be informative for investigating dose–response relationships with disease end points. This is useful because smokers often under-report the amount of smoking, both current and historical.
It is possible that smoking-related conditions or correlated exposures may contribute to some of the methylation signatures identified. However, our studies are nearly all population-based studies composed of predominantly healthy individuals, not selected for smoking-related disease. Given the number, strength, and robustness to replication of findings for smoking across the literature and among our diverse cohorts from various countries, the likelihood that these are confounded by other exposures or conditions related to smoking is greatly reduced.
There are several potential limitations to our study. First, the cross-sectional design limits our ability to study the time course of smoking effects. In addition, we analyzed methylation in DNA samples from blood, which is readily accessible. Although we demonstrated that blood-derived DNA reveals a strong and robust signature of cigarette-smoking exposure, studies in target tissues for smoking-related diseases (eg, heart and lung) would be of additional interest. Furthermore, our analyses could not distinguish direct effects of smoking from indirect effects that arise from smoking-induced changes in cell metabolism, organ function, inflammation, or injury, which could in turn influence methylation. However, this is the largest examination to date of the effects of smoking on DNA methylation, with 16 studies from different countries contributing.
In conclusion, we identify an order of magnitude more sites differentially methylated in relation to smoking across the genome than have been previously seen. Many of these signals persist long after smoking cessation, providing potential biomarkers of smoking history. These findings may provide new insights into molecular mechanisms underlying the protean effects of smoking on human health and disease.
We would like to thank Bonnie R. Joubert, PhD, and Frank Day, PhD, of the National Institute of Environmental Health Sciences (Research Triangle Park, NC) and Jianping Jin, PhD, of Westat (Durham, NC) for expert computational assistance. Additional Acknowledgments can be found in the Data Supplement.
Sources of Funding
Infrastructure for the CHARGE consortium is provided by the National Heart, Lung, and Blood Institute grant R01HL105756. This work was supported in part by the Intramural Research Program of the NIH; National Institute of Environmental Health Sciences; and the National Heart, Lung, and Blood Institute. Additional funding sources for each cohort can be found in the Data Supplement.
B.M. Psaty serves on the Data Safety Monitoring Board (DSMB) of a clinical trial of a device funded by the manufacturer (Zoll LifeCor) and on the Steering Committee of the Yale Open Data Access Project funded by Johnson & Johnson. C.E. Elks is currently employed by AstraZeneca, although the work was completed before the employment. The other authors report no conflicts.
From the Institute for Aging Research, Hebrew SeniorLife (R.J., D.P.K.), Department of Medicine, Beth Israel Deaconess Medical Center (R.J., D.P.K.), Channing Division of Network Medicine, Brigham and Women’s Hospital (D.L.D.), and Department of Psychiatry (K.J.R.), Harvard Medical School, Boston, MA; Population Sciences Branch, National Heart, Lung, and Blood Institute (R.J., T.H., C.L., M.M.M., C.Y., D.L.) and Laboratory of Neurogenetics, National Institute on Aging (D.G.H., A.B.S.), National Institutes of Health, Bethesda, MD; Framingham Heart Study, MA (R.J., T.H., C.L., M.M.M., C.Y., D.L.); Department of Preventive Medicine, Icahn School of Medicine at Mount Sinai, New York, NY (A.C.J.); Centre for Cognitive Ageing and Cognitive Epidemiology (R.E.M., P.M.V., J.M.S., I.J.D.), Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine (R.E.M.), Alzheimer Scotland Dementia Research Centre (J.M.S.), and Department of Psychology, University of Edinburgh (I.J.D.), United Kingdom; Queensland Brain Institute (R.E.M., R.H.S., A.F.M., P.M.V., N.R.W.) and University of Queensland Diamantina Institute, Translational Research Institute (A.F.M., P.M.V.), University of Queensland, Brisbane, Australia; Epidemiology and Public Health Group, Institute of Biomedical and Clinical Science, University of Exeter Medical School, United Kingdom (L.C.P., D.M.); Department of Epidemiology and Prevention, Division of Public Health Sciences (L.M.R., C.J.R., Y.L.), Department of Biostatistical Sciences, Division of Public Health Sciences (K.L.), and Department of Internal Medicine (J.D.), Wake Forest School of Medicine, Winston-Salem, NC; Department of Internal Medicine (P.R.M., A.G.U., J.B.J.v.M.), Department of Clinical Chemistry (P.R.M.), and Department of Epidemiology (A.H.), Erasmus University Medical Center, Rotterdam, The Netherlands; Division of Biostatistics (W.G.) and Division of Epidemiology and Community Health (J.S.P.), School of Public Health, University of Minnesota, Minneapolis; Research Unit of Molecular Epidemiology, Institute of Epidemiology II, Helmholtz Zentrum Muenchen, Munich, Germany (T.X., S.K., A.P., R.W.-S., M.W.); MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, United Kingdom (C.E.E., N.J.W., K.K.O.); Department of Epidemiology, University of Alabama at Birmingham (S.A., J.S., M.R.I., D.K.A.); Autonomous Metropolitan University-Iztapalapa, Mexico City, Mexico (H.M.-M.); International Agency for Research on Cancer, Lyon, France (H.M.-M., S.A., Z.H., I.R.); Department of Epidemiology, School of Public Health (J.A.S., W.Z., E.B.W., S.L.R.K.) and Research Center for Group Dynamics, Institute for Social Research (E.B.W.), University of Michigan, Ann Arbor; Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services (J.A.B., B.M.P., B.R.S.), Center for Lung Biology, Division of Pulmonary and Critical Care Medicine, Department of Medicine (S.A.G.) and Cardiovascular Health Research Unit, Division of Cardiology, Department of Epidemiology (N.S.), University of Washington, Seattle; Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA (R.D.); School of Public Health, University of California, Berkeley (P.Y., E.W.D.); HudsonAlpha Institute for Biotechnology, Huntsville, AL (D.M.A.); Clinical Research Branch, National Institute on Aging, Baltimore, MD (L.F.); Human Genetics Center, School of Public Health (J.B., M.L.G., M.F.) and School of Biomedical Informatics (D.Z.), The University of Texas Health Science Center at Houston; Department of Cardiology, Boston Children’s Hospital, Boston, MA (M.M.M.); Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg (M.B.); MRC/PHE Centre for Environment and Health, School of Public Health, Imperial College London, United Kingdom (P.V.); HuGeF Foundation, Torino, Italy (P.V.); Department of Epidemiology (J.S., A.A.B.) and Department of Environmental Health (A.A.B.), Harvard T.H. Chan School of Public Health, Boston, MA; Department of Preventive Medicine and the Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, Chicago, IL (L.H.); VA Normative Aging Study, VA Boston Healthcare System & Department of Medicine, Boston University School of Medicine, Boston, MA (P.S.V.); Geriatric Unit, Azienda Sanitaria di Firenze, Florence, Italy (S.B.); Division of Nephrology & Hypertension, Mayo Clinic, Rochester, MN (S.T.T.); Department of Psychiatry and Behavioral Sciences (E.B.B., A.K.S., K.J.R.); Department of Human Genetics, Emory University School of Medicine, Atlanta, GA (K.N.C.); Department of Translational Research in Psychiatry, Max-Planck Institute of Psychiatry, Munich, Germany (T.K., E.B.B.); Division of Depression & Anxiety Disorders, McLean Hospital, Belmont, MA (T.K., K.J.R.); Group Health Research Institute, Group Health Cooperative, Seattle, WA (B.M.P.); Institute for Translational Genomics & Population Sciences, Los Angeles BioMedical Research Institute (K.D.T.), Division of Genomic Outcomes, Department of Pediatrics, Harbor-UCLA Medical Center, Torrance (K.D.T.); Departments of Pediatrics, Medicine, and Human Genetics, UCLA, Los Angeles, CA (K.D.T.); Harvard School of Public Health (L.L.); Boston University School of Medicine (G.T.O.); and Epidemiology Branch, Department of Health and Human Services, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC (S.J.L
Guest Editor for this article was Christopher Semsarian, MBBS, PhD, MPH.
* Drs Joehanes, Just, Marioni, Pilling, Reynolds, Guan, Xu, Elks, Aslibekyan, Moreno-Macias, J.A. Smith, Brody, Dhingra, and P.R. Mandaviya contributed equally as first authors.
† Drs Conneely, Sotoodehnia, Kardia, Melzer, Baccarelli, van Meurs, Romieu, Arnett, Ong, Y. Liu, Waldenberger, Deary, Fornage, Levy, and London contributed equally as senior authors.
The Data Supplement is available at http://circgenetics.ahajournals.org/lookup/suppl/doi:10.1161/CIRCGENETICS.116.001506/-/DC1.
- Received January 19, 2016.
- Accepted August 16, 2016.
- © 2016 American Heart Association, Inc.
- 1. National Center for Chronic Disease Prevention and Health Promotion (US) Office on Smoking and Health. The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General. Atlanta, GA: Centers for Disease Control and Prevention (US); 2014.
- 2. World Health Organization. WHO global report on trends in prevalence of tobacco smoking 2015. Available at http://apps.who.int/iris/bitstream/10665/156262/1/9789241564922_eng.pdf.
- Breitling LP, Salzmann K, Rothenbacher D, Burwinkel B, Brenner H
- Wan ES, Qiu W, Baccarelli A, Carey VJ, Bacherman H, Rennard SI, et al
- Shenker NS, Polidoro S, van Veldhoven K, Sacerdote C, Ricceri F, Birrell MA, et al
- Guida F, Sandanger TM, Castagné R, Campanella G, Polidoro S, Palli D, et al
- Wauters E, Janssens W, Vansteenkiste J, Decaluwé H, Heulens N, Thienpont B, et al
- Garg AX, Hackam D, Tonelli M
- Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, et al
- Bates D, Mächler M, Bolker B, Walker S
- 23. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Development Core Team; 2010.
- Benjamini Y, Hochberg Y
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, et al
- Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, et al
- Martey CA, Baglole CJ, Gasiewicz TA, Sime PJ, Phipps RP
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al
- Liu Y, Ding J, Reynolds LM, Lohman K, Register TC, De La Fuente A, et al
- Jones DC, Wein MN, Oukka M, Hofstaetter JG, Glimcher MJ, Glimcher LH
- Suderman M, Pappas JJ, Borghol N, Buxton JL, McArdle WL, Ring SM, et al
We combined data from 16 cohorts (15 907 individuals) examining genome-wide methylation, a type of epigenetic modification, in blood DNA, in relation to smoking status. In this large-scale meta-analysis, thousands of DNA methylation cytosine–phosphate–guanine sites were associated with current versus never-smoking status. These methylation signals reside in genes that are associated with numerous diseases caused by cigarette smoking, such as cardiovascular diseases and certain cancers. Of the thousands of cytosine–phosphate–guanine sites differentially methylated in current versus never smokers, >10% also were significantly associated with former versus never-smoking status. Although many of these former smoker methylation signals return to never-smoker levels within 5 years of quitting, a substantial proportion remain altered even after 30 years of cessation. We also found widespread evidence that many differentially methylated sites are related to gene expression, suggesting a functional impact on the genome. Furthermore, in our analyses, these cigarette-smoking DNA methylation signals affect genes important to fundamental molecular pathways, such as molecular signal transduction, protein metabolic processes, and transcription. In conclusion, cigarette smoking has a widespread and long-lasting impact on DNA methylation. DNA methylation is one potential mechanism by which tobacco exposure predisposes to numerous adverse health outcomes.