Gene-Targeted Analysis of Copy Number Variants Identifies 3 Novel Associations With Coronary Heart Disease Traits
Background—Copy number variants (CNVs) are a major form of genomic variation, which may be implicated in complex disease phenotypes. However, investigation of the role of CNVs in coronary heart disease (CHD) traits has been limited.
Methods and Results—We examined the use of the cnvHap algorithm for CNV detection, using data for 2500 men from the Second Northwick Park Heart Study (NPHS-II) genotyped with an Illumina custom chip that included 722 single-nucleotide polymorphisms covering 76 coronary heart disease-trait genes. Common CNVs in 5 genes were significantly associated with coronary heart disease phenotypes (P<0.05 after false discovery rate correction). Novel associations of CNVs in toll-like receptor-4 with apolipoprotein AI were replicated (P<0.05) in the Whitehall II cohort (4887 subjects), whereas newly described associations of CNVs in sterol regulatory element-binding protein 1 with apolipoprotein AI and of CNVs in interleukin-6 signal transducer with apolipoprotein B were replicated in data from 3546 subjects of the North Finnish Birth Cohort 1966 (P<0.05).
Conclusions—This study supports the use of CNV detection algorithms such as cnvHap as potential tools for the identification of novel CNVs, some of which show significant association and replication with coronary heart disease risk phenotypes. However, the functional basis for these associations requires further substantiation.
- cardiovascular disease risk factors
- cardiovascular diseases
- genetic association
- high-density lipoprotein cholesterol
Over the past decade, single-nucleotide polymorphisms (SNPs) have formed the basis of genetic association studies and genome-wide association scans (GWAS),1–3 but since 2004, evidence has accumulated that genomic regions also exhibit interindividual heterogeneity in copy number. These copy number variants (CNVs), identified by a range of genome scanning technologies,4–7 range from 1 kb to several megabases in size, can reflect duplications, deletions, or complex multisite rearrangements, and have been described across the majority of the human genome.8 CNVs may alter gene function and gene dosage, or they may act through noncoding regions of the genome by influencing expression levels without directly affecting gene function.8,9 Although not always associated with clear phenotypic effects,10,11 CNVs have been implicated in common complex pathologies,8,9 including coronary heart disease (CHD).12 The ubiquity of CNVs in individual genomes, as well as at the population level, is highlighted by the catalog of almost 67 000 CNVs, corresponding to >15 000 loci, listed in the Database of Genomic Variants of The Centre for Applied Genomics (DGV http://projects.tcag.ca/variation) as of May 2012.10
Detection and characterization of CNVs remain somewhat complex,9 with the most precise current method being array comparative genomic hybridization. A potentially more cost-effective, convenient, and practical approach is the use of algorithms that detect CNVs using preexisting SNP array data from Illumina13–15 or Affymetrix16 platforms, eg, using data from GWAS. The recently developed CNV detection algorithm used in this study, cnvHap, analyzes data from multiple platforms.17
In the present study, a cohort of 2500 men in the second Northwick Park Heart Study (NPHS-II) was genotyped using a customized Illumina bundle that included 722 SNPs covering 76 CHD-related genes.18
We analyzed the fluorescence intensity data for each locus with cnvHap, which identifies SNP genotypes and CNVs simultaneously, and then performed CNV association studies for a range of CHD-related measures. Validation of these CNV–phenotype associations was then sought in 2 additional cohorts.
NPHS-II is a prospective study of ≈3300 healthy men, aged 50 to 64 years at recruitment, sampled from 9 UK general practices between 1989 and 1994.19 Phenotype data were available for ≈3200 subjects who were free from disease at the time of recruitment. Information on lifestyle habits, height, weight, and blood pressure was recorded at baseline and at subsequent prospective follow-ups. Measures of at least 15 blood factors associated with CHD risk, including plasma proteins, lipid fractions, and nonprotein metabolites, were obtained before the onset of clinical events. A DNA repository of samples from 3012 NPHS-II men was established at the time of recruitment, 2500 of whom were genotyped in this study on the basis of sample availability. By December 2005, after a median follow-up of 13.6 years, there had been 296 definite fatal or nonfatal CHD events in this sample. Full details of recruitment, measurements, follow-up, and definitions of incident disease in the NPHS-II cohort are described elsewhere.15,19
Genotyping the NPHS-II Cohort Using a Customized Gene-Centric Chip
A customized Illumina genotyping array was assembled containing 722 SNPs chosen to comprehensively capture common genetic variation, including CNVs, in 76 genes (including 5-kb upstream and downstream flanking regions) known to be involved in CHD-related processes such as lipid metabolism, coagulation, inflammation, and oxidative stress.18 SNPs were submitted to Illumina Inc for design scoring, GoldenGate SNPs were chosen where possible, and every SNP passed the assay design step. The Illumina GoldenGate assay was performed according to the manufacturer’s instructions.14
Detection of CNVs in the NPHS-II Cohort
Full details of cnvHap, the copy number prediction algorithm used in this study, are described elsewhere.17 CNVs were detected using cnvHap on data from the customized Illumina bundle described. Loci with a combined CNV frequency <1% in the NPHS-II sample were excluded from the CNV association studies.
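The frequency filter above can be sketched in a few lines of Python. This is a minimal illustration, not cnvHap's own code: it assumes copy-number calls are available as one integer per sample per locus (2 = normal diploid, None = failed call) and interprets "combined CNV frequency" as the proportion of called samples carrying either a deletion or a duplication. The function names are hypothetical.

```python
from typing import Dict, List, Optional

NORMAL_COPY_NUMBER = 2  # diploid baseline (assumption: autosomal loci only)


def combined_cnv_frequency(calls: List[Optional[int]]) -> float:
    """Fraction of successfully called samples whose copy number differs from 2.

    Deletions (0 or 1 copies) and duplications (3 or more copies) are pooled,
    which is one plausible reading of 'combined CNV frequency'. Missing calls
    (None) are ignored in both the numerator and the denominator.
    """
    called = [cn for cn in calls if cn is not None]
    if not called:
        return 0.0
    aberrant = sum(1 for cn in called if cn != NORMAL_COPY_NUMBER)
    return aberrant / len(called)


def common_cnv_loci(calls_by_locus: Dict[str, List[Optional[int]]],
                    min_freq: float = 0.01) -> List[str]:
    """Keep loci whose combined CNV frequency is at least min_freq (1% here)."""
    return [locus for locus, calls in calls_by_locus.items()
            if combined_cnv_frequency(calls) >= min_freq]
```

Pooling deletions and duplications means that a locus with a few of each can pass the 1% threshold even if neither class alone does; whether the original analysis pooled calls in exactly this way is an assumption.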
CNV–Phenotype Association Studies in NPHS-II
The associations between CHD-related phenotypes and CNVs detected in NPHS-II were analyzed using linear regression models. The phenotypes studied were plasma concentrations of triglyceride, total cholesterol, high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol, folate, homocysteine, C-reactive protein, apolipoprotein AI (apoAI), apolipoprotein B (apoB), fibrinogen, factor VII, and lipoprotein-associated phospholipase A2, as well as body mass index, systolic blood pressure, and diastolic blood pressure. Discovery P value cutoffs were set for each trait examined in NPHS-II so that only associations significant at 0.05 after correction for the false discovery rate, using the method of Benjamini and Hochberg,20 were selected. Nucleotide positions in NPHS-II at which common CNVs (>1% frequency in NPHS-II) showed significant association with CHD phenotypes are termed candidate loci throughout. CNV breakpoints were defined for each individual as the boundaries of runs of consecutive probes showing the same copy number aberration. Annotations for probe positions and CNV breakpoints were uploaded to the DGV browser to display the genomic region corresponding to each gene boundary.
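One way to read the breakpoint definition above is as run-length merging of one individual's probe-level copy-number calls. The sketch below assumes sorted probe positions on a single chromosome and integer copy numbers with 2 as normal; `call_segments` is a hypothetical helper, not part of cnvHap.

```python
from typing import List, Optional, Tuple

# (first probe position, last probe position, copy number) for one aberrant run
Segment = Tuple[int, int, int]


def call_segments(positions: List[int], copy_numbers: List[int],
                  normal: int = 2) -> List[Segment]:
    """Merge runs of consecutive probes that share the same aberrant copy number.

    positions must be sorted and lie on a single chromosome; copy_numbers holds
    one individual's integer calls at those probes. Breakpoints are placed at the
    outermost probes of each run, so they are conservative estimates.
    """
    if len(positions) != len(copy_numbers):
        raise ValueError("positions and copy_numbers must have the same length")
    segments: List[Segment] = []
    start: Optional[int] = None  # index of the first probe in the open run
    for i, cn in enumerate(copy_numbers):
        same_as_previous = i > 0 and copy_numbers[i - 1] == cn
        if start is not None and not same_as_previous:
            # the open run ended at the previous probe
            segments.append((positions[start], positions[i - 1], copy_numbers[i - 1]))
            start = None
        if cn != normal and start is None:
            start = i
    if start is not None:
        segments.append((positions[start], positions[-1], copy_numbers[-1]))
    return segments
```

Because breakpoints sit at the outermost probes of each run, a CNV's true extent may reach into the unprobed gap on either side, which is why sparse probe coverage yields only conservative breakpoint definitions.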
For traits that were normally distributed or where a normal distribution could be achieved after transformation, a general linear model was used. Normality was assessed by visual inspection and was tested using the method described by D’Agostino, Belanger, and D’Agostino Jr,21 as adjusted by Royston22 using the sktest command in STATA9.20 Where data were not normally distributed, nonparametric tests were used. P values were adjusted for multiple testing using a measure of false discovery rate in STATA9.20
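The Benjamini and Hochberg adjustment can be written as a short step-up procedure. The sketch below returns adjusted P values (q values) in input order; it is a generic implementation, not the STATA routine used in the study, and `benjamini_hochberg` is a hypothetical name.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the original order.

    With sorted P values p_(1) <= ... <= p_(m), the adjusted value at rank i is
    min over j >= i of (m / j) * p_(j), capped at 1. An association is kept at
    a false discovery rate of 0.05 when its adjusted value is below 0.05.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps every adjusted value at 1
    # walk from the largest P value down, keeping a running minimum
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For example, raw P values of 0.01, 0.04, 0.03, and 0.20 adjust to 0.04, about 0.053, about 0.053, and 0.20, so only the first would pass a 0.05 false discovery rate threshold.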
To confirm that cnvHap had identified true CNVs in our data, the same CNVs were sought in 6 additional data sets, as described in the online-only Data Supplement. Phenotype associations for candidate-validated CNVs were compared with data from the literature and from the North Finnish Birth Cohort 1966 (NFBC1966) and Whitehall II (WHII) cohorts. Successful replication was defined as the same phenotype association, with an effect in the same direction, in either the NFBC1966 or the WHII cohort at the probe positions closest to the candidate CNV, up to a maximum distance of 10 kb. A threshold of P<0.05 was used in validation studies. Phenotype data for the NFBC1966 cohort were collected at 31 years of age, whereas data for WHII participants were collected between the ages of 35 and 55 years.
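The replication rule above (same phenotype, same direction of effect, P<0.05, at the nearest probe no more than 10 kb from the candidate CNV) can be expressed as a small predicate. This is a sketch under stated assumptions: `ProbeResult`, `closest_probe`, and `is_replicated` are hypothetical names, and the rule is simplified to the single nearest in-range probe for one phenotype in one cohort; in practice it would be applied per phenotype to both NFBC1966 and WHII.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_DISTANCE_BP = 10_000  # maximum candidate-to-probe distance stated in the Methods
ALPHA = 0.05              # replication P value threshold stated in the Methods


@dataclass
class ProbeResult:
    """Association result for one probe in a validation cohort (hypothetical record)."""
    chrom: str
    position: int
    beta: float
    p_value: float


def closest_probe(chrom: str, position: int, probes: List[ProbeResult],
                  max_distance: int = MAX_DISTANCE_BP) -> Optional[ProbeResult]:
    """Nearest probe on the same chromosome within max_distance bp, or None."""
    in_range = [p for p in probes
                if p.chrom == chrom and abs(p.position - position) <= max_distance]
    if not in_range:
        return None
    return min(in_range, key=lambda p: abs(p.position - position))


def is_replicated(chrom: str, position: int, candidate_beta: float,
                  probes: List[ProbeResult], alpha: float = ALPHA,
                  max_distance: int = MAX_DISTANCE_BP) -> bool:
    """True if the nearest in-range probe is significant with a same-sign effect."""
    probe = closest_probe(chrom, position, probes, max_distance)
    if probe is None:
        return False
    same_direction = probe.beta * candidate_beta > 0
    return probe.p_value < alpha and same_direction
```

Applied to the TLR-4 figures reported in the Results (candidate Chr9:119512585 with β=−0.0309; WHII probes at Chr9:119512032 and Chr9:119514542), the nearest probe lies 553 bp away with a negative β and P=0.006, so the predicate returns True. For IL-6R, the WHII probe at Chr1:152662463 lies 31 131 bp from the candidate Chr1:152693594 and is therefore out of range.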
Data were available from NFBC1966 to analyze CNV–phenotype associations for the following phenotypes: apoAI, apoB, apoB/apoA ratio, body mass index, HDL-C, low-density lipoprotein cholesterol, triglyceride, and total cholesterol. In NFBC1966, association between cnvHap-predicted copy number and quantitative phenotypes was assessed by linear regression stratified by sex, with per-sample LogR Ratio variance as a covariate. WHII data were also subjected to CNV association studies for the same range of CHD-related phenotypes as examined in NPHS-II. Genomic regions corresponding to the boundaries of genes containing candidate loci, glutathione S-transferase µ-1 (GSTM1) (Chr1:110031941…110037890), interleukin 6 receptor (IL-6R) (Chr1:152644293…152706812), interleukin 6 signal transducer (IL-6ST) (Chr5:55266682…55326578), toll-like receptor 4 (TLR-4) (Chr9:119506281…119519587), and sterol regulatory element-binding protein 1 (SREBP1) (Chr17:17655388…17681050), were examined to determine whether the same or similar CHD-related associations had been observed in CNV association studies in the validation cohorts. An arbitrary cutoff of P<0.05 for CNV association was used in all replication studies.
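The NFBC1966 model described above (each phenotype regressed on cnvHap-predicted copy number within each sex, adjusting for per-sample LogR Ratio variance) can be sketched with ordinary least squares. This pure-Python version is illustrative only: the sample dictionary keys (`sex`, `copy_number`, `lrr_variance`, `phenotype`) are hypothetical, coefficients come from the normal equations, and the standard errors and P values a real analysis would report are omitted.

```python
from typing import Dict, List, Sequence


def ols(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or nearly singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def copy_number_beta_by_sex(samples: List[Dict]) -> Dict[str, float]:
    """Per-sex copy-number effect on a phenotype, adjusted for LRR variance.

    Model fitted within each sex stratum:
        phenotype ~ intercept + beta_cn * copy_number + beta_v * lrr_variance
    Returns beta_cn for each sex.
    """
    betas: Dict[str, float] = {}
    for sex in sorted({s["sex"] for s in samples}):
        stratum = [s for s in samples if s["sex"] == sex]
        X = [[1.0, float(s["copy_number"]), float(s["lrr_variance"])] for s in stratum]
        y = [float(s["phenotype"]) for s in stratum]
        betas[sex] = ols(X, y)[1]
    return betas
```

Solving the normal equations directly is adequate for three well-conditioned columns; with real data, a QR- or SVD-based least-squares routine from a statistics library would be preferable, together with the standard errors needed for P values.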
All studies were performed in accordance with the Declaration of Helsinki,23 and all participants gave written consent. Ethical approval for all studies was obtained from the local ethics committees.
Detection of CNVs by cnvHap
Of the 722 SNPs genotyped in NPHS-II, 248 (34.3%), located within 33 CHD-related genes, exhibited CNVs. Chromosomal positions of the CNVs detected in NPHS-II are shown in online-only Data Supplement Figure I. Two hundred fifteen of these CNVs (86.7%) were rare (combined CNV frequency <0.01) in NPHS-II and were excluded from phenotype association studies.
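As a quick check of the proportions above, the reported percentages and the number of common CNV positions carried forward to association testing follow directly from the counts:

```latex
\frac{248}{722} \approx 0.343 \;(34.3\%), \qquad
\frac{215}{248} \approx 0.867 \;(86.7\%), \qquad
248 - 215 = 33 \text{ common CNV positions (combined frequency} \geq 1\%\text{)}
```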
Phenotypic Association Studies for CNVs Detected by cnvHap in NPHS-II
Baseline characteristics of the NPHS-II sample genotyped in this study are presented in online-only Data Supplement Table I. Seven loci at which CNVs were detected showed significant associations (using P values adjusted for the false discovery rate, as detailed in online-only Data Supplement Results) with intermediate CHD traits (Table). CNVs were significantly associated with mean plasma apoAI concentrations at the following positions: 2 positions within GSTM1 (Chr1:110034988 [P=0.0001, effect size per increase in copy number β=−0.0503] and Chr1:110036093 [P=0.0015, β=−0.0345]); Chr1:152693594 within IL-6R (P=0.0009, β=−0.1290); Chr9:119512585 within TLR-4 (P=0.0012, β=−0.0309); and Chr17:17679888 within SREBP1 (P=0.0011, β=−0.0244). CNVs at Chr17:17660045 in SREBP1 were associated with HDL-C (P=0.0003, β=−0.0664). Finally, CNVs at Chr5:55326203 within IL-6ST were significantly associated with plasma apoB concentrations (P=0.0002, β=−0.1780). A description of the breakpoints of these CNVs detected in NPHS-II and the validation cohorts is presented in online-only Data Supplement Table VIII.
Replication of the NPHS-II-Significant CNV Associations
Significant CNV–phenotype associations at candidate loci were compared, where possible, with results from similar association studies in the NFBC1966 and WHII cohorts (summarized as Manhattan plots in the Figure and online-only Data Supplement Figure III) and with the literature. Positions showing the same CNV–phenotype associations as candidate loci are summarized in the Table.
Deletions in GSTM1 were associated with lower concentration of apoAI in NPHS-II. This CNV association was not validated in either NFBC1966 or WHII.
Although the association of CNVs in IL-6R (Chr1:152693594) with apoAI is supported in WHII by Chr1:152662463, a position in the same gene that also shows CNV association with apoAI (P=0.047, β=−0.627), this probe is >31 kb from the candidate CNV identified in NPHS-II and was therefore not considered evidence of replication. Furthermore, WHII probes closer to the candidate CNV do not replicate the association.
Association of IL-6ST CNVs (Chr5:55326203) with apoB is replicated in NFBC1966 at position Chr5:55279586 (P=0.011, β=−0.070).
CNVs in TLR-4 (Chr9:119512585) were associated with apoAI, and this association is replicated in WHII at positions Chr9:119512032 (P=0.006, β=−0.034) and Chr9:119514542 (P=0.016, β=−0.219), with Chr9:119512032 forming part of a deletion CNV in WHII (online-only Data Supplement Table VIII).
The association of SREBP1 CNVs (Chr17:17679888) with apoAI is replicated at Chr17:17674485 (P=0.037, β=−0.094) in NFBC1966, and this position is part of a larger deletion CNV in the region (online-only Data Supplement Table VIII).
The putative association of Chr17:17660045 with HDL-C in NPHS-II was not confirmed at any CNV position in the replication cohorts.
Thus, of the 7 CNV associations originally detected in NPHS-II, the associations of candidate CNVs within TLR-4 and SREBP1 with apoAI and within IL-6ST with apoB showed evidence of replication, with effects in the same direction, in the validation cohort(s).
We detected CNVs at more than one third of the loci examined in NPHS-II, which emphasizes the ubiquity of CNVs within the human genome. Most CNVs had a low frequency in the discovery cohort, although their presence may still confound traditional SNP association studies at loci where the extent of CNVs is unknown. Importantly, this study shows that examining CNV–phenotype associations can uncover associations not observed in traditional SNP association studies: none of the candidate genes had shown significant association with CHD phenotypes in a previous SNP association study (online-only Data Supplement Table IX).18
Data from several independent sources confirmed GSTM1 as being CNV-rich, thus acting as a good positive control.24 In contrast, the other genes containing candidate CNVs were not CNV-rich in DGV, perhaps reflecting a lack of data for the gene regions containing candidate CNVs.
In a recent study, >75% of whites exhibited deletion of the entirety of GSTM1, and none had 2 copies of the gene.25 Null mutations of GSTM1 are associated with a diverse set of chronic conditions,26–30 and a reduction in the enzyme product is associated with increased oxidative stress and hypertension in both animal models31 and elderly human patients.32 In NPHS-II, we observed duplication CNVs in GSTM1 at Chr1:110034988 and Chr1:110036093, which were associated with lower plasma apoAI. Although validation cohorts were CNV-rich in this region, these associations could not be replicated. Homozygosity for the GSTM1 deletion has recently been shown to be associated with increased plasma triglyceride and decreased HDL-C concentrations,33 and although we observed borderline association with HDL-C, this was not replicated. Thus, our results with respect to GSTM1 may represent a false positive.
An SNP at the candidate locus in IL-6R, rs8192284, common in European subjects,34,35 is at the proteolytic cleavage site for IL-6R and has been reported to be associated with plasma C-reactive protein concentration in the assessment of diabetes mellitus risk and with plasma IL-6 and IL-6R concentrations.35–37 CNVs at this locus were significantly associated with plasma apoAI in NPHS-II, whereas other CNVs in IL-6R were also associated with apoAI in WHII. Neither in the discovery nor in the validation cohorts was an association with C-reactive protein observed.
Although the large-scale architecture could not be deduced for CNVs within IL-6ST in NPHS-II, the candidate locus in this gene overlaps a large CNV in NFBC1966 and neighbors a CNV in the 1958 British birth cohort (58C) (see online-only Data Supplement Results). The gene product of IL-6ST mediates the proinflammatory effects of IL-6 via the Jak-STAT pathway in the liver.38 Polymorphisms of IL-6ST significantly increase the risk of coronary artery disease and myocardial infarction in humans.39 CNVs at Chr5:55326203 were associated with apoB concentration in NPHS-II, and this association was replicated in the NFBC1966 cohort.
CNVs at the candidate locus in TLR-4 were associated with apoAI concentration in NPHS-II. TLR-4 has previously been associated with the incidence of diabetes mellitus, and this association may be modulated by an individual’s plasma HDL-C and triglyceride concentrations.40,41 Data from several independent sources confirm that the region is CNV-rich, although 1 study has suggested no association between TLR-4 expression and plasma lipid concentrations.40 The association of CNVs with apoAI in NPHS-II was replicated at 2 positions in WHII, both of which formed part of a larger CNV.
The candidate positions in SREBP1 (Chr17:17679888 and Chr17:17660045) are known to lie within an inversion on chromosome 17.42 Whereas the candidate CNV at Chr17:17679888 does not overlap CNVs seen in the validation cohorts, the candidate locus at Chr17:17660045 is part of larger deletion and mixed CNVs in NPHS-II, which overlap deletions and duplications in 58C and NFBC1966. The SNP at Chr17:17679888, rs4925118, has been associated with the risk of developing type 2 diabetes mellitus,43 although this has not been replicated in all studies.44 CNVs at Chr17:17660045 and Chr17:17679888 are associated with HDL-C and apoAI concentrations, respectively, in the discovery cohort, which may be expected because SREBP1 is a transcription factor involved in maintaining low-density lipoprotein cholesterol homeostasis.45 The association of CNVs at Chr17:17660045 with HDL-C was not reproduced in the validation cohorts, whereas the association of CNVs at Chr17:17679888 with apoAI was replicated at a single CNV position in NFBC1966 (Chr17:17674485), which is itself part of a large deletion CNV in that cohort that overlaps the candidate locus Chr17:17660045.
This study has several limitations that must be taken into account. First, validation of CNV calls may have been restricted in some cases, as for SREBP1, by the relative paucity of loci investigated in the validation cohorts and by genotyping difficulties encountered in individual cohorts. The fewer the loci examined in a validation cohort, the lower the probability of uncovering CNVs that overlap loci close to candidate CNVs and could therefore support validation. Furthermore, the sparse probe coverage in this study permits only conservative definitions of CNV breakpoints.
Second, the quality of the validation data should also be considered. For example, when considering data from the French Caucasian male cohort (FCC-2448) (see online-only Data Supplement Results), it should be borne in mind that the sample size is small and geographically specific.
Last, it is important to note that we examined only associations with intermediate phenotypes and not end points such as early-onset myocardial infarction. Thus, although CNVs may contribute to the variability left unexplained by GWAS, their effect on missing heritability remains unclear.46
Our study adds to previous research. A recent GWAS examined whether the unexplained heritability of early-onset myocardial infarction may be a consequence of CNVs.47 In that study, neither common (>1%) nor rare (<1%) CNVs, detected using the Canary algorithm,16 showed statistically significant frequency differences between cases and controls after correction for multiple testing. However, gene coverage in GWAS is very poor compared with that of a gene-centric, well-tagged array, even if, as in this study, the number of genes covered is small. As such, our analysis serves as a pilot study for the identification of CNVs using gene-centric chips. Here, we describe this more targeted approach. The gene-centric loci chosen for inclusion on the Illumina bundle proved excellent candidates for this study, yielding novel associations of candidate CNVs in NPHS-II with CHD phenotypes. New associations are described between apoAI and CNVs in IL-6R, TLR-4, and SREBP1 and between apoB and CNVs in IL-6ST, and similar CNV associations are demonstrable within the respective genes in the validation cohorts. Conversely, the candidate CNV association with HDL-C in SREBP1 was not observed in the validation cohorts. In general, this study demonstrates that exomic regions may harbor CNVs that are associated with complex disease phenotypes. Furthermore, it demonstrates that despite the difficulties of validating CNV calls and replicating CNV–trait associations, population cohorts are ideal for identifying common CNV–trait associations, because such analyses are less susceptible than analyses of case-control data to biases such as plate effects.
The following general practices collaborated in the NPHS-II: The Surgery, Aston Clinton; Upper Gordon Road, Camberley; The Health Centre, Carnoustie; Whittington Moor Surgery, Chesterfield; The Market Place Surgery, Halesworth; The Health Centre, Harefield; Potterells Medical Centre, North Mymms; Rosemary Medical Centre, Parkstone, Poole; and The Health Centre, St. Andrews. In relation to NFBC1966, we thank Outi Törnwall and Minttu Sauramo (DNA biobanking). This study makes use of data generated by the Wellcome Trust Case-Control Consortium (Nature. 2007; 447; 661–78). A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk.
Sources of Funding
Drs Hingorani, Humphries, and P.J. Talmud were funded by the British Heart Foundation under RG08/014, and Dr Kumari was funded by the British Heart Foundation under PG07/133/24260. J.S.E-S. Moustafa was funded by an Imperial College Division of Medicine PhD studentship. Drs Coin and Li received funding from European Community’s Seventh Framework Programme (grant No. 223367, MultiMod). Drs Kivimaki and Kumari were partially supported by the National Heart, Lung and Blood Institute (NHLBI: HL36310). The WHII study was supported by grants from the Medical Research Council; British Heart Foundation; Health and Safety Executive; Department of Health; National Heart, Lung and Blood Institute (NHLBI: HL36310), and National Institute on Aging (AG13196), US, NIH; Agency for Health Care Policy Research (HS06516); and the John D and Catherine T MacArthur Foundation Research Networks on Successful Midlife Development and Socio-economic Status and Health. NFBC1966 received financial support from the Academy of Finland (project grants 104781, 120315, 129269, 1114194, 139900/24300796, Center of Excellence in Complex Disease Genetics and SALVE), University Hospital Oulu, Biocenter, University of Oulu, Finland (75617), the European Commission (EURO-BLCS, Framework 5 award QLG1-CT-2000-01643), NHLBI grant 5R01HL087679-02 through the STAMPEED program (1RL1MH083268-01), NIH/NIMH (5R01MH63706:02), ENGAGE project and grant agreement HEALTH-F4-2007–201413, the Medical Research Council, UK (G0500539, G0600705, G0600331, PrevMetSyn/SALVE, PS0476), and the Wellcome Trust (project grant GR069224, WT089549), UK. Replication genotyping was supported, in part, by MRC grant G0601261, Wellcome Trust grants 085301, 090532, and 083270, and Diabetes UK grants RD08/0003704 and BDA 08/0003775. 
The DNA extractions, sample quality controls, biobank upkeeping, and aliquoting were performed in the National Public Health Institute, Biomedicum Helsinki, Finland, and were supported financially by the Academy of Finland and Biocentrum Helsinki. Genotyping of the FCC groups had previously been funded by Genome Canada and Genome Quebec.
† These authors are joint last authors on this work.
The online-only Data Supplement is available at http://circgenetics.ahajournals.org/lookup/suppl/doi:10.1161/CIRCGENETICS.111.961037/-/DC1.
- Received August 16, 2012.
- Accepted August 13, 2012.
- © 2012 American Heart Association, Inc.
- de Smith AJ, Tsalenko A, Sampas N, Scheffer A, Yamada NA, Tsang P, et al.
- Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, et al.
- Cambien F, Tiret L.
- Colella S, Yau C, Taylor JM, Mirza G, Butler H, Clouston P, et al.
- 14. Illumina Inc. BeadStudio Genotyping Module v3.2 User guide. (11284301). 2007.
- Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SF, et al.
- Drenos F, Talmud PJ, Casas JP, Smeeth L, Palmen J, Humphries SE, et al.
- Rickham PP.
- Rose-Zerilli MJ, Barton SJ, Henderson AJ, Shaheen SO, Holloway JW.
- Huang RS, Chen P, Wisel S, Duan S, Zhang W, Cook EH, et al.
- McBride MW, Brosnan MJ, Mathers J, McLellan LI, Miller WH, Graham D, et al.
- Qi L, Rifai N, Hu FB.
- Qi L, Rifai N, Hu FB.
- Heinrich PC, Behrmann I, Müller-Newen G, Schaper F, Graeve L.
- Luchtefeld M, Schunkert H, Stoll M, Selle T, Lorier R, Grote K, et al.
- Grarup N, Stender-Petersen KL, Andersson EA, Jørgensen T, Borch-Johnsen K, Sandbaek A, et al.
- Kathiresan S, Voight BF, Purcell S, Musunuru K, Ardissino D, Mannucci PM, et al.
Clinical Perspective
Copy number variants (CNVs) are a common form of genomic variation discovered in the past decade. CNVs are stretches of genomic DNA that are present in different numbers of copies in different individuals, usually as a result of duplications or deletions of genomic regions. Although they may contain coding regions, most CNVs are thought to exert their effects through noncoding regions, and they have previously been implicated in cardiovascular and other disease processes. Here, we describe a process of CNV identification using the cnvHap algorithm. Three of the novel CNVs identified were significantly associated with cardiovascular disease (CVD)-associated phenotypes, and these associations were reproduced in other study cohorts examined. Although their role in disease is not yet fully understood, CNVs such as those described here may make certain individuals more prone to complex pathologies such as CVD. More generally, detection of CNVs, using methods such as that described in this study, may become an important part of genetic association studies.