Pooled DNA Resequencing of 68 Myocardial Infarction Candidate Genes in French Canadians
Background: Familial history is a strong risk factor for coronary artery disease (CAD), especially for early-onset myocardial infarction (MI). Several genes and chromosomal regions have been implicated in the genetic cause of CAD/MI, mostly through the discovery of familial mutations implicated in hyper-/hypocholesterolemia by linkage studies and of single nucleotide polymorphisms by genome-wide association studies. Except for a few examples (eg, PCSK9), the role of low-frequency genetic variation (minor allele frequency [MAF] ≈0.1%−5%) in MI/CAD predisposition has not been extensively investigated.
Methods and Results: We selected 68 candidate genes and sequenced their exons (394 kb) in 500 early-onset MI cases and 500 matched controls, all of French-Canadian ancestry, using solution-based capture in pools of nonindexed DNA samples. In these regions, we identified 1852 single nucleotide variants (695 novel) and captured 85% of the variants with MAF≥1% found by the 1000 Genomes Project in European-ancestry individuals. Using gene-based association testing, we prioritized for follow-up 29 low-frequency variants in 8 genes and attempted to genotype them for replication in 1594 MI cases and 2988 controls from 2 French-Canadian panels. Our pilot association analysis of low-frequency variants in 68 candidate genes did not identify genes with large effect on MI risk in French Canadians.
Conclusions: We have optimized a strategy, applicable to all complex diseases and traits, to discover DNA sequence variants in large populations efficiently and cost-effectively. Resequencing endeavors to find low-frequency variants implicated in common human diseases are likely to require very large sample sizes.
- myocardial infarction
- polymorphism, single nucleotide
- coronary artery disease
Coronary artery disease (CAD) and its main clinical manifestation, myocardial infarction (MI), are the leading cause of death and disability worldwide.1 The main risk factors for MI include old age, male sex, tobacco use, dyslipidemia, obesity, diabetes mellitus, arterial hypertension, and chronic stress.2 Epidemiological studies in twins and large pedigrees have also established that a positive family history of MI is a strong predictor, indicating that there is an important genetic component to MI pathogenesis.3,4
Recent large meta-analyses of genome-wide association study (GWAS) results have identified over 25 genomic regions that carry common single nucleotide polymorphisms (SNPs) associated with MI risk.5,6 For many of these MI loci, the gene(s) and causal DNA sequence variant(s) are unknown. Identification of low-frequency and penetrant nonsynonymous variants through exon resequencing of genes within these loci can help address these 2 questions. This approach is supported by recent successes for type 1 diabetes mellitus,7 fetal hemoglobin,8 age-related macular degeneration,9 and Crohn disease.10 In this study, we sequenced exons from 68 MI candidate genes (394 kb) in 500 early-onset MI cases and 500 matched controls selected from a French−Canadian biobank using pooled sequencing of nonindexed DNA after solution-based target capture. We identified 1852 high-quality single nucleotide variants (SNVs) with low false-positive and negative rates. From the sequence data, we performed gene-based association testing and attempted to replicate the top findings in 2 independent French−Canadian replication panels totaling 1594 MI cases and 2988 controls.
Participants in this study were recruited from the Montreal Heart Institute (MHI) Biobank and the Pharmacogenomics of the Toxicity of Lipid-Lowering Agents Study (hereafter referred to as the MHI Statins Study). All participants had 4 French-Canadian grandparents. Participant characteristics are summarized in Table 1. All cases had a documented history of MI. For controls, we excluded patients with MI, percutaneous coronary intervention (PCI), coronary artery bypass graft (CABG) surgery, transient ischemic attack (TIA) or stroke, peripheral vascular disease, congestive heart failure (CHF), and angina. For resequencing, we selected a subgroup of 500 early-onset MI cases (<50 years of age for men and <60 years of age for women), because MI events that occur in younger patients are associated with substantially greater heritability,11 and 500 controls matched as closely as possible for hypertension, diabetes mellitus, and use of lipid-lowering drugs. The controls were selected to be older than the cases (age at baseline between 50 and 70 years in men and between 60 and 70 years in women). All participants gave written informed consent and the MHI ethics committee approved the project.
DNA Sample Preparation, Gene Selection and Targeted Sequence Enrichment
As part of this targeted DNA resequencing project, we sequenced exons of 68 candidate genes (394 kb) for MI in 500 early-onset MI cases and 500 matched controls. These genes were selected because: (1) there is an OMIM entry (http://www.ncbi.nlm.nih.gov/omim) linking them to MI pathogenesis (often through lipid metabolism), (2) they were identified in previous linkage or candidate gene association studies, or (3) they are located near (within recombination hotspots) SNPs associated with MI by GWAS (as of January 2010; online-only Data Supplement Table I).
To minimize costs, we selected an approach using pools of nonindexed (no barcodes) DNA, as previously described.10 We prepared 20 pools of 50 DNA samples (10 pools of 50 case DNA samples and 10 pools of 50 control DNA samples) at equimolar concentration. Briefly, genomic DNA samples extracted from blood were quantified using picogreen protocols and concentrations were adjusted to 10 ng/μL. Fifty DNA samples were mixed (200 ng of each DNA sample) to achieve a final pooled DNA concentration of 10 ng/μL in 1000 μL; we verified the final concentration of the pools by picogreen.
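The pooling arithmetic above can be checked with a few lines of Python. This is only a sketch of the stated quantities (50 samples per pool, 200 ng per sample, stocks adjusted to 10 ng/μL); the variable names are ours, and the singleton fraction assumes diploid samples contributing equally to the pool.

```python
# Pool composition as stated in the text.
SAMPLES_PER_POOL = 50
NG_PER_SAMPLE = 200.0
STOCK_CONC_NG_PER_UL = 10.0

# Volume drawn from each adjusted stock to deliver 200 ng.
volume_per_sample_ul = NG_PER_SAMPLE / STOCK_CONC_NG_PER_UL

# Totals for one pool.
pool_volume_ul = SAMPLES_PER_POOL * volume_per_sample_ul
pool_mass_ng = SAMPLES_PER_POOL * NG_PER_SAMPLE
pool_conc_ng_per_ul = pool_mass_ng / pool_volume_ul

# Assumption: diploid samples in equal amounts, so one heterozygous carrier
# contributes 1 of 2 * 50 chromosomes in the pool.
chromosomes_per_pool = 2 * SAMPLES_PER_POOL
singleton_read_fraction = 1 / chromosomes_per_pool

print(f"{volume_per_sample_ul:.0f} uL per sample, {pool_volume_ul:.0f} uL per pool")
print(f"pool concentration: {pool_conc_ng_per_ul:.0f} ng/uL")
print(f"expected read fraction for a singleton: {singleton_read_fraction:.2%}")
```

Under these assumptions a singleton is expected in about 1% of reads at its position, which is why equimolar pooling and deep per-pool coverage matter for variant discovery.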
For target enrichment, we designed a custom-made Agilent SureSelect library for capture in solution (http://www.genomics.agilent.com); see online-only Data Supplement Table II for the coordinates of the targeted exons. DNA pools were sonicated on a Covaris S2 instrument with default settings to obtain a fragment size of 200 base pairs (bp), and the sheared DNA was quantified on a Bioanalyzer (Agilent). For each pool, 3.5 μg of the sheared DNA was prepared with the Illumina Paired-End DNA Sample Prep protocol (including end repair, A-tailing, and sequencing adapter ligation), following the manufacturer’s instructions. Following Agilent’s protocol, 500 ng of each library was hybridized with our SureSelect library. Captured DNA was eluted in 15 μL of EB buffer and PCR-amplified for 12 cycles.
Next-Generation DNA Resequencing and Data Analysis
We sequenced each library (pool of 50 DNA samples) on 1 lane of an Illumina GAIIx instrument using a 2×76 bp protocol following the manufacturer’s recommendations. Analysis of our data is based on a bioinformatic pipeline built around software developed for the 1000 Genomes Project. Briefly, fastq files generated by the sequencers were aligned to the reference human genome (hg18; NCBI Build 36) using BWA.12 SAMtools13 was used to convert SAM files to the BAM format, and we used PICARD (http://picard.sourceforge.net) to remove PCR or optical duplicates. We then used the GATK suite14,15 to recalibrate base quality scores and perform local realignment around indels. We used the Syzygy software with default parameters to call variants and estimate allele frequencies.10 Summary statistics relevant to sequence data analysis can be found in online-only Data Supplement Tables III and IV. In this article, we only consider SNVs with the highest Syzygy quality score (we excluded insertions−deletions [indels]); online-only Data Supplement Table V includes the complete list of the 1852 SNVs identified in this project.
Genotyping for Validation and Replication
We genotyped 45 low-frequency DNA sequence variants (MAF<1%) in the same 1000 DNA samples as those used for sequencing to validate the estimates of allele frequencies from the resequencing results. For independent replication of the MI results, we used a 2-stage design where we attempted to genotype 29 variants in 2474 DNA samples from the MHI Biobank-replication panel (870 MI cases and 1604 controls; not overlapping with the sequenced DNA samples) and in 2108 DNA samples from the MHI Statins Study (724 MI cases and 1384 controls; not overlapping with the MHI Biobank samples). For genotyping, we used the Sequenom iPLEX technology. Using the PLINK software,16 we removed from the analysis individuals with a genotyping success rate <90% and markers with a genotyping success rate <95% or a Hardy−Weinberg P<0.001. The genotype concordance rate estimated from DNA triplicates was >99.9%.
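The quality-control thresholds described above can be sketched in plain Python. This is an illustration only, not PLINK's implementation: the function names are hypothetical, and the Hardy-Weinberg check uses a 1-degree-of-freedom chi-square approximation as a stand-in, whereas PLINK provides its own test (including an exact test) that behaves better for rare variants.

```python
import math

SAMPLE_CALL_RATE_MIN = 0.90   # individuals below this were removed
MARKER_CALL_RATE_MIN = 0.95   # markers below this were removed
HWE_P_MIN = 0.001             # markers with a smaller HWE P were removed


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) Hardy-Weinberg P value from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def sample_passes(call_rate):
    return call_rate >= SAMPLE_CALL_RATE_MIN


def marker_passes(call_rate, hwe_p):
    return call_rate >= MARKER_CALL_RATE_MIN and hwe_p >= HWE_P_MIN


# Hypothetical marker: genotype counts in perfect HWE proportions, 97% call rate.
print(marker_passes(0.97, hwe_chisq_p(25, 50, 25)))
```

For the low-frequency variants genotyped here, expected homozygote counts are very small, so the chi-square approximation in this sketch would be unreliable in practice.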
Gene-based association statistics, based on allele frequency estimates generated by the Syzygy software, were calculated using the C-alpha statistic as implemented in the Syzygy C-alpha module.10,17 Many of the existing methods used to identify gene-based associations between low-frequency variants and phenotypes collapse low-frequency variants together and test whether there is a burden of these variants in cases versus controls.18 These tests assume that all genetic variation in a gene acts in the same direction on phenotypes. In contrast, the C-alpha statistic allows for risk, protective, and neutral low-frequency variants in a given gene that may act together on phenotypes. Instead of focusing on the overall average effect of all low-frequency variants on phenotypes (this is what burden tests do), it considers the distribution (variance) of each low-frequency variant between cases and controls. The C-alpha test quantifies departure from the expected binomial variance across all variants analyzed.17 For example, consider a gene with 2 low-frequency variants that have nonreference allele counts of 0:10 and 10:0 in cases:controls. Whereas burden tests would not detect a signal (10 nonreference alleles in both cases and controls), the C-alpha test would identify a gene-based association signal owing to the presence of strong protective and risk variants. One caveat of the C-alpha statistic is that it cannot adjust for covariates.
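The 0:10 and 10:0 example can be worked through numerically. The sketch below is not the Syzygy C-alpha module; it assumes the usual form of the statistic, T = Σ[(y_i − n_i p0)² − n_i p0(1 − p0)], where y_i is the count of variant i in cases, n_i its total count, and p0 the proportion of cases, with the null variance taken from the binomial distribution and a one-sided normal approximation for the P value. Function and variable names are ours.

```python
from math import comb, erfc, sqrt


def c_alpha(case_counts, total_counts, p0):
    """Return (T, Z, one-sided P) for per-variant allele counts."""
    def excess(u, n):
        # Squared deviation from the binomial mean minus the binomial variance.
        return (u - n * p0) ** 2 - n * p0 * (1 - p0)

    t = sum(excess(y, n) for y, n in zip(case_counts, total_counts))

    # Null variance of T: variance of each term under Binomial(n, p0).
    c = 0.0
    for n in total_counts:
        for u in range(n + 1):
            prob = comb(n, u) * p0 ** u * (1 - p0) ** (n - u)
            c += excess(u, n) ** 2 * prob

    z = t / sqrt(c)
    p_value = 0.5 * erfc(z / sqrt(2))  # upper tail of a standard normal
    return t, z, p_value


# Example from the text: two variants, cases:controls = 0:10 and 10:0,
# with equal numbers of cases and controls (p0 = 0.5).
cases = [0, 10]
controls = [10, 0]
totals = [a + b for a, b in zip(cases, controls)]

burden_difference = sum(cases) - sum(controls)
t, z, p = c_alpha(cases, totals, p0=0.5)
print(f"burden difference = {burden_difference}, T = {t}, Z = {z:.2f}, P = {p:.1e}")
```

The burden difference is zero while T is strongly positive, which is the contrast the paragraph describes. With only 2 variants the normal approximation is rough, consistent with the later remark that permutations are needed for accurate P values when few low-frequency variants are present.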
Our analysis of the genotype data to perform gene-based analysis relied on the sequence kernel association test (SKAT) software, which is a generalized C-alpha test that can take into account covariates.19 For each analysis with SKAT, we used the default parameters and corrected for covariates that were correlated with MI status: MHI Biobank-sequencing (diabetes mellitus, hypertension, hypercholesterolemia), MHI Biobank-replication (age, sex, diabetes mellitus, hypertension, hypercholesterolemia), and MHI Statins Study (age, sex, diabetes mellitus, hypertension, recruitment site). All P values reported in this study are uncorrected for the number of hypotheses tested; the significance threshold is set at α=7 × 10−4 (Bonferroni correction for 68 genes tested).
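As a quick check of the stated threshold, the Bonferroni correction for 68 genes can be computed directly. This sketch assumes a family-wise α of 0.05, which the text implies but does not state; the helper name is hypothetical.

```python
N_GENES = 68
FAMILY_WISE_ALPHA = 0.05  # assumption: conventional family-wise error rate

# Per-gene threshold; the text rounds this to 7 x 10^-4.
bonferroni_alpha = FAMILY_WISE_ALPHA / N_GENES


def passes_bonferroni(p_value, alpha=bonferroni_alpha):
    """True when an uncorrected P value meets the corrected threshold."""
    return p_value <= alpha


print(f"corrected threshold: {bonferroni_alpha:.2e}")
# A nominally significant P value (below 0.05) need not pass the correction.
print(passes_bonferroni(0.0089))
```

Because the reported P values are uncorrected, each one has to be compared against this threshold rather than against 0.05.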
Using Syzygy, we estimated that our power to discover singletons with the pool approach in 1000 samples is on average >60% for 80% of all the targeted exonic sequences, although we note pool-to-pool variation (online-only Data Supplement Figure I).10 We estimated the statistical power of our study design to find gene-based association with MI using the power calculator and reference haplotypes (calibrated based on a coalescent model) provided in the SKAT package for dichotomous traits.19 Under the following assumptions (gene size: 6 kb, MI prevalence: 2%, proportion of cases: 50%, proportion of causal variants: 10%, MAF cutoff: 5%, proportion of causal variants with effect in the opposite direction: 20%) and averaging across 500 simulations, we calculate that we have 35% power to find a gene with a SKAT P<0.05 in the resequencing set (N=1000) and 48% power for the same gene to achieve a SKAT P<7×10−4 (Bonferroni correction for 68 genes tested) in the replication cohorts (N=1594 cases and 2988 controls) (online-only Data Supplement Table VI).
High Sequence Coverage of Targeted Exons
For this project, we targeted 394 kb of exonic sequences from 68 genes selected because they had been linked to MI through linkage, candidate-gene, or GWAS (online-only Data Supplement Tables I and II). For the DNA resequencing phase, we selected 500 early-onset MI cases and 500 matched controls from the MHI Biobank (MHI Biobank-sequencing in Table 1). Our resequencing strategy was based on a protocol recently described to follow-up GWAS results for Crohn disease,10 with the major modification that we used solution-based capture as opposed to PCR for exon enrichment (Materials and Methods). The major impact of this change was to reduce from weeks to days the time necessary to prepare sequencing libraries, thus reducing costs, without affecting the overall quality of the sequence data (see below). A similar approach was recently described.20
Each of the 20 pools of 50 nonindexed DNA samples was sequenced on an Illumina GAIIx lane. After removing sequence duplicates, we generated ≈539 million paired-end reads, or >76 billion raw bases of sequence. After applying quality-control filters that removed poorly mapped (10%), badly paired (2%), or off-target (60%) reads, we obtained ≈151 million paired-end reads of high quality. Although the off-target rate is high, this result is not unexpected given the size of the targeted genomic region (394 kb) in comparison with the rest of the human genome (3 gigabases), as previously reported.21 Nevertheless, because of the throughput of the sequencer, we could achieve our target mean coverage per sample (≥30X, that is 15X per chromosome or 1500X per DNA pool of 50 DNA samples) for all but 6 genes (APOA1, APOE, CDKN2BAS, CYP20A1, SLC5A3, TXNDC6) (Figure 1 and online-only Data Supplement Table III). While inspecting these results, we noticed a difference in mean coverage per sample between cases and controls (54X versus 61X, t-test P=0.01; Figure 1). For 3 pools of cases, we had to perform 11 cycles of PCR amplification (instead of 8 for the other pools) to have enough DNA material for exon capture. This resulted in a higher number of read duplicates for these 3 pools of cases (55% versus 42% for the other 17 pools, Wilcoxon’s rank sum test P=0.02), and therefore overall a lower coverage and reduced discovery power for cases in comparison with controls (online-only Data Supplement Figure I). This systematic difference, which is hard to explain as DNA samples from cases and controls were prepared in the same laboratory using the same protocols, might affect our estimates of allele frequencies in cases and controls from the sequence reads (see below).
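The read and coverage figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers from the text; it assumes the three filter percentages are additive fractions of the 539 million read pairs, and that every retained base (2 × 76 bp per pair) falls on the 394-kb target, so the coverage estimate is an upper-bound approximation.

```python
READ_PAIRS_M = 539.0          # million paired-end reads after duplicate removal
FILTERED = {"poorly_mapped": 0.10, "badly_paired": 0.02, "off_target": 0.60}
READ_LENGTH_BP = 76
TARGET_BP = 394_000
N_SAMPLES = 1000
SAMPLES_PER_POOL = 50

# Reads kept after the three filters (assumed additive).
retained_pairs_m = READ_PAIRS_M * (1 - sum(FILTERED.values()))

# Approximate mean coverage per sample if all retained bases are on target.
retained_bases = retained_pairs_m * 1e6 * 2 * READ_LENGTH_BP
mean_coverage_per_sample = retained_bases / TARGET_BP / N_SAMPLES

# Target depth conversions: per sample, per chromosome, per pool.
target_per_sample = 30
target_per_chromosome = target_per_sample / 2
target_per_pool = target_per_sample * SAMPLES_PER_POOL

# Fraction of the genome covered by the target (taking the genome as 3 Gb).
target_fraction_of_genome = TARGET_BP / 3e9

print(f"retained read pairs: {retained_pairs_m:.0f} million")
print(f"approximate mean coverage per sample: {mean_coverage_per_sample:.0f}X")
print(f"target depth: {target_per_chromosome:.0f}X per chromosome, {target_per_pool}X per pool")
print(f"target is {target_fraction_of_genome:.4%} of the genome")
```

Under these assumptions the estimate lands near the reported 54X (cases) and 61X (controls), and the small genomic footprint of the target helps explain why a 60% off-target rate is not surprising.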
Pool Sequencing of Nonindexed DNA Samples is Efficient to Identify Single Nucleotide Variants
We used the Syzygy software to detect high-quality SNVs in the pooled sequence data.10 To minimize the number of false-positive findings, we focused exclusively on sequence variants with the highest Syzygy score and we excluded all indel calls. In total, Syzygy found 1852 SNVs (including 6 nonsense, 363 missense, and 1 splice site variants), including 695 SNVs not present in dbSNP 135 (38%) (online-only Data Supplement Tables IV and V). As expected, the mean MAF of the novel SNVs identified is significantly less than the mean MAF for the SNVs already present in public databases (0.2% versus 7.9%, t-test P<2.2×10−16). The nonsynonymous-to-synonymous and the transition-to-transversion ratios are, respectively, 1.65 (369/223) and 2.43 (1312/540); these numbers are consistent with the analysis of the 1000 Genomes Project (Pilot 3) and the recent Crohn disease exon resequencing experiment (online-only Data Supplement Table IV).10,22 All these metrics suggest that the list of variants is of high quality.
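The quality metrics quoted above follow directly from the reported counts. In this sketch the only assumption is that the nonsynonymous count of 369 is the sum of the 6 nonsense and 363 missense variants, which matches the numbers but is not spelled out in the text.

```python
total_snvs = 1852
novel_snvs = 695
nonsense, missense = 6, 363
synonymous = 223
transitions, transversions = 1312, 540

# Assumption: nonsynonymous = nonsense + missense (6 + 363 = 369).
nonsynonymous = nonsense + missense

ns_to_s = nonsynonymous / synonymous
ti_to_tv = transitions / transversions
novel_fraction = novel_snvs / total_snvs

print(f"nonsynonymous:synonymous = {ns_to_s:.2f}")
print(f"transition:transversion = {ti_to_tv:.2f}")
print(f"novel SNVs: {novel_fraction:.0%}")
```

Transitions and transversions sum to the 1852 SNVs, so the transition-to-transversion ratio is computed over the full call set.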
We were also interested in characterizing our false-negative rate, that is the number of true SNVs missed by our approach. As an imperfect proxy for the overall genetic variation present in the targeted exons of our French-Canadian population, we used the latest variant calls (October 2011) from European populations sequenced by the 1000 Genomes Project.23 In the European populations, the 1000 Genomes Project identified 1852 SNVs located within the 394 kb of sequence targeted by our exon resequencing experiment, and 977 of those were also present in our dataset (53%). Because rare markers (eg, singletons) are less likely to be observed in other populations (or in other individuals), we also performed the same analysis but limited our survey to SNVs with a MAF≥1%. Across the targeted exons, the 1000 Genomes Project found in the European populations 878 SNVs with a MAF≥1%, and 749 of these variants are in our list of 1852 high-quality SNVs (85%). Therefore, the resequencing strategy using pools of nonindexed DNA combined with solution-based capture is efficient (and cost-effective) to discover SNVs down to a frequency of 1%. For rarer variants, the true false-negative rate of our protocol remains to be determined.
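The overlap percentages above amount to a sensitivity estimate against the 1000 Genomes call set. A short sketch, using only the counts reported in the text (the helper name is ours):

```python
def recovered_fraction(found_in_both, found_in_reference):
    """Fraction of reference-panel variants also present in our call set."""
    return found_in_both / found_in_reference


# All 1000 Genomes European SNVs in the targeted 394 kb.
all_variants = recovered_fraction(977, 1852)

# Restricted to SNVs with MAF >= 1% in the 1000 Genomes European populations.
maf_ge_1pct = recovered_fraction(749, 878)

print(f"all SNVs: {all_variants:.0%} recovered")
print(f"MAF >= 1%: {maf_ge_1pct:.0%} recovered, {1 - maf_ge_1pct:.0%} missed")
```

The complement of the second figure (about 15%) is an upper estimate of the false-negative rate for variants with MAF≥1%, since some 1000 Genomes variants may be genuinely absent from French Canadians.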
Estimation of Allele Frequencies and Gene-Based Association Testing from Sequence Data
The Syzygy software can accurately estimate allele frequencies from pooled sequence data when using PCR as enrichment method.10 Because we modified the original protocol by using instead a solution-based strategy to capture exons, we reevaluated Syzygy’s performance in estimating allele frequencies. We genotyped 45 randomly selected low-frequency (MAF <1%) SNVs in the same 1000 individuals and observed a strong correlation in the estimates of allele frequencies (r2=0.90) (Figure 2). However, this correlation is not as strong as previously reported,10 probably because of 2 main reasons. First, we used the hybridization with RNA baits to capture DNA fragments and this method might be more sensitive than PCR to allele bias. Second, by focusing on low-frequency DNA markers in the validation, even small differences in the number of estimated nonreference alleles will have a strong impact on the correlation; common markers are less sensitive to small differences in allele frequency estimates. For instance, estimating the frequency of a rare variant as a tripleton when the true frequency is a singleton has a more dramatic impact on the correlation of frequency estimates than estimating the frequency of a common marker to 0.5 when its true frequency is 0.51. Because we only validated low-frequency SNVs, our analysis is more sensitive to small allele frequency differences between estimates.
The main aim of our study was to identify new genes involved in MI, or new disease-causing penetrant alleles in known MI genes. For that reason, our variant discovery effort also included a design to perform association testing on sequence data and replication by direct genotyping in large independent cohorts. Because very large GWAS have already tested the role of common genetic variation in CAD and MI,5,6 we focused our follow-up strategy on markers with a MAF<5% in our combined case−control resequencing panel. Also, to increase the likelihood to find causal MI alleles, we limited a priori our analysis to low-frequency nonsense, splice site and missense variants (Strategy 1) or to low-frequency nonsense, splice site, and missense variants predicted to be damaging (Strategy 2).24 Our approach was to: (1) perform gene-based association testing with the C-alpha statistic17 on the sequence data (500 early-onset MI cases and 500 matched controls) and select genes with P≤0.05, (2) validate the sequence results by genotyping the top genes in the same 1000 DNA samples, and (3) replicate the gene-based association results by genotyping in the DNA of 1594 and 2988 independent MI cases and controls, respectively (Table 1). Under specific assumptions, we estimated the statistical power of our study design: we have ≈48% power to find a gene at α=7×10−4 (Bonferroni correction for 68 genes tested, see online-only Data Supplement Table VI for details).
For each of the 2 strategies described above, we found 4 genes with a C-alpha P≤0.05 (Strategy 1: HHIPL2, TXNDC6, ATXN2, ERP29; Strategy 2: ICA1L, ESYT3, ABCA1, CXCL12); the signal in these 8 genes was caused by 29 SNVs (Table 2). Given our results when we considered the correlation of allele frequency estimates for low-frequency variants between sequence and genotype data (Figure 2), we attempted to genotype these 29 SNVs in the same 1000 DNA samples used for resequencing but 5 failed. To facilitate comparison between the sequence- and genotype-based association results, we reran the C-alpha analysis of the sequence data on the 24 SNVs that genotyped successfully (see P value* in Table 2). We used SKAT on genotypes at the 24 SNVs to validate association between MI and these 8 genes; SKAT is a generalized C-alpha test that offers the advantage to control for covariates.19 Not unexpectedly, 6 of the 8 genes did not have nominally significant SKAT P values (P≤0.05) in the validation experiment on genotyped markers, mostly due to small differences in allele frequency estimates and covariate adjustment (Table 2). Even for the 2 nominally significant genes (ESYT3, ABCA1), the SKAT P values were higher than those calculated with the C-alpha statistic on the sequence data, probably because the C-alpha test cannot accommodate covariates and does require permutations to derive accurate P values in the presence of few low-frequency variants, and permutations are not possible with pooled sequence data.17
Replicating Gene-Based Association Results in Independent Cohorts
To replicate our findings, we selected an additional 870 MI cases and 1604 controls from the MHI Biobank, and obtained DNA samples from 724 MI cases and 1384 controls from the MHI Statins Study (Table 1). As for the validation, 24 of the 29 prioritized SNVs genotyped well in the replication samples. We tested association at the gene level between these markers and MI status using SKAT while accounting for significant covariates (Materials and Methods). For the 6 genes that did not reach nominal significance in the validation experiment (HHIPL2, TXNDC6, ATXN2, ERP29, ICA1L, CXCL12), results were consistently nonsignificant in the replication panels (Table 3). Similarly, the 2 genes with nominal P values ≤0.05 in the validation phase (ESYT3, ABCA1) did not replicate in the replication panels (Table 3). However, for ESYT3 the association signal in the MHI Statins study was almost nominally significant (P=0.061), with the nonreference allele at rs6772467 more frequent in controls than in MI cases, consistent with results observed in the sequencing and validation experiments for this variant (Table 2). Since the gene signal at ESYT3 is caused by a single probably damaging missense variant (rs6772467, Gly416Arg), we combined evidence of association with MI for this marker across the 3 genotyped panels using the mega-analysis of rare variant (MARV) method.10 The MARV P value of association between ESYT3 rs6772467 and MI is P=0.0089, warranting further replication attempts to test this association in additional large cohorts.
The focus of this study was to identify new genes and causal DNA sequence variants implicated in the pathogenesis of MI by searching for low-frequency functional genetic variation in the exons of 68 candidate genes. Although linkage studies and targeted resequencing experiments have identified many rare familial mutations associated with MI, and more recently GWAS highlighted many MI-associated common SNPs, few studies have searched specifically the low-frequency allele spectrum (MAF, 0.1%−5%) for MI-associated alleles. One exception is the PCSK9 gene, which is known to harbor familial, low-frequency and common DNA sequence variants associated with low-density lipoprotein cholesterol (LDL-C) levels and risk of coronary heart disease.25−29 We resequenced PCSK9 in our experiment to a mean coverage (per sample) of 50X (Figure 1 and online-only Data Supplement Table III), and identified 5 missense SNVs (online-only Data Supplement Table V). The C-alpha result calculated from the sequence data for these 5 PCSK9 missense variants was not significant (P=0.74), so markers in PCSK9 were not genotyped as part of our replication effort. In particular, we identified the PCSK9 R46L (rs11591147) variant in our sequencing experiment, which has been shown to decrease LDL-C levels and CAD risk in populations of European ancestry.27,30 However, in our samples of 500 early-onset MI cases and 500 matched controls, the allele frequency difference for the leucine 46 allele was not significant (2.0% in cases versus 1.8% in controls, P=0.87).
From the sequence data, we selected for validation and replication SNVs from 8 genes in a total of 2094 MI cases and 3488 controls; all participants had 4 French−Canadian grandparents. Many of the gene-based associations found in the sequence data could not be validated using direct genotype information in the same DNA samples (Table 2). Of the 2 genes with a nominal P≤0.05 in the validation experiment (ESYT3, ABCA1), none replicated in the 2 additional French−Canadian case−control panels (Table 3). However, results were modestly encouraging for ESYT3-rs6772467, where the P values were P=0.50 in the MHI Biobank-replication panel and P=0.061 in the MHI Statins Study (Table 3). ESYT3 encodes for the extended synaptotagmin-3 protein, which has no defined biological functions. ESYT3 was included in our resequencing project because it is located near a SNP in the MRAS gene that was found by GWAS to associate with CAD.31 Interestingly, a recent fine-mapping experiment of the CAD association signal at the 3q22-MRAS locus in Han Chinese suggested that SNPs within the ESYT3 gene are more strongly associated with CAD than the original MRAS SNP.32 This observation, combined with our results, emphasizes the need to attempt to replicate the association between MI and ESYT3-rs6772467 in additional large cohorts.
We optimized a protocol, which builds on methods and tools previously described,10,20 to cost-effectively resequence pools of nonindexed DNA samples following solution-based exon capture. We showed that our method is particularly efficient to discover genetic variation and has a low false-positive rate: of the 69 low-frequency SNVs that we genotyped in the validation set, only one was monomorphic. Despite the success of the discovery phase of our experiment, we met several challenges in using pooled sequencing for association testing: (1) uneven coverage between genes, or between cases and controls, is difficult to account for, (2) genotypes are not available, preventing several downstream analyses (eg, permutations), and (3) difficulty in accurately estimating allele frequencies for low-frequency variants. Other limitations of our study include our difficulty to directly genotype 5 of the variants found by sequencing using the Sequenom platform and the small number of early-onset MI cases in the replication panels (12% versus 100% in the sequencing set). Overall, our pooled, nonindexed DNA resequencing protocol is efficient to discover genetic variants across the allelic spectrum and might be particularly useful to characterize genetic variation at reasonable costs in populations that are not well covered by the HapMap or 1000 Genomes Projects. However, we recommend an initial genotyping-based validation step of interesting variants in the resequenced samples before attempting to find genotype−phenotype associations in replication samples. This step is important to avoid false-positive results due to errors in estimating allele frequencies from sequence data alone.
In conclusion, our targeted exon resequencing experiment in 1000 individuals followed by genotyping in 4582 samples did not identify robust associations with MI. Given our statistical power (Online-only Data Supplement Table VI), our design would have found genes/variants with strong effect sizes on MI, suggesting that such alleles are not present in French Canadians in the exons well covered by DNA resequencing. More generally, our results are consistent with the need for very large sample size to find associations between low-frequency DNA sequence variants and complex human diseases or traits.
We thank all participants of the Montreal Heart Institute Biobank and the Pharmacogenomics of the Toxicity of Lipid-Lowering Agents Study, as well as the staff from the Montreal Heart Institute Biobank and Pharmacogenomics Centre. We also thank Mark J. Daly for sharing unpublished protocols and methods.
Sources of Funding
This work was funded by the Center of Excellence in Personalized Medicine (CEPMED), the Canada Research Chair program, the “Fonds de recherche du Québec en Santé (FRQS),” and the Montreal Heart Institute Foundation (to GL). The Pharmacogenomics of the Toxicity of Lipid-Lowering Agents Study was funded by a grant from Genome Canada to Drs Phillips and Tardif.
The online-only Data Supplement is available at http://circgenetics.ahajournals.org/lookup/suppl/doi:10.1161/CIRCGENETICS.112.963165/-/DC1.
- Received March 2, 2012.
- Accepted July 16, 2012.
- © 2012 American Heart Association, Inc.
- Nejentsev S, Walker N, Riches D, Egholm M, Todd JA
- Rivas MA, Beaudoin M, Gardet A, Stevens C, Sharma Y, Zhang CK, et al
- Nora JJ, Lortscher RH, Spangler RD, Nora AH, Kimberling WJ
- Li H, Durbin R
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al
- Erdmann J, Grosshennig A, Braund PS, König IR, Hengstenberg C, Hall AS, et al
Clinical Perspective
Coronary artery disease (CAD) and its main clinical manifestation, myocardial infarction (MI), are the main cause of death and disability in the world. Family history of CAD/MI is a known risk factor, suggesting that this disease has a strong genetic component. Rare familial mutations in genes involved in lipid metabolism and common single nucleotide polymorphisms have been associated with CAD/MI risk, but these genetic variants do not account for all the genetic liability. The role in CAD/MI of low-frequency genetic variants, which are not captured by genome-wide association studies (GWAS), has not been extensively investigated. In our study, we optimized a method to sequence genes in large samples and applied it to 68 MI candidate genes that were sequenced in 500 early-onset MI cases and 500 matched controls, all of French-Canadian ancestry. The most promising DNA sequence variants were then genotyped in >4500 French Canadians for replication. Our method is efficient and cost-effective for identifying low-frequency genetic variants. However, we did not identify new MI-associated variants in these 68 candidate genes in French Canadians. We now have the tools to expand this approach across all human genes in large populations. Current prevention therapies for CAD/MI consist of controlling lipid levels, heart rate, and blood pressure. The identification of new genes and biological pathways involved in the biology of atherosclerosis using DNA sequencing methods could guide the development of new drugs.