Association of Exome Sequences With Cardiovascular Traits Among Blacks in the Jackson Heart Study
Background—The correlation of null alleles with human phenotypes can provide insight into gene function in humans. In individuals of African ancestry, we set out to identify null and damaging missense variants, and test these variants for association with a range of cardiovascular phenotypes.
Methods and Results—We performed whole-exome sequencing in 3223 black individuals from the Jackson Heart Study and found a total of 729 666 variant sites with minor allele frequency <5%, including 17 263 null variants and 49 929 missense variants predicted to be damaging by in silico algorithms. We tested null and damaging missense variants within each gene for association with 36 cardiovascular traits. We found 3 associations that met our prespecified level of significance (α=1.1×10−7). Null and damaging missense variants in PCSK9 were associated with 36 mg/dL lower low-density lipoprotein cholesterol (P=3×10−21). Three individuals in their 50s with complete PCSK9 deficiency (each compound heterozygote for PCSK9 p.Y142X and p.C679X) were identified, with one having a coronary artery calcification score in the 83rd percentile despite a low-density lipoprotein cholesterol of 32 mg/dL. A damaging missense variant in HBQ1 (p.G52A) was associated with a 2 pg/cell lower mean corpuscular hemoglobin (P=9×10−13) and rare damaging missense variants in VPS13A with higher red blood cell distribution width (P=9.9×10–8).
Conclusions—A limited number of null/damaging alleles with a large effect on cardiovascular traits were detectable in ≈3000 black individuals.
A compelling therapeutic target for lowering low-density lipoprotein cholesterol (LDL-C) emerged from human genetic studies—PCSK9 (the proprotein convertase subtilisin/kexin type 9 gene).1 Null alleles (also termed loss-of-function protein-coding sequence variants) in PCSK9 were identified in blacks2 and shown to associate with lower plasma LDL-C levels2–4 and reduced risk for coronary heart disease (≤88% reduction).5,6 On the basis of this human genetic evidence and corroborating functional studies, several pharmaceutical companies have established drug development programs targeting PCSK9,7 and 2 inhibitors have been approved for reducing LDL-C in individuals with heterozygous familial hypercholesterolemia and in individuals with clinical atherosclerotic cardiovascular disease.8,9 On the basis of the PCSK9 example, it has been suggested that low-frequency or rare mutations of large effect may be paradigmatic for therapeutic target discovery.10
To address whether additional such examples can be readily identified, we sequenced the exomes of 3223 individuals from the JHS (Jackson Heart Study), a prospective cohort of blacks living in Jackson, Mississippi, and catalogued null and damaging missense mutations across 18 465 genes. Subsequently, we performed an association study of these variants with a range of quantitative and qualitative cardiovascular traits.
The JHS is a community-based, longitudinal, cohort study located in the Jackson, Mississippi metropolitan area designed to investigate the determinants of cardiovascular disease in blacks.11 JHS recruited 5301 blacks, aged between 35 and 84 years, between September 2000 and March 2008.11 The Institutional Review Board of the University of Mississippi Medical Center approved the study protocol, and all participants provided written informed consent.
Exome sequencing was performed at 3 sequencing centers (the Broad Institute [n=2317], University of Washington [n=481], and Baylor College of Medicine [n=475]) across 5 projects (The US National Heart, Lung, and Blood Institute’s Exome Sequencing Project, Myocardial Infarction Genetics Consortium Exome Sequencing Project, CHARGE-S, Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples, and Minority Health Genomics and Translational Research Bio-Repository Database; Table I in the Data Supplement). The sequencing reads (ie, fastq files) from exomes were aligned to the human genome reference (hg19) using Burrows-Wheeler Transform on a per-lane basis and bam files were obtained from the 3 sequencing centers. The Genome Analysis Toolkit v3.1 HaplotypeCaller algorithm was used for joint variant discovery and genotyping on both exomes and flanking 50 bp of intronic sequence (http://www.broadinstitute.org/gatk/guide/article?id=3893). Single-sample gVCFs were created using the Genome Analysis Toolkit HaplotypeCaller with the options --emitRefConfidence GVCF, --variant_index_type LINEAR, and --variant_index_parameter 128000. Then, batches of ≈200 gVCFs were merged into a single gVCF using the CombineGVCFs command in the Genome Analysis Toolkit. Finally, GenotypeGVCFs was run on the combined gVCFs to create the raw SNP and indel VCFs. Because a majority of individuals were sequenced at the Broad Institute, we limited analysis to the sequence intervals captured by the Broad’s exome-sequencing platform.
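The per-sample calling and batching steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the file names are hypothetical, the -T, -R, -I, and -o arguments are the usual Genome Analysis Toolkit 3.x conventions rather than anything stated in the text, and only the three gVCF options come from the description above.

```python
from typing import List


def haplotypecaller_args(bam: str, reference: str, out_gvcf: str) -> List[str]:
    """Build a GATK 3.x HaplotypeCaller argument list for one sample.

    The three gVCF options are the ones named in the text; the remaining
    arguments (-T, -R, -I, -o) are assumed standard GATK 3.x usage.
    """
    return [
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "--emitRefConfidence", "GVCF",
        "--variant_index_type", "LINEAR",
        "--variant_index_parameter", "128000",
        "-o", out_gvcf,
    ]


def batch(items: List[str], size: int = 200) -> List[List[str]]:
    """Split a list of gVCF paths into consecutive batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    # Hypothetical file names; 3273 is the sum of the per-center counts above.
    gvcfs = [f"sample_{i:04d}.g.vcf" for i in range(3273)]
    batches = batch(gvcfs)
    print(len(batches), "batches; last batch has", len(batches[-1]), "gVCFs")
```

Each batch would then be merged with CombineGVCFs and the merged files passed to GenotypeGVCFs, as described in the text.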
Variant Quality Control
Genome Analysis Toolkit Variant Quality Score Recalibration (VQSR) was used with the recommended resources to filter variants. The SNP VQSR model was trained using HapMap3.3 and 1KG Omni 2.5 SNP sites and a 99.5% sensitivity threshold was applied to filter variants, whereas the INDEL VQSR model was trained using the Mills 1000G gold standard and Axiom Exome Plus sites for insertions/deletions and a 99.0% sensitivity threshold was applied to filter INDEL sites. Variants were filtered to VQSR PASS and quality depth ≥2 (Table II in the Data Supplement). Individual genotypes were set to missing if depth <5.
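The two post-VQSR filters can be expressed compactly. The sketch below assumes that "quality depth" refers to a per-site quality-by-depth value and that genotypes and depths are held in simple per-sample dictionaries; both the data layout and the function names are illustrative.

```python
from typing import Dict, Optional

MIN_QD = 2.0    # site-level "quality depth" threshold from the text
MIN_DEPTH = 5   # per-genotype read depth threshold from the text


def keep_site(filter_field: str, qd: float) -> bool:
    """Keep a site only if it is VQSR PASS and has quality depth >= 2."""
    return filter_field == "PASS" and qd >= MIN_QD


def mask_genotypes(genotypes: Dict[str, str],
                   depths: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Set a genotype to missing (None) when its read depth is below 5.

    A sample with no recorded depth is treated as depth 0 (an assumption).
    """
    return {
        sample: (gt if depths.get(sample, 0) >= MIN_DEPTH else None)
        for sample, gt in genotypes.items()
    }
```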
Sample Quality Control
We performed quality control on the jointly called samples. Individuals were checked for total number of variants, observed number of singletons and doubletons, Ti/Tv ratio, Het/Hom ratio, missingness, contamination with VerifyBamID,12 and nonreference concordance with available genotype data from the Illumina HumanExome BeadChip v1.0. Individuals that were outliers (>±3*interquartile range) on at least one metric were excluded (Table I and Figure I in the Data Supplement). Population structure was assessed using the multidimensional scaling algorithm in the PLINK software13 and 10 principal components of ancestry were obtained (Figure II in the Data Supplement).
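The outlier rule can be sketched as below. The text does not state where the ±3 interquartile range window is centered, so this example assumes fences at Q1 − 3×IQR and Q3 + 3×IQR; the metric names and data layout are hypothetical.

```python
import statistics
from typing import Dict, List, Set, Tuple


def iqr_bounds(values: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_samples(metrics: Dict[str, Dict[str, float]],
                    k: float = 3.0) -> Set[str]:
    """Flag any sample falling outside the fences on at least one metric.

    `metrics` maps a metric name (e.g. a Ti/Tv ratio) to {sample_id: value}.
    """
    flagged: Set[str] = set()
    for per_sample in metrics.values():
        lo, hi = iqr_bounds(list(per_sample.values()), k)
        flagged.update(s for s, v in per_sample.items() if v < lo or v > hi)
    return flagged
```

A sample is excluded as soon as one metric flags it, matching the "at least one metric" rule in the text.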
All variant sites were annotated with the Variant Effect Predictor algorithm (VEP; http://useast.ensembl.org/info/docs/tools/vep) and dbNSFP14 (https://sites.google.com/site/jpopgen/dbNSFP). Analysis was limited to variants predicted to be null (nonsense, splice, frameshift) plus missense variation predicted to be damaging in at least 5 of the following 7 variation prediction tools15: LRT,16 Mutation Taster,17 PolyPhen218 (HumDiv), PolyPhen2 (HumVar), SIFT,19 MutationAssessor,20 and FATHMM.21
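The inclusion rule can be illustrated with a small classifier. The consequence labels and tool keys below are simplified placeholders rather than actual VEP or dbNSFP field names, and a missing prediction is counted as "not damaging", which is an assumption.

```python
from typing import Dict

NULL_CONSEQUENCES = {"nonsense", "splice", "frameshift"}
TOOLS = (
    "LRT", "MutationTaster", "PolyPhen2_HumDiv", "PolyPhen2_HumVar",
    "SIFT", "MutationAssessor", "FATHMM",
)


def damaging_votes(calls: Dict[str, bool]) -> int:
    """Count how many of the 7 tools call the variant damaging."""
    return sum(bool(calls.get(tool, False)) for tool in TOOLS)


def classify(consequence: str, calls: Dict[str, bool], min_votes: int = 5) -> str:
    """Return 'null', 'damaging_missense', or 'excluded' for one variant."""
    if consequence in NULL_CONSEQUENCES:
        return "null"
    if consequence == "missense" and damaging_votes(calls) >= min_votes:
        return "damaging_missense"
    return "excluded"
```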
We analyzed 36 cardiovascular traits (Figure) available in the Jackson Heart Study Vanguard Center data package (https://www.jacksonheartstudy.org/jhsinfo/ForResearchers/VanguardCenters/tabid/171/Default.aspx). For participants who were taking antihypertensive medication, we added 10 mm Hg to observed systolic blood pressure values and 5 mm Hg to diastolic blood pressure values.22 We adjusted the total cholesterol values for individuals on lipid-lowering medication by replacing their total cholesterol values by total cholesterol divided by 0.8.23 No adjustment was made on high-density lipoprotein cholesterol or triglycerides. Only fasting lipid measures were used, and LDL-C was calculated using the Friedewald equation for those with triglycerides <400 mg/dL, using the lipid-adjusted total cholesterol for those on treatment.
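The medication adjustments and the LDL-C calculation can be written out as follows. This is a sketch with illustrative function names; it uses the standard Friedewald form, LDL-C = total cholesterol − HDL-C − triglycerides/5 (all in mg/dL), and returns no value when triglycerides are 400 mg/dL or higher.

```python
from typing import Optional, Tuple


def adjust_bp(sbp: float, dbp: float, on_bp_meds: bool) -> Tuple[float, float]:
    """Add 10 mm Hg (systolic) and 5 mm Hg (diastolic) for treated participants."""
    return (sbp + 10.0, dbp + 5.0) if on_bp_meds else (sbp, dbp)


def adjust_total_cholesterol(tc: float, on_lipid_meds: bool) -> float:
    """Divide total cholesterol by 0.8 for participants on lipid-lowering drugs."""
    return tc / 0.8 if on_lipid_meds else tc


def friedewald_ldl(tc: float, hdl: float, tg: float,
                   on_lipid_meds: bool = False) -> Optional[float]:
    """Friedewald LDL-C from fasting lipids, using treatment-adjusted total cholesterol.

    Returns None when triglycerides are >= 400 mg/dL, where the equation is not applied.
    """
    if tg >= 400:
        return None
    return adjust_total_cholesterol(tc, on_lipid_meds) - hdl - tg / 5.0
```

For example, a treated participant with a measured total cholesterol of 160 mg/dL is analyzed as if total cholesterol were 200 mg/dL before LDL-C is derived.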
Individuals with diabetes mellitus were excluded in analyses of fasting plasma glucose, fasting insulin, homeostatic model assessment-IR, homeostatic model assessment-B, and glycated hemoglobin. Individuals with QRS >120, atrial fibrillation, or coronary heart disease were excluded for analysis of QRS interval. Individuals with QRS ≥120, ECG heart rate <40, ECG heart rate >120, or with atrial fibrillation were excluded from the analysis of QT interval. Individuals with end-stage renal disease defined as estimated glomerular filtration rate <15 or reporting being on dialysis, hemoglobinopathy defined as being homozygous for rs334, or myelotoxic drug use were excluded from the blood cell trait analyses.
Nonnormality of the following raw traits was resolved by a natural log transform before analysis: triglycerides, leptin, high-sensitivity C-reactive protein, endothelin, renin, aldosterone, and adiponectin.
We performed gene-based analyses of 36 cardiovascular phenotypes. We limited analysis to null mutations plus missense variants predicted to be damaging by at least 5 of 7 in silico prediction algorithms (LRT, Mutation Taster, PolyPhen2 (HumDiv), PolyPhen2 (HumVar), SIFT, MutationAssessor, and FATHMM).15 We aggregated variants with minor allele frequency (MAF) <5% within each gene using 4 sets of variants: (1) null mutations only, (2) null mutations plus missense variants predicted to be damaging by 7 of 7 in silico prediction algorithms, (3) null mutations plus missense variants predicted to be damaging in at least 6 of 7 in silico prediction algorithms, and (4) null mutations plus missense variants predicted to be damaging in at least 5 of 7 in silico prediction algorithms. All associations were performed using the EPACTS (Efficient and Parallelizable Association Container Toolbox; http://genome.sph.umich.edu/wiki/EPACTS) software. EPACTS is a software pipeline to perform statistical tests of association using sequence data. It implements the EMMAX24 (Efficient Mixed Model Association eXpedited) model, a mixed model association approach that captures pedigree, cryptic relatedness, and population structure by using a covariance matrix estimated from genome-wide data. To apply the EMMAX model, we used the epacts-group command with the emmaxCMC test option to perform collapsing burden gene-based tests. The single command with the q.emmax test option in EPACTS was used to obtain the single variant results for each variant going into the gene-based test. We used an additive genetic model. A kinship matrix of all individuals was created with EPACTS and used in analyses. All analyses were adjusted for age, sex, and 4 principal components of ancestry. Analyses for QT interval and QRS additionally included adjustments for height and BMI.
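The four variant sets and the collapsing step of a burden test can be sketched as below. This shows only one common form of collapsing (a per-individual carrier indicator for each gene); it does not reproduce the EMMAX mixed model or the internals of EPACTS, and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Optional

# Minimum number of "damaging" predictions required for a missense variant
# in each of the 4 sets; None means null variants only.
MASKS: Dict[str, Optional[int]] = {
    "null_only": None,
    "null_plus_7of7": 7,
    "null_plus_6of7": 6,
    "null_plus_5of7": 5,
}


def in_mask(variant: dict, min_votes: Optional[int], max_maf: float = 0.05) -> bool:
    """Decide whether a variant (keys: maf, is_null, votes) enters a set."""
    if variant["maf"] >= max_maf:
        return False
    if variant["is_null"]:
        return True
    return min_votes is not None and variant["votes"] >= min_votes


def collapse(variants: List[dict],
             genotypes: Dict[str, List[Optional[int]]],
             min_votes: Optional[int]) -> Dict[str, int]:
    """Return 1 for each sample carrying >= 1 minor allele at any included variant.

    `genotypes[sample][i]` is the minor allele count (0, 1, 2, or None if
    missing) at `variants[i]`; missing genotypes are treated as noncarrier.
    """
    keep = [i for i, v in enumerate(variants) if in_mask(v, min_votes)]
    return {
        sample: int(any((g[i] or 0) > 0 for i in keep))
        for sample, g in genotypes.items()
    }
```

The resulting indicator would then be tested against each trait with the covariates and kinship matrix described above.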
We excluded results with ≤10 minor alleles contributing to the gene-based test to ensure robust association statistics. We set our significance threshold to 1.1×10–7 (0.05/[36 traits*≈12 500 genes after minor allele count exclusion]).
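The threshold arithmetic can be checked directly: 0.05/(36×12 500) = 0.05/450 000, which rounds to 1.1×10−7. A minimal sketch with illustrative function names:

```python
def bonferroni_threshold(alpha: float, n_traits: int, n_genes: int) -> float:
    """Per-test significance level after correcting for traits x genes tests."""
    return alpha / (n_traits * n_genes)


def enough_minor_alleles(minor_allele_count: int, excluded_up_to: int = 10) -> bool:
    """Gene-based results with <= 10 contributing minor alleles are excluded."""
    return minor_allele_count > excluded_up_to


if __name__ == "__main__":
    print(f"{bonferroni_threshold(0.05, 36, 12500):.2e}")
```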
A Wilcoxon rank-sum test was performed to compare PCSK9 null compound heterozygous carriers to heterozygous carriers using the R software (version 3.1). Coronary artery calcification (CAC) percentiles were calculated with the MESA CAC Score Reference Values web tool (http://www.mesa-nhlbi.org/Calcium/input.aspx).25
We performed power calculations using the Genetic Power calculator (http://pngu.mgh.harvard.edu/~purcell/gpc/) with the “QTL association for sib-ships and singletons” option.
After quality control, 3223 individuals from the Jackson Heart Study were available for analysis (Table 1; Table I in the Data Supplement). We observed 17 263 null variants with MAF <5% and 49 929 missense variants predicted to be damaging in at least 5 of 7 in silico prediction algorithms with MAF <5% (Table II in the Data Supplement). Of the 18 465 genes sequenced, 14 058 have a null or damaging missense variant with MAF <5%. On average, we observe 5 null or damaging missense variants per gene and an average of 7 null or damaging missense alleles per gene. Each individual carries, on average, a total of 153 null or damaging missense variants with MAF <5%.
We found 3 gene-based associations that met our prespecified significance threshold of 1.1×10–7 (Table 2; Tables III and IV in the Data Supplement). The most significant association was between LDL-C and PCSK9. Participants who carried null or damaging missense mutations in PCSK9 had 36 mg/dL lower LDL-C compared with noncarriers (P=2.9×10–21). Of note, we identified 3 individuals with complete PCSK9 deficiency (each compound heterozygote for PCSK9 p.Y142X and p.C679X; Table 3). These individuals had a lower median LDL-C (64.2 mg/dL) compared with individuals who carry only one null mutation (85.7 mg/dL; n=77; P=0.044; Figure III in the Data Supplement). The 3 PCSK9 null compound heterozygotes did not differ from heterozygotes in any other cardiometabolic trait tested except QT interval (Table V in the Data Supplement). Compound heterozygotes had a lower QT interval (mean=369; range=362–380) compared with individuals who carried only one null PCSK9 variant (mean=413; P=0.006 using a Wilcoxon rank-sum test). Individuals carrying one null PCSK9 variant had similar QT intervals compared with noncarriers (mean=413), suggesting a recessive effect. Two individuals carrying both PCSK9 p.Y142X and p.C679X had a CAC greater than the 80th percentile for their age and sex. A 52-year-old man had a CAC of 24.9, which is in the 83rd percentile for age and sex, despite an LDL-C of 32 mg/dL (Table 3).
The second most significant gene association was between mean corpuscular hemoglobin and hemoglobin subunit theta 1 (HBQ1). Individuals carrying a damaging missense variant (p.G52A)26 in HBQ1 had lower mean corpuscular hemoglobin compared with noncarriers (P=8.4×10–13). One additional association passed our significance threshold. Rare damaging missense variants in Vacuolar Protein Sorting-Associated Protein 13A (VPS13A) were associated with an increase in red blood cell distribution width (P=7.1×10–8). Of the 9 variants that contributed to the association between VPS13A and red blood cell distribution width, 6 were singletons, 1 was a doubleton, 1 had 4 carriers (p.S2673L), and 1 had 22 minor allele carriers (p.K2672N) (Table IV in the Data Supplement). VPS13A showed evidence for association with other hematologic phenotypes, including lower hemoglobin levels (P=7.0×10−4; Table VI in the Data Supplement).
Li et al27 recently reported 10 gene-based associations aggregating null variants with a P<4.4×10–6. Individuals of African ancestry contributed to 7 of these associations. We attempted to replicate these 7 associations in our data (Table VII in the Data Supplement). We replicated the association of total cholesterol with PCSK9 (β=−39 mg/dL; P=6.6×10−12) and of triglycerides with apolipoprotein C-III (P=1.0×10−5).2,28–30 We found suggestive evidence for the association of fasting glucose with thioredoxin domain containing 5 (TXNDC5), consistent with the report by Li et al; carriers of null alleles in TXNDC5 had higher fasting glucose compared with noncarriers (P=0.07).
For 3223 individuals and a significance level of 1.1×10–7, we had 99% statistical power to detect a 1-SD unit effect with a 1% cumulative MAF, and 64% statistical power to detect a 1-SD unit effect with a 0.5% cumulative MAF. Analysis of Mendelian lipid genes as a positive control shows several genes where a burden of null/damaging mutations alters the expected plasma lipid fraction in the appropriate direction (eg, LDLR and higher LDL-C [P=4.7×10−5], CETP and higher high-density lipoprotein cholesterol [P=0.0001]; Table VIII in the Data Supplement). However, even an analysis of positive controls is limited by the number of carriers, with the majority of the Mendelian lipid genes having <10 observed null alleles.
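The order of magnitude of these power estimates can be reproduced with a rough normal approximation. This is not the method used by the Genetic Power Calculator; it assumes an additive model in unrelated individuals, treats the cumulative MAF as the frequency of a single collapsed allele, and takes the variance explained as 2p(1−p)β² with β in SD units.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n: int, cumulative_maf: float, beta_sd: float, alpha: float) -> float:
    """Approximate power of a two-sided 1-df test for a quantitative trait.

    q is the fraction of trait variance explained under an additive model;
    the noncentrality parameter is taken as n * q / (1 - q).
    """
    q = 2.0 * cumulative_maf * (1.0 - cumulative_maf) * beta_sd ** 2
    ncp = n * q / (1.0 - q)
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = sqrt(ncp)                          # expected test statistic (z scale)
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


if __name__ == "__main__":
    for maf in (0.01, 0.005):
        print(maf, round(approx_power(3223, maf, 1.0, 1.1e-7), 3))
```

Under these assumptions the approximation lands close to the reported figures: above 99% at a 1% cumulative MAF, and in the mid-60% range at 0.5%.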
We set out to discover null or damaging missense variants that lead to a large effect on any of a range of cardiovascular traits. In a study of 3223 blacks, we found 3 associations that met our prespecified significance threshold.
We report 1 new observation: rare damaging missense variants in VPS13A were associated with an increase in red blood cell distribution width. Red blood cell distribution width is a measure of the range of variation in red blood cells, and higher values can indicate certain disorders such as anemia. Mutations in VPS13A have been reported to cause chorea-acanthocytosis, an autosomal-recessive neurodegenerative disorder that causes red blood cells to appear spiky.31 Ten VPS13A variants are reported in ClinVar with chorea-acanthocytosis listed as the condition. We did not find any of the reported ClinVar variants in our data nor any carriers of rare damaging recessive variants in VPS13A. Here, in a sample of individuals unselected for disease state, we report a milder phenotype resulting from heterozygous mutations in VPS13A. Similar to VPS13A, Mendelian lipid genes having a large effect on plasma lipid levels have been shown to harbor common variants with smaller effects on phenotype.32–34
We found 3 individuals who are compound heterozygous for null mutations in PCSK9. Previously, only 2 individuals with PCSK9 deficiency have been reported.35,36 Both of the previously reported individuals were young (21 and 31 years old) and had low circulating LDL-C (14–16 mg/dL). The 3 individuals we have identified here are older (50–52 years old) and have higher circulating LDL-C (32–72 mg/dL). One of the 3 individuals had a CAC score in the 83rd percentile despite an LDL-C of 32 mg/dL. CAC values over the 75th percentile are considered abnormal.
Some limitations deserve mention. The association between VPS13A and red blood cell distribution width needs to be confirmed in an independent study. Furthermore, sequencing will be required for replication; none of the variants driving the novel gene-based association were available on the widely-used exome genotyping array. The few results passing our prespecified significance level could be explained by statistical power given our sample size and the limited number of observed null alleles per gene. We also note that we have used a stringent significance threshold given the multiple testing burden inherent in our study design.
In conclusion, a limited number of null/damaging alleles with a large effect on cardiovascular traits were detectable from the exome sequences of 3000 black individuals.
Sources of Funding
G.M.P. is supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under Award Number K01HL125751. S.K. is supported by a Research Scholar award from the Massachusetts General Hospital (MGH), the Howard Goodman Fellowship from MGH, the Donovan Family Foundation, R01HL107816, and a grant from Fondation Leducq. The Jackson Heart Study is supported by contracts HHSN268201300046C, HHSN268201300047C, HHSN268201300048C, HHSN268201300049C, and HHSN268201300050C from the National Heart, Lung, and Blood Institute and the National Institute on Minority Health and Health Disparities. The MH-GRID Network (Investigators: Rakale C. Quarells, Gary H. Gibbons, Donna K. Arnett, Robert L. Davis, Suzanne M. Leal, Deborah A. Nickerson, James Perkins, Charles N. Rotimi, Joel H. Saltz, Herman A. Taylor, and James G. Wilson) was supported, in part, by a grant from the National Institute on Minority Health and Health Disparities (grant #1RC4MD005964).
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Guest Editor for this article was Päivi Pajukanta, MD, PhD.
The Data Supplement is available at http://circgenetics.ahajournals.org/lookup/suppl/doi:10.1161/CIRCGENETICS.116.001410/-/DC1.
- Received February 17, 2016.
- Accepted July 5, 2016.
- © 2016 American Heart Association, Inc.
Clinical Perspective
The correlation of null alleles with human phenotypes can provide insight into gene function in humans. Here, we performed whole-exome sequencing in 3223 black individuals living in Jackson, Mississippi, to identify null and damaging missense variants and test these variants for association with 36 cardiovascular traits. We replicated the association of null and damaging missense variants in PCSK9 with low-density lipoprotein cholesterol and found 3 individuals in their 50s, each compound heterozygous for null variants in PCSK9. Of note, one of these 3 individuals had a coronary artery calcification score in the 83rd percentile despite a low-density lipoprotein cholesterol of 32 mg/dL. We also found that individuals with rare damaging missense variants in VPS13A had higher red blood cell distribution width compared with noncarriers. Mutations in VPS13A have been previously reported to cause chorea-acanthocytosis, an autosomal-recessive neurodegenerative disorder that causes red blood cells to appear spiky. Only a limited number of null/damaging alleles with a large effect on cardiovascular traits were detectable in ≈3000 black individuals.