Genome-Wide Approaches to Finding Novel Genes for Lipid Traits
The Start of a Long Road
The sequencing of the human genome and identification of common human genetic variations have made high-throughput interrogation of the human genome possible. Genome-wide association studies (GWASs) are now an exciting new approach to discovering the genetic variations underlying complex diseases and phenotypes.1 These studies allow “hypothesis-free” interrogation of the entire genome without the biases of candidate gene approaches. The GWAS approach has both highlighted candidate genes previously identified by the study of mendelian disorders or by basic biological investigation and illuminated novel genomic loci clearly associated with the phenotype or disease of interest that were previously unsuspected. These studies are only the beginning of a shift in genetics and genomics that has the potential to alter our understanding of physiology and pathophysiology and to profoundly affect clinical medicine for generations to come. However, relatively few genes identified through GWASs have been rigorously assessed for their function in physiological processes, used for clinical risk assessment and prediction, or validated as bona fide targets for drug development.
Articles pp 10 and 21
The plasma lipid phenotypes of low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides have been a fertile early testing ground for GWASs. They are quantitative heritable traits strongly associated with cardiovascular risk and widely measured in clinical practice, making them methodologically, clinically, and logistically attractive for investigation with genome-wide approaches. In the lipid GWASs reported to date, many genes previously identified with mendelian lipid disorders (Table 1) have been significantly associated with lipid phenotypes,2–7 confirming that these genes also are important in influencing lipid traits in the general population. These lipid GWASs also have identified additional genomic loci that are highly significantly associated with plasma lipid levels2–7 (Table 2). Primary discovery in the lipid GWASs to date has largely been performed in previously compiled cohorts wholly or in part ascertained for disease-based phenotypes such as hypertension,7 diabetes,2,3,6 or the metabolic syndrome.4,5 Such disease-based ascertainment could influence GWAS results, particularly when the disease process affects the lipid profile or when a substantial proportion of the study population is on lipid medication. In this context, the lipid GWAS articles presented in this issue by Chasman et al8 and Heid et al,9 involving healthy population-based cohorts, are important. Heid et al report on the results from the GWAS analysis of HDL-C in the Cooperative Health Research in the Region of Augsburg study, a population-based study of 5680 south Germans, of whom 1643 underwent GWAS and 4037 were used for replication studies. Chasman et al report on the results from GWAS analysis of LDL-C, HDL-C, and triglycerides among 6382 white women from the Women’s Genome Health Study, a prospective study of initially healthy women ≥45 years of age. 
The latter study also tested for association with apolipoprotein (Apo) B and ApoA-I, the primary protein components of the LDL and HDL lipoprotein particles, respectively. Concordant results between LDL-C and ApoB or between HDL-C and ApoA-I provide somewhat greater confidence in newly discovered associations with LDL-C and HDL-C.
These 2 studies largely replicate the findings of previous lipid GWASs and do not report any major new genomic loci in association with lipid phenotypes. However, they are important in that they provide additional confirmation of previous GWAS associations within healthy populations and enable a more generalizable estimate of the effect sizes of the genetic variants in the general population. With lipid GWAS results from cohorts using disease-based ascertainment now validated and largely replicated within cross-sectional studies, it is reassuring, and a bit surprising, how similar the gene regions and effect sizes are across both types of GWASs. It appears that further identification of new genetic loci associated with lipid phenotypes using genome-wide approaches in unselected populations will require much larger numbers of phenotyped individuals. In addition, dense genotyping of lipid GWAS loci with both common and low-frequency single-nucleotide polymorphisms needs to be performed in very large numbers of subjects to refine the signal. This will be facilitated by custom candidate gene arrays such as the ITMAT/Broad/CARe cardiovascular candidate gene array, which was developed in a multicenter collaboration. This array was designed to provide dense coverage of >2000 candidate genes associated with cardiovascular disease and related phenotypes using ≈50 000 single-nucleotide polymorphisms, and it will be used to genotype >200 000 subjects worldwide.
Are there any advantages to analyzing a quantitative trait such as a lipid measure with a case-control study design in which subjects are ascertained from the quantitative phenotypic extremes? Candidate gene resequencing studies suggest that sequencing the phenotypic extremes not only provides the most information on rare variants but also is an efficient way to obtain information on common variants.10,11 One advantage of a case-control GWAS focused on the phenotypic extremes could be added power and reduced genotyping costs. For example, a small study of subjects with extreme hypertriglyceridemia (132 cases, 351 normal controls) replicated several published triglyceride GWAS associations with probability values nearing or exceeding genome-wide significance.12 More important, it could be argued that a study of the extremes of a phenotypic distribution asks a different question than the study of a quantitative trait across a normal population and that such studies might identify new genes that were “in the noise” in population-based GWASs. Using GWASs in phenotypic lipid extremes to identify evidence of association with specific genomic loci, followed by resequencing of the identified regions in the same subjects to identify rare variants, is an attractive paradigm that could complement the existing lipid GWAS efforts.
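The power argument for sampling phenotypic extremes can be illustrated with a small simulation. The sketch below is not drawn from any of the cited studies: the population size, allele frequency, per-allele effect, and the function names are all assumptions chosen for illustration. It genotypes the same number of subjects under two designs (a random cohort analyzed as a quantitative trait, and the two tails of the distribution analyzed as cases and controls) and compares the resulting association statistics.

```python
import math
import random


def simulate_population(n, maf, beta, rng):
    """Simulate (allele_count, trait) pairs under an additive model.

    Trait = beta * allele_count + standard normal noise (illustrative only).
    """
    population = []
    for _ in range(n):
        allele_count = int(rng.random() < maf) + int(rng.random() < maf)
        trait = beta * allele_count + rng.gauss(0.0, 1.0)
        population.append((allele_count, trait))
    return population


def correlation_t(xs, ys):
    """t statistic for the Pearson correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def compare_designs(seed=1, n_population=20000, n_genotyped=500,
                    maf=0.3, beta=0.25):
    """Return (cohort_t, extremes_t) for one simulated population.

    All default parameter values are hypothetical.
    """
    rng = random.Random(seed)
    population = simulate_population(n_population, maf, beta, rng)

    # Design A: genotype a random sample and test the quantitative trait.
    cohort = rng.sample(population, n_genotyped)
    cohort_t = correlation_t([g for g, _ in cohort], [y for _, y in cohort])

    # Design B: genotype only the two tails and test case (1) vs control (0).
    ranked = sorted(population, key=lambda person: person[1])
    half = n_genotyped // 2
    controls, cases = ranked[:half], ranked[-half:]
    genotypes = [g for g, _ in controls] + [g for g, _ in cases]
    status = [0] * half + [1] * half
    extremes_t = correlation_t(genotypes, status)

    return cohort_t, extremes_t


cohort_t, extremes_t = compare_designs()
print(f"random cohort t = {cohort_t:.2f}, phenotypic extremes t = {extremes_t:.2f}")
```

Under these assumptions the extremes design should yield the larger statistic for the same genotyping budget, although it still requires phenotyping the whole population, and effect sizes estimated from the tails do not transfer directly to the general population.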
One of the major next steps for validated GWAS associations is the identification of functional or causative variants. Because of the nature of the GWAS approach and the extensive linkage disequilibrium underlying common variants across the genome, the most significantly associated single-nucleotide polymorphism at a genomic locus is unlikely to be the causative variant. Identification of the causative variant can reveal mechanistic insights into how common variants exert their phenotypic effects and can enable identification of the affected gene when GWAS findings implicate a gene-rich region. One such example is the LDL-C–associated 1p13.3 locus. There currently is uncertainty as to which gene is actually causing the association signal, with some studies presenting evidence for CELSR2 (reference 5) and others presenting evidence for SORT1 (reference 2). Resolving which gene drives this association is clearly of critical importance.
Causative variant identification is complicated by the fact that the majority of GWAS association signals occur in noncoding regions. The study by Heid et al particularly focuses on association signals within intergenic regions and explores potential hypotheses of how these regions may exert important functional effects on nearby genes. The functional analysis of these intergenic regions will require the use of technologies such as whole-genome RNA sequencing to identify novel transcripts in tissues and primary cells from subjects of defined genotype and the assessment of allelic distortion in which the 2 alleles present in a heterozygous subject are differentially expressed in the same tissue or cell type.
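The allelic distortion assessment described above is often framed as a test of whether the two alleles in a heterozygous subject contribute equally to the sequenced transcripts. A minimal sketch follows, assuming allele-specific read counts are already available and using an exact two-sided binomial test against an expected 50:50 ratio. The function name and the read counts are hypothetical, and a real analysis would also need to address reference-mapping bias and overdispersion, which this sketch ignores.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for a departure from a 50:50 ratio.

    ref_reads and alt_reads are read counts supporting each allele at a
    heterozygous site in one subject and tissue.
    """
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("at least one read is required")
    larger = max(ref_reads, alt_reads)
    # Under p = 0.5 the distribution is symmetric, so the two-sided p-value
    # is twice the upper tail P(X >= larger), capped at 1.
    upper_tail = sum(comb(total, k) for k in range(larger, total + 1))
    return min(1.0, 2 * upper_tail / 2 ** total)


# Hypothetical read counts for illustration only.
print(allelic_imbalance_p(70, 30))
print(allelic_imbalance_p(52, 48))
```

A small p-value for a site suggests that one allele is preferentially expressed, which is the kind of evidence that could link a noncoding association signal to a nearby gene.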
Although causative variant identification is important, it is equally important to assess the biological functions of the genes identified through GWAS, which does not require identification of the causative variant. In fact, causative variant identification is unlikely to provide substantial insight into the role of the gene in producing the associated phenotype, and knowledge of the causative variant is largely irrelevant for assessing the role of the emerging candidate genes in model systems, including cell culture and whole-model organisms. Such basic investigations will have great impact on our understanding of the biological processes of lipid metabolism and will provide new insights that will contribute to the development of novel therapeutics based on these GWAS discoveries.
Human genetics is permitting a more rigorous investigation of the potential causative relationships of LDL-C, HDL-C, and triglyceride levels to coronary heart disease (CHD) risk. For example, a robust finding is the association of the 1p13.3 locus (containing the genes CELSR2, PSRC1, and SORT1) not only with LDL-C levels but also with CHD.13 This finding strongly suggests that this locus influences CHD risk through its effects on LDL-C levels. Similar data are needed for many existing candidate genes (such as CETP and its relationship not just to HDL-C but also to CHD), as well as for new GWAS discoveries in the lipid field.
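One common way to formalize this reasoning is to compare a variant's observed effect on CHD with the effect expected from its lipid effect alone. The sketch below assumes that the variant influences CHD only through the lipid trait and that effects are additive on the log-odds scale; the function names and all numeric values are hypothetical and are not taken from the cited studies.

```python
def expected_outcome_effect(beta_variant_lipid, beta_lipid_outcome):
    """Log-odds effect on the outcome expected if the variant acts only
    through the lipid trait (product of the two per-unit effects)."""
    return beta_variant_lipid * beta_lipid_outcome


def ratio_estimate(beta_variant_outcome, beta_variant_lipid):
    """Implied per-unit effect of the lipid trait on the outcome,
    obtained by dividing the variant-outcome effect by the
    variant-lipid effect."""
    if beta_variant_lipid == 0:
        raise ValueError("variant has no effect on the lipid trait")
    return beta_variant_outcome / beta_variant_lipid


# Hypothetical values: the allele lowers LDL-C by 0.15 units, and each
# unit of LDL-C raises the log-odds of CHD by 0.5.
expected = expected_outcome_effect(-0.15, 0.5)
implied = ratio_estimate(beta_variant_outcome=-0.09, beta_variant_lipid=-0.15)
print(expected, implied)
```

If the observed CHD effect is close to the expected one, the data are consistent with the locus acting through the lipid trait; a marked discrepancy would point to other pathways.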
Lipid GWASs have major clinical implications, particularly in risk prediction and in the identification of novel therapeutic targets. The ability to effectively predict the risk of future disease is critically important, particularly when preventive measures are available for the disease, as is the case for CHD. The relative clinical value of a predictive test is directly related to the available interventions and their efficacy, safety, and cost. Even lifestyle interventions, which could theoretically be prescribed to all individuals regardless of a priori risk, may be embraced more avidly by those who know that they are at a substantially greater risk of developing the disease. For instance, traditional risk factors in coronary disease are inadequate to fully refine and personalize the assessment of lifetime risk. However, a 30-year-old man who knows he has a genetically determined 3-fold–increased risk of coronary artery disease by 60 years of age compared with the average 30-year-old man may make a more objective and informed decision in addressing his modifiable cardiovascular risk factors through exercise, diet, and cholesterol-lowering therapies. Interestingly, a recent analysis of single-nucleotide polymorphisms in candidate genes associated with LDL-C and HDL-C (reference 14) indicated that the combined information from the single-nucleotide polymorphisms predicted CHD better than the lipid levels alone. This approach could be extended by adding new gene regions identified from GWASs.
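Combining information across single-nucleotide polymorphisms is typically done with a genotype score: a count of risk alleles carried by a subject, optionally weighted by each allele's estimated effect. The sketch below is a generic illustration rather than the method of the cited analysis; the SNP identifiers, weights, and function name are hypothetical.

```python
def genotype_score(risk_allele_counts, weights=None):
    """Sum risk-allele counts across SNPs, optionally weighted per allele.

    risk_allele_counts maps a SNP identifier to 0, 1, or 2 risk alleles.
    weights, if given, maps the same identifiers to per-allele effects.
    """
    score = 0.0
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {snp}: {count}")
        weight = 1.0 if weights is None else weights[snp]
        score += weight * count
    return score


# Hypothetical SNP identifiers and per-allele weights, for illustration only.
subject = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
weights = {"snp_a": 0.10, "snp_b": 0.25, "snp_c": 0.05}

unweighted = genotype_score(subject)
weighted = genotype_score(subject, weights)
print(unweighted, weighted)
```

New loci from GWASs would enter such a score simply as additional entries, which is one way the approach could be extended.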
New drug approvals have stagnated over the last several years, and the cost of developing and bringing a new drug to market has soared. Indeed, many compounds that enter into development fail for lack of efficacy; others fail because of toxicity or adverse outcomes that are based on the mechanisms of the drug. “Preclinical” cell model systems and animal models play important roles in target validation (deciding which targets to pursue), in the development of specific compounds, and in decisions about whether to take them into the clinic. However, there are many examples of drugs that worked spectacularly in animal models but failed to have adequate efficacy in humans. The bottom line is that, as useful as animal models are, they will never faithfully predict the effects in humans of acting on a particular molecular target. There is a major need for rational selection, among the many potential targets, of those most likely to yield effective and safe therapeutics. Carefully generated human genetic data linked to both “intermediate phenotypes” (such as LDL-C) and “outcomes” (such as CHD events) will likely be more powerful than any amount or type of preclinical data.
In summary, GWASs represent the culmination of years of groundwork, but the studies and results reported in this and similar journals only scratch the surface of the true potential of this approach. In addition to the obvious next steps of replication, signal refinement, and identification of causative variants, much work remains to be performed to define the biological roles of novel genes in metabolic pathways, including studying the roles of these genes in model systems. The use of GWAS data in refining clinical risk prediction and defining novel therapeutic targets is also of critical importance. GWASs represent a new dawn in the genetics of common diseases and quantitative phenotypes, but an even brighter future awaits in the glorious day of scientific discovery ahead.
We would like to acknowledge Sekar Kathiresan and Brendan Keating for their helpful discussions.
Sources of Funding
Financial support was provided by the National Heart, Lung, and Blood Institute (F30 Ruth L. Kirschstein National Research Service Award for individual predoctoral MD/PhD fellows to A.C. Edmondson; R01 HL55323 and R01 HL89309 to Dr Rader) and by the Doris Duke Charitable Foundation (a Distinguished Clinical Scientist Award to Dr Rader).
The opinions expressed in this article are not necessarily those of the editors or of the American Heart Association.
References
Kathiresan S, Melander O, Guiducci C, Surti A, Burtt NP, Rieder MJ, Cooper GM, Roos C, Voight BF, Havulinna AS, Wahlstrand B, Hedner T, Corella D, Tai ES, Ordovas JM, Berglund G, Vartiainen E, Jousilahti P, Hedblad B, Taskinen MR, Newton-Cheh C, Salomaa V, Peltonen L, Groop L, Altshuler DM, Orho-Melander M. Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat Genet. 2008; 40: 189–197.
Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, Clarke R, Heath SC, Timpson NJ, Najjar SS, Stringham HM, Strait J, Duren WL, Maschio A, Busonero F, Mulas A, Albai G, Swift AJ, Morken MA, Narisu N, Bennett D, Parish S, Shen H, Galan P, Meneton P, Hercberg S, Zelenika D, Chen WM, Li Y, Scott LJ, Scheet PA, Sundvall J, Watanabe RM, Nagaraja R, Ebrahim S, Lawlor DA, Ben-Shlomo Y, Davey-Smith G, Shuldiner AR, Collins R, Bergman RN, Uda M, Tuomilehto J, Cao A, Collins FS, Lakatta E, Lathrop GM, Boehnke M, Schlessinger D, Mohlke KL, Abecasis GR. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet. 2008; 40: 161–169.
Sandhu MS, Waterworth DM, Debenham SL, Wheeler E, Papadakis K, Zhao JH, Song K, Yuan X, Johnson T, Ashford S, Inouye M, Luben R, Sims M, Hadley D, McArdle W, Barter P, Kesäniemi YA, Mahley RW, McPherson R, Grundy SM, for the Wellcome Trust Case Control Consortium, Bingham SA, Khaw KT, Loos RJ, Waeber G, Barroso I, Strachan DP, Deloukas P, Vollenweider P, Wareham NJ, Mooser V. LDL-cholesterol concentrations: a genome-wide association study. Lancet. 2008; 371: 483–491.
Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research, Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, Chen H, Roix JJ, Kathiresan S, Hirschhorn JN, Daly MJ, Hughes TE, Groop L, Altshuler D, Almgren P, Florez JC, Meyer J, Ardlie K, Bengtsson Boström K, Isomaa B, Lettre G, Lindblad U, Lyon HN, Melander O, Newton-Cheh C, Nilsson P, Orho-Melander M, Råstam L, Speliotes EK, Taskinen MR, Tuomi T, Guiducci C, Berglund A, Carlson J, Gianniny L, Hackett R, Hall L, Holmkvist J, Laurila E, Sjögren M, Sterner M, Surti A, Svensson M, Svensson M, Tewhey R, Blumenstiel B, Parkin M, Defelice M, Barry R, Brodeur W, Camarata J, Chia N, Fava M, Gibbons J, Handsaker B, Healy C, Nguyen K, Gates C, Sougnez C, Gage D, Nizzari M, Gabriel SB, Chirn GW, Ma Q, Parikh H, Richardson D, Ricke D, Purcell S. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007; 316: 1331–1336.
Wallace C, Newhouse SJ, Braund P, Zhang F, Tobin M, Falchi M, Ahmadi K, Dobson RJ, Marçano AC, Hajat C, Burton P, Deloukas P, Brown M, Connell JM, Dominiczak A, Lathrop GM, Webster J, Farrall M, Spector T, Samani NJ, Caulfield MJ, Munroe PB. Genome-wide association study identifies genes for biomarkers of cardiovascular disease: serum urate and dyslipidemia. Am J Hum Genet. 2008; 82: 139–149.
Chasman DI, Paré G, Zee RYL, Parker AN, Cook NR, Buring JE, Kwiatkowski DJ, Rose LM, Smith JD, Williams PT, Rieder MJ, Rotter JI, Nickerson DA, Krauss RM, Miletich JP, Ridker PM. Genetic loci associated with plasma concentration of low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, triglycerides, apolipoprotein A1, and apolipoprotein B among 6382 white women in genome-wide analysis with replication. Circ Cardiovasc Genet. 2008; 1: 21–30.
Heid IM, Boes E, Müller M, Kollerits B, Lamina C, Coassin S, Gieger C, Döring A, Klopp N, Frikke-Schmidt R, Tybjærg-Hansen A, Brandstätter A, Luchner A, Meitinger T, Wichmann H-E, Kronenberg F. Genome-wide association analysis of high-density lipoprotein cholesterol in the population-based KORA (Cooperative Health Research in the Region of Augsburg) study sheds new light on intergenic regions. Circ Cardiovasc Genet. 2008; 1: 10–20.
Wang J, Ban MR, Zou GY, Cao H, Lin T, Kennedy BA, Anand S, Yusuf S, Huff MW, Pollex RL, Hegele RA. Polygenic determinants of severe hypertriglyceridemia. Hum Mol Genet. 2008; 17: 2894–2899.
Samani NJ, Erdmann J, Hall AS, Hengstenberg C, Mangino M, Mayer B, Dixon RJ, Meitinger T, Braund P, Wichmann HE, Barrett JH, König IR, Stevens SE, Szymczak S, Tregouet DA, Iles MM, Pahlke F, Pollard H, Lieb W, Cambien F, Fischer M, Ouwehand W, Blankenberg S, Balmforth AJ, Baessler A, Ball SG, Strom TM, Braenne I, Gieger C, Deloukas P, Tobin MD, Ziegler A, Thompson JR, Schunkert H, for the WTCCC and the Cardiogenics Consortium. Genomewide association analysis of coronary artery disease. N Engl J Med. 2007; 357: 443–453.