To Replicate or Not to Replicate: The Case of Pharmacogenetic Studies
Establishing Validity of Pharmacogenomic Findings: From Replication to Triangulation
Replication of positive findings from genetic association studies has long served as the gold standard for establishing the validity of observed genotype–phenotype associations.1–3 In addition to adequate statistical power and sound epidemiological design aimed at minimizing biases, suggested criteria for successful replication studies include: (1) same genetic variant, (2) same direction of association, (3) same definition of the phenotype, and (4) same ethnic group as reported in the discovery study.1 However, in the context of pharmacogenetics, the phenotype requirement is further constrained by the drug and intervention strategy of the initial investigation. As a result, and because many pharmacogenetic and pharmacogenomic studies arise from randomized clinical trials, replication of gene–drug response associations has proven to be a costly and logistically difficult endeavor.4,5 In light of these constraints, this review presents the case for rethinking the replication requirement and proposes several alternative approaches to validating pharmacogenomic findings.
Response by Ioannidis on p 412
The prevailing view is that the requirement of replication, currently imposed or strongly suggested by most peer-reviewed journals, has improved the quality of published studies by reducing false-positive findings and overcoming the winner's curse.6 At first glance, replication rates for findings from both candidate gene studies and agnostic approaches, such as genome-wide association studies (GWAS) and, more recently, whole-exome or whole-genome scans, appear low. For example, of 70 previously reported candidate loci for common diseases, only 22 had P values <10−7 across 100 subsequently conducted GWAS.7 Similarly, a recent analysis of large-scale genetic studies that included at least 50 variants and were published between 2007 and 2010 suggests that only 1.2% of initial findings are replicated.8 In pharmacogenetic studies of cardiovascular outcomes, where the phenotype is defined as response to drug treatment, the replication rate (3.4%) is comparable to that of genetic association studies in general (Table 1).9–13 However, such crude replication rates may be misleading and unflattering indicators of the validity of most findings in the field, as false-positives represent only one of many reasons for nonreplication. Below we summarize the factors that play a role in the reproducibility of pharmacogenomic findings.
One factor contributing to single-digit replication rates is the low ratio of false-positive to false-negative findings inherent to genome-wide and other large-scale genetic studies.8 In pharmacogenetics, other factors include the low minor allele frequency of functional variants, small effect sizes, limited sample size, and often incomparable treatment regimens or phenotype definitions.14 However, a more serious challenge may lie in our definition of successful replication, which may be unnecessarily limiting, reflecting a crude grasp of the underlying biological mechanisms rather than failures of study design. For example, in some cases replication failure may actually be due to the underlying genetic architecture, ie, varying allele frequencies, locus heterogeneity, or epistatic interactions between functional polymorphisms.15 Such differences are likely to become more prominent as the field shifts away from the common disease/common variant hypothesis and places more emphasis on the contributions of many rare, and thus presumably more likely to be causal, genetic variants. Furthermore, the recently published results of the Encyclopedia of DNA Elements (ENCODE) project stress the significance of the still poorly understood role of intergenic variants, previously considered junk DNA.16 As high-resolution genotyping becomes more accessible, the number of potentially relevant genetic markers will increase exponentially, making the same-genetic-variant component of the replication requirement not only infeasible but also meaningless, as the functional role of most newly identified variants remains unknown.
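The link between a low false-positive/false-negative ratio and low replication rates can be made concrete with a toy expected-value calculation. The sketch below is illustrative only: the function name `replication_outlook` and all inputs (1,000,000 null variants, 200 true effects, a genome-wide alpha of 5e-8, and 20% power in both discovery and replication studies) are hypothetical assumptions, not figures from the cited analyses.

```python
def replication_outlook(n_null, n_true, alpha, power):
    """Expected discovery/replication behaviour under a hypothetical toy model.

    Assumes every variant is tested once in a discovery study and every
    discovery is retested in an independent replication study with the
    same alpha and the same power.
    """
    false_pos = n_null * alpha          # expected null variants passing alpha
    false_neg = n_true * (1.0 - power)  # expected true effects that are missed
    true_pos = n_true * power           # expected true effects that are detected
    discoveries = true_pos + false_pos
    # A true positive replicates with probability `power`;
    # a false positive replicates only by chance, with probability `alpha`.
    replicated = true_pos * power + false_pos * alpha
    return {
        "fp_to_fn_ratio": false_pos / false_neg,
        "replication_rate": replicated / discoveries,
    }


# Hypothetical inputs, chosen only for illustration.
print(replication_outlook(1_000_000, 200, 5e-8, 0.2))
```

With these inputs almost every discovery is a true association (a false-positive/false-negative ratio of roughly 3e-4), yet only about one discovery in five replicates, because each study misses most true effects. Under this model, low replication rates reflect low power at least as much as false-positives.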
In this review, we discuss alternative approaches that integrate different methods of testing kindred hypotheses, such as functional validation, combined analysis of several populations, and simulations. These could jointly provide a more comprehensive and rigorous avenue for validating genetic association findings (Table 2).
Functional validation of genetic variants showing significant associations with drug response in the initial screen is the most widely accepted substitute for replication.4 If the mechanisms of drug action are well known, it is often possible to design cell line or animal model experiments that could aid in establishing causality of the observed associations.5,8 One such study used a zebrafish model to screen for genes associated with response to QT-prolonging agents and successfully confirmed a hit from 2 human GWAS, located in the zebrafish ortholog of the GINS3 gene.17 In another study, lymphoblastoid cell line models were used to generate evidence supporting the causal association between a haplotype within HMGCR and decreased response to statin therapy.18 Despite the compelling nature of the results that stem from well-controlled experimental design, the effectiveness of the functional validation approach may be limited by imperfect translation between animal models and humans, lack of appropriate assays, and ethical considerations.5
Joint Analysis of Several Populations
Combining the discovery and validation studies into a metapopulation has distinct advantages over the classic replication paradigm. Specifically, the 2-stage GWAS design, in which the second stage is used as a replication study, has been shown to have consistently lower statistical power than joint analysis, even when more stringent significance levels are applied.19 Nevertheless, the gains in efficiency from joint analysis do not obviate concerns about validity, as using lower (more stringent) significance levels is likely only to decrease the already low false-positive/false-negative ratio in large-scale studies.8 Additionally, heterogeneity in allele frequencies between populations may result in biased summary estimates of association, thus sometimes threatening rather than confirming the validity of the findings.20
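The power advantage of joint analysis can be illustrated with a small Monte Carlo sketch under a normal approximation. Everything here is an assumption chosen for illustration, not a setting from the cited work: a true effect whose z statistic would be `effect_z` if the full sample were analyzed together, half the sample in each stage, the top 1% of variants carried forward, a replication threshold of 1e-5 in stage 2, and a 100-fold stricter threshold of 1e-7 for the joint test. The function names `critical_z` and `two_stage_power` are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def critical_z(alpha: float) -> float:
    """One-sided critical value c such that P(Z > c) = alpha under the null."""
    return _STD_NORMAL.inv_cdf(1.0 - alpha)


def two_stage_power(effect_z: float, frac_stage1: float = 0.5,
                    alpha_stage1: float = 0.01, alpha_replication: float = 1e-5,
                    alpha_joint: float = 1e-7, n_sims: int = 20_000,
                    seed: int = 1) -> tuple[float, float]:
    """Estimate power of replication-based vs joint analysis for one variant."""
    rng = random.Random(seed)
    c1 = critical_z(alpha_stage1)
    c_rep = critical_z(alpha_replication)
    c_joint = critical_z(alpha_joint)
    # Each stage's z statistic scales with the square root of its sample share.
    w1, w2 = frac_stage1 ** 0.5, (1.0 - frac_stage1) ** 0.5
    mu1, mu2 = effect_z * w1, effect_z * w2
    hits_rep = hits_joint = 0
    for _ in range(n_sims):
        z1 = rng.gauss(mu1, 1.0)
        if z1 <= c1:
            continue  # variant not carried forward to stage 2
        z2 = rng.gauss(mu2, 1.0)
        # Replication-based: stage 2 alone must pass its own threshold.
        if z2 > c_rep:
            hits_rep += 1
        # Joint: weighted combination of both stages (unit variance since
        # w1**2 + w2**2 == 1) must pass the stricter threshold.
        if w1 * z1 + w2 * z2 > c_joint:
            hits_joint += 1
    return hits_rep / n_sims, hits_joint / n_sims


rep_power, joint_power = two_stage_power(6.0)
print(f"replication-based: {rep_power:.2f}, joint: {joint_power:.2f}")
```

For these settings the replication-based power comes out at roughly 0.48 and the joint-analysis power at roughly 0.78, even though the joint test uses the more stringent threshold. Stage 2 evidence is used only to confirm in the replication design, whereas the joint test pools it with stage 1 evidence.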
The most novel group of methods that could constitute viable alternatives to replication of genetic studies comprises computational approaches based on simulation. Briefly, simulation-based methods consist of identifying candidate single-nucleotide polymorphisms (SNPs) using an agnostic, usually genome-wide approach and subsequently comparing the predictive ability of models including the preselected SNPs with that of models constructed from a large number of random selections of genomic SNPs.21 Although the simulation strategy has not been widely applied to studies of genetic predictors of drug response, it has been used in etiological studies, eg, to evaluate susceptibility genes in sporadic Parkinson disease.21,22 While the use of this approach may still be limited by the availability of computational resources, it remains a promising avenue of investigation and could present a feasible alternative to replication in pharmacogenetics.
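The core comparison in such simulation-based methods, a preselected SNP set versus many random SNP sets of the same size, amounts to an empirical p-value. The sketch below is a minimal illustration on synthetic data and is not the procedure of the cited Parkinson disease studies. Its assumptions: an unweighted allele-count score, squared Pearson correlation with a quantitative drug-response phenotype as the measure of predictive ability, and a simulated cohort in which SNPs 0 to 4 are causal by construction. All function names and parameters are hypothetical.

```python
import random


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def set_score(genotypes, snp_set, phenotype):
    """r^2 between an unweighted allele-count score over snp_set and phenotype."""
    score = [sum(row[j] for j in snp_set) for row in genotypes]
    return pearson_r2(score, phenotype)


def empirical_p(genotypes, phenotype, selected, n_draws=999, seed=7):
    """Compare the selected SNP set against random SNP sets of equal size."""
    rng = random.Random(seed)
    n_snps = len(genotypes[0])
    observed = set_score(genotypes, selected, phenotype)
    as_good = 0
    for _ in range(n_draws):
        random_set = rng.sample(range(n_snps), len(selected))
        if set_score(genotypes, random_set, phenotype) >= observed:
            as_good += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (as_good + 1) / (n_draws + 1)


def simulate_cohort(n_people=300, n_snps=200, causal=range(5), effect=0.5,
                    maf=0.3, seed=11):
    """Hypothetical cohort: 0/1/2 genotypes, phenotype driven by `causal` SNPs."""
    rng = random.Random(seed)
    genotypes, phenotype = [], []
    for _ in range(n_people):
        row = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
        genotypes.append(row)
        phenotype.append(effect * sum(row[j] for j in causal) + rng.gauss(0.0, 1.0))
    return genotypes, phenotype


genotypes, phenotype = simulate_cohort()
observed, p = empirical_p(genotypes, phenotype, selected=[0, 1, 2, 3, 4])
print(f"observed r^2 = {observed:.3f}, empirical p = {p:.3f}")
```

Here the selected set is fixed in advance. If it were instead chosen from the same data, every random draw would need to pass through the same selection step. Otherwise the comparison inherits the multiplicity problem, and the selected set looks better than it is.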
Replication Versus Triangulation
The idea of demanding a simply articulated gold standard for replication is tempting—it elegantly codifies a discipline-wide, acceptable level of skepticism. It also offers journal editors and reviewers a clear and convenient metric of study quality. However, until the definition of successful replication is sufficiently informed by advances in basic and clinical science, imposing the stringent requirement of reproducing associations at the variant level is as likely to obfuscate as to clarify gene–phenotype associations. Others have already argued that relaxing the replication requirement from the single variant to the gene or pathway level may bring the concept closer to etiologic relevance.23 We have suggested here that using different methods of testing the same pharmacogenetic hypothesis, such as those discussed above, as well as those yet to be developed, be considered in lieu of strict variant-centric replication. This approach is not without precedent: the idea of methodological triangulation is well established in the social sciences, another domain characterized by complex systems, nascent knowledge, and severe logistical and ethical constraints on experimental design. For many in the genomic sciences, embracing triangulation might be an uncomfortable departure from the epistemological binary afforded by a gold standard criterion; it would require reviewers to be intimately familiar with recent research in the area and to assess the merits of each combination of studies individually.
Finally, it is important to remember that even when etiologic validity has been satisfactorily established, successful clinical applications of genetic association findings are far from guaranteed. A study of several chronic disease outcomes demonstrated that even replicated SNPs with highly significant measures of genotype–phenotype association often have poor discrimination and reclassification ability, highlighting the need for better methods of identifying candidate variants.24 Making the necessary, albeit inconvenient, step from pure replication to multipronged evaluation of evidence would help build a more capable pipeline from genomic discovery to clinical application, charting the course toward the ultimate goal of personalized therapeutic approaches.
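Why a highly significant, well-replicated variant can still discriminate poorly is easiest to see from the area under the ROC curve (AUC) of a single SNP used as a 0/1/2 risk-allele score. This sketch is illustrative and does not reproduce the cited study. It assumes Hardy-Weinberg genotype frequencies in controls, a log-additive per-allele odds ratio, and a rare outcome, so that case genotype frequencies are proportional to control frequencies multiplied by the odds ratio raised to the genotype count. The function name `single_snp_auc` and the example inputs are hypothetical.

```python
def single_snp_auc(odds_ratio: float, risk_allele_freq: float) -> float:
    """AUC of one biallelic SNP scored as 0/1/2 copies of the risk allele.

    Illustrative assumptions: Hardy-Weinberg frequencies in controls, a
    log-additive odds ratio, and a rare outcome (case frequencies taken as
    proportional to control frequencies times odds_ratio ** genotype).
    """
    q = risk_allele_freq
    controls = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    weighted = [f * odds_ratio ** g for g, f in enumerate(controls)]
    total = sum(weighted)
    cases = [w / total for w in weighted]
    # AUC = P(case score > control score) + 0.5 * P(tie)
    auc = 0.0
    for g_case, p_case in enumerate(cases):
        for g_ctrl, p_ctrl in enumerate(controls):
            if g_case > g_ctrl:
                auc += p_case * p_ctrl
            elif g_case == g_ctrl:
                auc += 0.5 * p_case * p_ctrl
    return auc


# Hypothetical variant: per-allele odds ratio 1.2, risk allele frequency 0.3.
print(round(single_snp_auc(1.2, 0.3), 3))  # → 0.531
```

The AUC depends only on the effect size and allele frequency, not on sample size. A variant of this kind can therefore reach a very small P value in a large consortium while barely improving discrimination over chance (AUC 0.5).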
Response to Stella Aslibekyan, PhD, Steven A. Claas, MS, and Donna K. Arnett, MSPH, PhD
John P.A. Ioannidis, MD, DSc
Aslibekyan et al discuss some interesting alternatives to current replication standards in pharmacogenomics. I sympathize with several of their ideas, which overlap with what I proposed, but disagree with some others.
Relaxing the standard replication criteria (same genetic variant, same direction of association, same phenotype) is problematic. Of course, genetic variants that are in perfect linkage disequilibrium are exchangeable. Moreover, genetic variants in the same locus often have independent effects, and it is important to develop efficient methods to capture these incremental pieces of information residing in the same locus. However, unlinked or poorly linked variants in the same gene cannot be used for replication of associations. Associations in different directions in different populations are possible in theory but hardly ever documented; their documentation would require separate replication of the different directions in the varied postulated settings. Phenotype definitions may also vary across studies, but some core components should be shared. Otherwise, we do not really know what phenotype we are talking about. The other criteria listed by Aslibekyan et al as standards are more flexible. For example, replication does not have to entail the exact same ethnic group; in fact, replication across different ancestry groups would make an association more broadly applicable; statistical efficiency may also improve when different ancestry populations are combined after appropriate stratification. Moreover, we have limited evidence on whether pharmacogenetic associations pertain to wide drug class effects and diverse therapeutic strategies thereof, or are more restricted to single drugs and strategies. Casting a wider net may allow documenting associations that have broader relevance.
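One common way to combine ancestry groups "after appropriate stratification" is to analyze each group separately and then pool the per-group effect estimates. The sketch below uses inverse-variance fixed-effect pooling, reported alongside Cochran's Q and I² as checks for the between-population heterogeneity raised in the main text. This is one standard choice rather than the method of any cited study; a random-effects model may be preferable when heterogeneity is substantial. The function name and the example log-odds estimates are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect pooling of per-stratum estimates.

    Returns (pooled estimate, pooled SE, two-sided p, Cochran's Q, I^2).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)
    z = pooled / pooled_se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # Cochran's Q: weighted squared deviations of strata from the pooled value.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of variability beyond chance (undefined for a single stratum).
    i2 = (q - df) / q if df > 0 and q > df else 0.0
    return pooled, pooled_se, p, q, i2


# Hypothetical per-ancestry log-odds estimates (e.g., two ancestry groups).
print(fixed_effect_meta([0.20, 0.20], [0.10, 0.10]))  # consistent strata
print(fixed_effect_meta([0.30, -0.10], [0.10, 0.10]))  # heterogeneous strata
```

When the strata agree, pooling shrinks the standard error by a factor of √2 and strengthens the evidence. When they disagree, a large Q and I² warn that the single pooled estimate may misrepresent both populations.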
I am worried about relaxing the statistical criteria of replication. We have a miserable history of false-positives in the field. Joint analysis (meta-analysis) of multiple datasets in large consortia is the rule currently, and it should help discover more associations without diluting the statistical thresholds required. Functional validation is likely to offer mostly adjunctive information that cannot replace sound statistical support. There are so many different functional models and experiments that can be conceived, and their results are often incongruent. Moreover, many functional models may simply be insufficiently sensitive to detect what are modest genetic effects. Finally, simulation approaches using pathways or other complex computational means are interesting but have been largely unsuccessful to date. In fact, the examples cited by Aslibekyan et al on Parkinson disease are statistically flawed, and the discovered pathways are simply statistical artifacts: their selection process did not properly adjust for the extent of multiplicity of analyses; not surprisingly, these complex pathway signatures have not moved forward in the translational space.
Using other lines of evidence is reasonable, but no triangulation can overcome the need for rigorous statistical documentation and replication. The notion that somehow we can make major progress by using smart tools in suboptimal studies with small sample size is equivalent to asking for a free lunch. Real progress will require large studies and stringent statistics, not more of the creative data dredging that has already led this field to uncreative stagnation.
- © 2013 American Heart Association, Inc.
- Roden DM, Wilke RA, Kroemer HK, Stein CM
- Thompson JF, Hyde CL, Wood LS, Paciga SA, Hinds DA, Cox DR, et al
- Voora D, Shah SH, Reed CR, Zhai J, Crosslin DR, Messer C, et al
- Milan DJ, Kim AM, Winterfield JR, Jones IL, Pfeufer A, Sanna S, et al
- Medina MW, Gao F, Ruan W, Rotter JI, Krauss RM