To Replicate or Not to Replicate: The Case of Pharmacogenetic Studies
Have Pharmacogenomics Failed, or Do They Just Need Larger-Scale Evidence and More Replication?
Over the last 2 decades, pharmacogenetics and, subsequently, pharmacogenomics have generated a highly prolific literature with thousands of promising markers that may be associated with diverse aspects of treatment response (efficacy or harms) of many different drugs. Initial expectations that have been continuously fueled by more editorials and opinion pieces have insisted that research in this area can become the frontrunner and catalyst of personalized, individualized, precision medicine, as drug choice, schedule, dosage, and combination with other drugs could be guided by pharmacogenomics.
Response by Aslibekyan et al on p 418
Despite these promises, it is currently recognized that the large majority of proposed genetic associations (including pharmacogenetics) made in the past, mostly with candidate gene approaches, have not been replicated with larger-scale evidence and stringent statistical criteria.1,2 The old proposed associations were based on weak statistical rules, which are currently considered to select almost entirely false-positives. Certainly, the large majority of them have failed to reach stringent genome-wide significance (P<5×10⁻⁸), and most of them have effect sizes close to the null when tested in larger studies. Even with the advent of genome-wide association studies (GWAS), extending also into whole-genome sequencing in the last few years, the yield of new replicated pharmacogenomic markers has been very limited. Is this a sign of failure of pharmacogenomics or simply a reflection of the lack of large-scale evidence with stringent replication in the field?
In a recent review3 of the literature of GWAS until June 15, 2012, there were 139 meta-analyses of GWAS on diverse phenotypes where the total discovery sample exceeded 10 000 participants. None of them pertained to a pharmacogenomic phenotype. An evaluation of the National Human Genome Research Institute catalog of GWAS4,5 as of March 20, 2013, shows that among the most recent 300 entries of GWAS (spanning the time period since June 27, 2012), 23 examined pharmacogenomic phenotypes. Their features and discoveries (or mostly lack of discoveries) are shown in the Table. The most common phenotype examined was response to treatment. However, evaluation of treatment response was often based on surrogate markers or intermediate outcomes, rather than long-term hard clinical endpoints. Surrogate and intermediate outcomes may still be useful to document, but they may not be very informative clinically. The median sample size used in the discovery phase was only n=587, only 1 of the 23 evaluations had a sample size exceeding a total of 5000 participants, and, again, none had >10 000 participants. The largest study6 cataloged during these 9 months actually pertained to a laboratory outcome, ie, change in lipoprotein-associated phospholipase A2 (LPA2) activity with statin treatment, rather than a hard clinical outcome. The only other 2 studies with a sample size also exceeding 3000 again focused on statin treatment,7,8 and they discovered no new gene variant that was anywhere close to nominal genome-wide significance for association with treatment–response outcomes.
Across the 23 pharmacogenomics GWAS evaluations (Table), lack of any genome-wide significant discovery was the norm, occurring in 16 of the 23 studies. Of the other 7, 5 discovered a single gene locus, and 2 discovered 2 gene loci each. Of the 9 gene loci, other than the 2 LPA2 associations alluded to above and the association of the previously known obesity gene MC4R with weight gain in patients receiving antipsychotics, the other 6 associations were somewhat tenuous, with P-values ranging between 1×10⁻⁸ and 5×10⁻⁸. These may not even fully qualify for nominal genome-wide significance in some populations, although it is more likely that they are true than false.9 Furthermore, several of the discovered loci were not truly new discoveries. For example, the LPA2 and ABCG2 loci identified for genome-wide association with LPA2 activity in statin-treated patients in the study discussed above6 had already been found to be associated with genome-wide significance with low-density lipoprotein (LDL) change in statin-treated patients in another previous study that used the same Justification for the Use of Statins in Primary Prevention: An Intervention Trial Evaluating Rosuvastatin (JUPITER) trial dataset.10
Overall, against expectations, the yield of new discovered loci is minimal. Moreover, the effect sizes are typically small or modest, and only a very small portion of the variance of treatment response or risk of treatment harm is explained. For example, each of the 3 loci (LPA, ABCG2, APOE) that are rigorously associated with LDL response in statin-treated patients (either absolute or fractional percentage change) corresponds to 4 to 6 mg/dL average difference in absolute LDL reduction per allele, and they have minor allele frequencies in the 0.05 to 0.15 range.10 This is in contrast with the yield of discoveries for many disease phenotypes such as diabetes, coronary heart disease, inflammatory bowel disease, multiple sclerosis, age-related macular degeneration, and lipid levels, where more than a dozen or even more than a hundred genetic loci have been strongly validated for each.3
Moreover, for at least 3 years now, the expectation has been that newer platforms using exome or full-genome sequencing may improve the genome coverage and identify far more variants that regulate phenotypes of interest, including pharmacogenomic ones.11–13 Despite an intensive research investment, these promises have not yet materialized as of early 2013. A PubMed search on May 12, 2013, with (pharmacogenomics* OR pharmacogenetc*) AND sequencing yielded an impressive number of 604 items. I scrutinized the 80 most recently indexed ones. The majority were either reviews/commentary articles with highly promising (if not zealous) titles or irrelevant articles. There was not a single paper that had shown robust statistical association between a newly discovered gene and some pharmacogenomics outcome, detected by sequencing. If anything, the few articles with real data, rather than promises, show that the task of detecting and validating statistically rigorous associations for rare variants is likely to be formidable. One comprehensive study14 sequencing 202 genes encoding drug targets in 14 002 individuals found an abundance of rare variants, with 1 rare variant appearing every 17 bases, and there was also geographic localization and heterogeneity. Although this is an embarrassment of riches, eventually finding which of these thousands of rare variants are most relevant to treatment response and treatment-related harm will be a tough puzzle to solve even with large sample sizes.
Despite these disappointing results, the prospect of applying pharmacogenomics in clinical care has not abated. If anything, it is pursued with continued enthusiasm among believers. But how much of that information is valid and is making any impact?
The Food and Drug Administration has already endorsed 119 pharmacogenetic associations for inclusion in drug labels, and they pertain to 107 different drugs.15 This means that only a tiny fraction of approved drugs currently have any pharmacogenetic tagging because an estimated 2356 distinct molecular entities have been approved for human use by the Food and Drug Administration, and 3936 distinct molecular entities have been approved for human use in major markets worldwide, including the United States.16 The 119 associations are a highly mixed set: some are whole chromosomes or chromosomal markers known for a long time, 56 are classic cytochrome P (CYP)–related markers, and only the minority pertains to recently discovered associations. Importantly, the fact that these associations are listed in the drug labels does not necessarily mean that one should do something about them or even that the bare associations are credible. Only 8 of these 119 associations appear in boxed warnings, which represent the strongest possible sign that proper attention should be paid to avoid major problems when using a specific drug. There is no systematic documentation of the strength of the evidence surrounding each of these associations in the drug labels. Some of the 119 mentions are clearly just negative statements; eg, in the ticagrelor label, there is a mention of CYP2C19, but, reading the exact quote, it is about a negative, null association. “In a genetic substudy of PLATO (n=10 285), the effects of BRILINTA compared with clopidogrel on thrombotic events and bleeding were not significantly affected by CYP2C19 genotype.”
Of course, there are exceptions, and some strong, thoroughly validated associations are also listed. However, even for those it is often unclear what exactly to do with them in clinical practice. Randomized trials to test the utility of routinely used specific pharmacogenomic markers are still uncommon. Exceptions such as those for HLA-B*5701 and abacavir,17 or CYP2C19 and clopidogrel,18 only prove the rule. Even when randomized trials are performed, often the outcomes are surrogate markers; eg, in clopidogrel, the trial evaluating CYP2C19 used as the main outcome platelet reactivity rather than major bleeding.18 One of the most touted early applications of cardiovascular pharmacogenetics, the titration of warfarin dose based on genetic profiling, still remains uncertain as to its real clinical utility after a decade of research, including clinical trials, as documentation of clinical impact on major end points is still pending, and the cost-effectiveness seems not so favorable as originally perceived.
The lack of a systematic approach to the pharmacogenetic evidence and the inconsistent use of pharmacogenetic information in Food and Drug Administration labeling may create confusion in clinical practice. For example, vague guidance may sometimes lead to withholding therapy in clinical situations based on the results of spurious genetic testing where evidence is lacking about its clinical utility, or the interpretation of the genetic results may be so cumbersome and esoteric to practitioners that they may delay or discourage therapy that would have been beneficial. Other times, where the offered guidance is strong, as in the case of the black-box warnings, it is unclear whether this is more justified than other cases that have as good or better evidence but no black-box warning. Even then, physicians may not consistently adhere to the offered guidance. It remains a challenge how to introduce testing routinely into clinical practice even for black-box warning associations, such as HLA-B*1502 and carbamazepine. For extremely widely used drugs, such as statins, introduction of routine pharmacogenetic testing (eg, testing for myopathy risk) would affect a large segment of the population under medical care, and it is a challenge how to do this efficiently without adding more cost and complexity to overburdened healthcare systems.
I have previously argued19 that we need more, not fewer, randomized trials to see what we can do with pharmacogenomics or any other -omics markers. Even when some of these markers have undergone extensive replication and satisfied stringent levels of statistical significance for association, this does not mean that they also have clinical utility. The size of the association effect can offer a hint to the possible future use of markers. Markers with subtle associations of small/modest effects are likely to have no clinical impact unless many of them are combined (a possibility that is not currently an option for pharmacogenomics where for most drugs we only know of 1 or a few well-replicated markers). Even when the association effect is sizeable, clinical utility is still to be proven, and it depends on many other factors, including availability of differential management options for people with and without the marker, ease of use, sufficiently error-free information flow from the laboratory to the clinician, and reasonable cost–benefit ratio, among others.
Before investing into expensive clinical trials for testing the new crop of mostly weak pharmacogenomic markers, a more radical decision is whether we should find some means to improve the yield of pharmacogenomics or just call it a day and largely abandon the field. The latter option sounds like a painfully radical solution, but on the other hand, we have already spent many thousands of papers and enormous funding, and the yield is so minimal. The utility yield seems to be even diminishing, if anything, as we develop more sophisticated genetic measurement techniques. Perhaps we should acknowledge that pharmacogenomics was a brilliant idea, we have learned some interesting facts to date, and we also found a handful of potentially useful markers, but industrial-level application of research funds may need to shift elsewhere.
I am still reluctant to fully adopt this radical stance until we offer pharmacogenomics a real chance. One may argue that we just need to relax our stringent methods and statistical criteria that evolved in the GWAS era and start again working with the zillions of markers that were proposed in the past based on biological speculation, small epidemiological studies, spurious methods, patchy laboratory corroboration in cell lines or animals, and usually a surrealistic combination of all of these. I am afraid that this would be a desperate scientific suicide, drowning the field into noise, nonreplication, nonutility, and eventually oblivion amid prolific publication of nonsense. I think that if we want to give pharmacogenomics a last chance, it should require better, more rigorous methods and even more stringent replication and clinical validation. The emergence of systematic efforts, such as those led by PharmGKB,20,21 shows that the field has gradually recognized the importance of joining data from diverse teams and taking an increasingly careful and integrative look at the available evidence. However, we also need to move to the next step. This next step could include the following.
Large-Scale Consortia and Datasets
Discoveries with agnostic approaches are unlikely to accelerate unless we routinely collect and examine data from large collections of participants. Data from several tens of thousands of participants with particular treatment responses or specific adverse events to drugs of interest would be indispensable. One may argue that we cannot run such large studies. However, given that many drugs are used by tens of millions of patients, asking for pharmacogenomics studies of tens of thousands of treated patients should be realistic. After a number of successful early efforts with biobanks and large databases of electronic health records,22–24 eventually the advent of efficient big data approaches could be transformative in this regard.
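As a rough illustration of why sample size matters so much here, the following Python sketch approximates the power of a single-marker association test at genome-wide significance. It is only a sketch under stated assumptions: a quantitative outcome, a 1-degree-of-freedom test treated with a normal approximation, and hypothetical values for the fraction of outcome variance explained by the marker. The sample size of 587 is the median discovery sample reported above; the function name and the other inputs are illustrative and do not come from any of the cited studies.

```python
import math
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def approx_power(n_participants, variance_explained, alpha=GENOME_WIDE_ALPHA):
    """Approximate two-sided power of a 1-df association test.

    Assumes the test statistic is roughly normal with mean
    sqrt(n * r2 / (1 - r2)), where r2 is the (hypothetical) fraction of
    outcome variance explained by the marker.
    """
    if n_participants <= 0:
        raise ValueError("n_participants must be positive")
    if not 0 <= variance_explained < 1:
        raise ValueError("variance_explained must be in [0, 1)")
    std_normal = NormalDist()
    # Critical value for a two-sided test at the chosen alpha.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = math.sqrt(n_participants * variance_explained / (1 - variance_explained))
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical effect sizes: markers explaining 1% or 0.1% of variance.
    for r2 in (0.01, 0.001):
        for n in (587, 5000, 10000, 50000):
            print(f"r2={r2:.3f}  n={n:>6}  power={approx_power(n, r2):.4f}")
```

Under these assumptions, a study of the median size has almost no chance of detecting a marker of small effect at genome-wide significance, whereas samples in the tens of thousands make such detection plausible.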
Stringent Statistical Criteria for Replication
Criteria of genome-wide significance need to continue to be applied. As we move into whole-genome sequencing and even more complex analyses with an even higher multiplicity burden, genome-wide significance may need to aim for P-value thresholds even less than P<10⁻⁸. It is unclear whether more lenient criteria, eg, P<10⁻⁵, may be sparingly used for markers that have strong and clearly prespecified biological support25; eg, they pertain to the gene that codes for the main enzyme that metabolizes the drug, and variants are clearly shown to have functional relevance. We need some empirical validation on whether such lenient criteria do not increase substantially the rate of false-positives even for such genes, especially because we cannot know which of the multiple variants in these genes are most pertinent to important phenotypes. The functional relevance of variants can be assessed with multiple methods and systems,3 and it is far from clear that these agree, let alone that they agree with the epidemiological evidence.26,27
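The link between multiplicity burden and the significance threshold can be made concrete with a simple Bonferroni-style calculation, sketched below in Python. The conventional genome-wide threshold corresponds to a family-wise error rate of 0.05 spread over roughly 1 million independent tests; the larger test counts are hypothetical placeholders for denser sequencing-based analyses, not estimates from any cited study, and Bonferroni correction is only one of several ways to set such thresholds.

```python
def bonferroni_threshold(n_independent_tests, family_wise_alpha=0.05):
    """Per-test P-value threshold that keeps the family-wise error rate
    at family_wise_alpha, assuming n_independent_tests independent tests."""
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be positive")
    if not 0 < family_wise_alpha < 1:
        raise ValueError("family_wise_alpha must be in (0, 1)")
    return family_wise_alpha / n_independent_tests


if __name__ == "__main__":
    # 1 million tests gives the conventional threshold; the larger counts
    # are hypothetical illustrations of a heavier multiplicity burden.
    for n_tests in (1_000_000, 5_000_000, 50_000_000):
        print(f"{n_tests:>11,} tests -> P < {bonferroni_threshold(n_tests):.1e}")
```

Under this simplification, a threshold of 10⁻⁸ corresponds to about 5 million effectively independent tests.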
Pharmacogenomic Polygenic Signatures
Given that pharmacogenomic markers are likely to have subtle effects when examined 1 at a time, more research is needed in isolating, replicating, and clinically validating pharmacogenomic polygenic signatures that contain many markers. In fact, if prognostic discrimination is all we are interested in (as opposed to knowing exactly what genetic variant does what), we can use methods where thousands of genetic markers (the ones at the top levels of tentative statistical support) are analyzed en bloc as to their predictive ability. These methods have been used successfully for other difficult phenotypes such as schizophrenia.28 The polygenic models would perform well if sufficiently large datasets can be accumulated.29 Even though one cannot pinpoint which are the exact genes that make the difference among the top 100 000 variants that are considered, these influential genes are hidden as needles in a haystack within the top 100 000 ones, and cumulatively they explain a substantial proportion of the risk variance. An early application of estimation of polygenic heritability based on GWAS data for bronchodilator response yielded a heritability estimate of 28.5%.30 Such studies would be meaningful only if we can create large consortia with large sample sizes so as to escape from the noise zone.
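The en bloc idea can be sketched in a few lines of Python. This is a simplified illustration of one common way to build a polygenic score (keep the markers whose discovery P-value falls below a lenient inclusion threshold, then sum effect-allele counts weighted by their discovery effect estimates); it is not necessarily the exact method of the cited studies, and every marker, weight, and threshold shown is hypothetical.

```python
def select_markers(discovery_p_values, inclusion_threshold):
    """Indices of markers whose discovery P-value is below the threshold."""
    return [i for i, p in enumerate(discovery_p_values) if p < inclusion_threshold]


def polygenic_score(allele_counts, effect_weights, marker_indices):
    """Weighted sum of effect-allele counts (0, 1, or 2) over chosen markers."""
    if len(allele_counts) != len(effect_weights):
        raise ValueError("allele_counts and effect_weights must have equal length")
    return sum(allele_counts[i] * effect_weights[i] for i in marker_indices)


if __name__ == "__main__":
    # Hypothetical discovery results for 5 markers.
    p_values = [1e-6, 0.20, 0.03, 0.70, 0.004]
    weights = [0.30, 0.05, -0.10, 0.02, 0.15]
    # Hypothetical genotype of one treated patient (effect-allele counts).
    genotype = [2, 1, 0, 1, 1]

    chosen = select_markers(p_values, inclusion_threshold=0.05)
    print("markers included:", chosen)
    print("polygenic score:", polygenic_score(genotype, weights, chosen))
```

In practice the score would be built in one dataset and its predictive ability tested in an independent one, which is why large sample sizes matter.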
Handling Rare Variation
If much of the pharmacogenetic profile is explained by rare variants, we need better methods on how to detect and validate these associations.31–33 If the response and harms to drugs are largely genetically private, we may not be able to make much progress even if large studies are assembled. This is a distinct possibility.13 Moreover, for rare variants related to rare adverse events or rare treatment responses, we should also probably accept that it will not be realistic to detect and validate them with current approaches, even with large datasets.
Clinical Validation and Utility
Markers and signatures that reach strong statistical support should be considered for clinical validation in randomized trials. Otherwise, they may remain biologically interesting but clinically useless observations. Selection of target markers for clinical experimentation needs to take into account the strength of the association, the potential of clinical utility, and the other factors discussed above. Testing should be done ideally in the setting of large trials with patient-relevant hard clinical end points to ensure definitive answers34 rather than wasting efforts in an endless series of inconclusive pilot studies.
One may argue that with limited research funds and a shrinking research budget, it would be too expensive to launch large studies, consortia, and definitive clinical trials. However, the cumulative cost of continuing to perform hundreds and thousands of small, inconclusive studies that perpetuate ambiguity is likely to be even higher. If we follow the path delineated above, and these enhancements of the pharmacogenomics research agenda also do not work, we may still need to come up with new ideas that are not visible on the horizon currently. However, we should also consider seriously that we have done our best, and it is time to move on, leaving the overrated promises of pharmacogenomics to rest in peace.
Response to John P.A. Ioannidis, MD, DSc
Stella Aslibekyan, PhD; Steven A. Claas, MS; Donna K. Arnett, MSPH, PhD
We read with great interest the article by Ioannidis, in which the author juxtaposed the promises of pharmacogenetic findings with their actual yield in terms of clinical applications. Although we agree with many of the ideas presented in the article, we would like to challenge the author’s suggestion that more stringent replication criteria are the answer. As we have outlined in our accompanying piece, simply lowering the P-value threshold for single variants may not be biologically justified. If only the most statistically significant variants are considered promising, we are likely to miss polymorphisms that may contribute to the genetic architecture of drug response via epistatic or gene-environment interactions, but have a low effect on their own. Additionally, because many of the relevant polymorphisms are likely to have allele frequencies that are (1) low and (2) heterogeneous between populations, requiring classic single variant replication may be shooting the field of pharmacogenomics in the foot. Therefore, we believe that another idea put forth by Ioannidis, that of creating polygenic signatures and ostensibly conducting replication at the gene- or pathway-based level, holds more promise for future pharmacogenomic discovery.
The second point of our disagreement involves the inherent clinical value of intermediate phenotypes rather than “long-term, hard endpoints.” In the setting of chronic disease, the utility of easily quantifiable, intuitively understood disease traits such as blood pressure or body mass index is hard to overstate. Such endpoints provide an opportunity for pharmacogenomic applications not only in clinical care, but also in prevention; additionally, they often shed light on the underlying biological mechanism.
Finally, although we concur that small-scale, underpowered trials should be abandoned because they are unlikely to produce any conclusive insights, we are more optimistic regarding the cost of large-scale investigations. As the cost of targeted sequencing predictably declines, much information can be gathered from studies ancillary to already funded drug trials, which routinely collect samples that could be used for genotyping or epigenetic phenotyping. One such potential trove of data is the currently ongoing Cardiovascular Inflammation Reduction Trial that aims to test the effects of low-dose methotrexate, an agent characterized by high and heritable variability in treatment response, in 7000 stable coronary artery disease patients. We agree with Ioannidis that novel study formats like consortia, biobanks, and electronic medical records repositories will also provide viable options for cost-effective investigations. Insights from systems biology may also help reduce the costs of pharmacogenetic discovery: to that end, recent work by Fusaro et al illustrates the potential of using simulations to design efficient and informative clinical trials.
Although we respect Dr. Ioannidis's (largely hyperbolic, at least for now) suggestion to abandon the pharmacogenetic agenda, we choose to embrace the words of Franklin D. Roosevelt: “There are many ways of going forward, but only one way of standing still.” As long as we continue to see “ways of going forward,” such as the ideas suggested in Ioannidis’s and our commentaries and the many more that are surely being nurtured within the discipline, we believe the clinical potential of pharmacogenomics will be someday realized.
- © 2013 American Heart Association, Inc.
- Panagiotou OA, Willer CJ, Hirschhorn JN, Ioannidis JPA
- Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, et al
- Hindorff LA, MacArthur J, Morales J, Junkins HA, Hall PN, Klemm AK, et al
- Chu AY, Guilianini F, Grallert H, Dupuis J, Ballantyne CM, Barratt BJ, et al
- Hopewell JC, Parish S, Offer A, Link E, Clarke R, Lathrop M, et al
- Panagiotou OA, Ioannidis JP
- Chasman DI, Giulianini F, MacFadyen J, Barratt BJ, Nyberg F, Ridker PM
- Nelson MR, Wegmann D, Ehm MG, Kessner D, St Jean P, Verzilli C, et al
- 15. United States Food and Drug Administration. http://www.fda.gov/drugs/scienceresearch/researchareas/pharmacogenetics/ucm083378.htm. Accessed May 12, 2013.
- Huang R, Southall N, Wang Y, Yasgar A, Shinn P, Jadhav A, et al
- Kho AN, Pacheco JA, Peissig PL, Rasmussen L, Newton KM, Weston N, et al