Prediction of Fetal Hemoglobin in Sickle Cell Anemia Using an Ensemble of Genetic Risk Prediction Models
Background—Fetal hemoglobin (HbF) is the major modifier of the clinical course of sickle cell anemia. Its levels are highly heritable, and its interpersonal variability is modulated in part by three quantitative trait loci (QTL) that affect HbF gene expression. Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) in these QTL that are strongly associated with HbF but explain only 10 to 12% of its variance. Combining SNPs into a genetic risk score (GRS) can help explain more of the variability in HbF level, but the challenge of this approach is selecting the optimal number of SNPs to include in the GRS.
Methods and Results—We developed a collection of 14 models with GRS composed of different numbers of SNPs, and used the ensemble of these models to predict HbF in sickle cell anemia patients. The models were trained in 841 sickle cell anemia patients and tested in three independent cohorts. The ensemble of 14 models explained 23.4% of the variability in HbF in the discovery cohort, while the correlation between predicted and observed HbF in the three independent cohorts ranged from 0.28 to 0.44. The models included SNPs in BCL11A, the HBS1L-MYB intergenic region, and the HBB gene cluster, the QTL previously associated with HbF.
Conclusions—An ensemble of 14 genetic risk models can predict HbF levels, with correlations between predicted and observed values of 0.28 to 0.44 in independent cohorts, and the approach may prove useful in other applications.
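The general idea of an ensemble of GRS models can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration and is not taken from the study: the function names `fit_grs_ensemble` and `predict_ensemble`, the ranking of SNPs by absolute correlation with the trait, the use of per-SNP regression slopes as GRS weights, the nested "top k SNPs" design, and the unweighted averaging of predictions. The simulated genotypes and effect sizes are likewise invented; only the training sample size (841) and the number of models (14) echo the abstract.

```python
import numpy as np

def fit_grs_ensemble(G, y, n_models=14):
    """Fit nested GRS models on the top 1..n_models SNPs (hypothetical scheme).

    G: (n_subjects, n_snps) allele counts (0/1/2), all SNPs assumed polymorphic.
    y: (n_subjects,) quantitative trait, e.g. a transformed HbF level.
    """
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    ss_g = (Gc ** 2).sum(axis=0)
    cov = (Gc * yc[:, None]).sum(axis=0)
    beta = cov / ss_g                              # per-SNP least-squares slope
    r = cov / np.sqrt(ss_g * (yc ** 2).sum())      # per-SNP correlation with y
    order = np.argsort(-np.abs(r))                 # strongest association first
    models = []
    for k in range(1, n_models + 1):
        idx = order[:k]
        grs = G[:, idx] @ beta[idx]                # weighted allele-count score
        slope, intercept = np.polyfit(grs, y, 1)   # regress trait on the GRS
        models.append((idx, beta[idx], slope, intercept))
    return models

def predict_ensemble(models, G):
    """Average the predictions of all GRS models (unweighted, an assumption)."""
    preds = [b0 + b1 * (G[:, idx] @ w) for idx, w, b1, b0 in models]
    return np.mean(preds, axis=0)

# Simulated data: 50 SNPs, of which the first 5 have invented true effects.
rng = np.random.default_rng(0)
n_train, n_test, n_snps = 841, 300, 50
maf = rng.uniform(0.1, 0.5, n_snps)
G_train = rng.binomial(2, maf, size=(n_train, n_snps)).astype(float)
G_test = rng.binomial(2, maf, size=(n_test, n_snps)).astype(float)
effects = np.zeros(n_snps)
effects[:5] = [0.6, 0.5, 0.4, 0.3, 0.3]
y_train = G_train @ effects + rng.normal(0.0, 1.0, n_train)
y_test = G_test @ effects + rng.normal(0.0, 1.0, n_test)

models = fit_grs_ensemble(G_train, y_train)
pred = predict_ensemble(models, G_test)
r_test = np.corrcoef(pred, y_test)[0, 1]
print(f"models: {len(models)}, test correlation: {r_test:.2f}")
```

Averaging over models of different sizes avoids committing to a single number of SNPs, which is the selection problem the Background describes; a real analysis would also need covariate adjustment and a principled SNP-selection procedure, neither of which is sketched here.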
- Received October 19, 2013.
- Revision received January 17, 2014.
- Accepted February 6, 2014.