TO THE EDITOR:

In their recent report in Blood Advances, Martínez-Laperche et al1  proposed a predictive model for the risk of acute and chronic graft-versus-host disease (GVHD) based on the selection of clinical variables and 25 single-nucleotide polymorphisms (SNPs), spanning different cytokines previously found relevant in the biology of GVHD. The study included 509 patients and their sibling donors from the Spanish Group for Hematopoietic Stem Cell Transplantation (GETH). We attempted to validate their SNP associations with acute GVHD (aGVHD) and nonrelapse mortality (NRM) in DISCOVeRY-BMT, a 2-cohort study of almost 3000 8/8 HLA-matched unrelated donor-recipient paired samples from individuals of European American ancestry.2-5 

The analyses of the GETH data were performed using univariate logistic and least absolute shrinkage and selection operator (LASSO) models.6 Univariate logistic regression models for grade 2 to 4 aGVHD, grade 3 to 4 aGVHD, and NRM were constructed via regression of clinical and genetic variables. LASSO models were constructed for all outcomes using genetic and clinical variables from a training set of 85% of patients, with the remaining 15% comprising the test set. Risk scores were calculated using the LASSO outcomes, and recipients were stratified into high- and low-risk groups based on the proportion of events in the total data. To validate the clinical and genetic associations seen with grade 2 to 4 aGVHD, grade 3 to 4 aGVHD, and NRM, we constructed univariate clinical logistic regression models identical to those of Martínez-Laperche et al1 and LASSO models for grade 2 to 4 aGVHD, grade 3 to 4 aGVHD, and NRM. Twenty-four of the 25 SNPs tested were available for univariate and LASSO analyses in DISCOVeRY-BMT; rs9267487 was used as a surrogate for rs361525 in TNF, because the SNPs are in linkage disequilibrium at r² = 0.82.7 

Clinical characteristics of both GETH and DISCOVeRY-BMT data are listed in Table 1. In GETH data, univariate clinical models showed total-body irradiation and female donor/male recipient increased the odds of aGVHD and NRM, respectively (P < .05). In DISCOVeRY-BMT, total-body irradiation increased the odds of grade 2 to 4 aGVHD (odds ratio [OR], 1.3; 95% confidence interval [CI], 1.12-1.51; P = 5.9 × 10⁻⁴) and grade 3 to 4 aGVHD (OR, 1.22; 95% CI, 1.01-1.47; P = .04), whereas female donor/male recipient was not associated with NRM at P < .05.
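As a rough consistency check on the figures above, a two-sided Wald P value can be back-calculated from an OR and its 95% CI, because the interval is symmetric on the log-odds scale. The sketch below assumes a normal approximation with a critical value of 1.96; the function name `wald_p_from_or` is ours, and because the inputs are rounded, the result only approximates the reported P values.

```python
import math


def wald_p_from_or(or_est, ci_low, ci_high, z_crit=1.96):
    """Approximate the Wald z statistic and two-sided P from an OR and its 95% CI."""
    # On the log-odds scale the CI is estimate +/- z_crit * SE, so its width gives the SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_est) / se
    # Two-sided tail area under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values reported for total-body irradiation in DISCOVeRY-BMT.
z_24, p_24 = wald_p_from_or(1.30, 1.12, 1.51)  # grade 2 to 4 aGVHD
z_34, p_34 = wald_p_from_or(1.22, 1.01, 1.47)  # grade 3 to 4 aGVHD

print(f"grade 2-4 aGVHD: z = {z_24:.2f}, P = {p_24:.1e}")
print(f"grade 3-4 aGVHD: z = {z_34:.2f}, P = {p_34:.2f}")
```

Both back-calculated values fall close to the reported P = 5.9 × 10⁻⁴ and P = .04; small differences are expected from rounding of the published estimates.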

Table 1.

Characteristics of European Americans in DISCOVeRY-BMT and GETH cohorts

| Category | Cohort 1* (n = 2110) | Cohort 2 (n = 777) | GETH (n = 359) |
| --- | --- | --- | --- |
| Median age (range), y† | 45.3 (0.6-74.5) | 50 (0-74) | 45 (0-68) |
| Recipient sex | | | |
| Male | 1191 (56.4) | 429 (55.2) | 225 (63) |
| Female | 919 (43.6) | 348 (44.8) | 134 (37) |
| Donor sex‡ | | | |
| Male | 1396 (68) | 554 (72.6) | 201 (56) |
| Female | 656 (32) | 209 (27.4) | 158 (44) |
| Female donor/male recipient† | | | |
| Yes | 332 (15.7) | 101 (13.0) | 91 (25) |
| No | 1778 (84.3) | 676 (87.0) | 268 (75) |
| Disease† | | | |
| AML | 1282 (60.8) | 488 (62.8) | 116 (32) |
| ALL | 483 (22.9) | 94 (12.1) | 49 (13.5) |
| MDS§ | 345 (16.4) | 195 (25.1) | 34 (9.5) |
| Other‖ | 0 (0) | 0 (0) | 160 (44) |
| Stem cell source† | | | |
| Peripheral blood | 1365 (64.7) | 567 (73) | 250 (69.6) |
| Bone marrow | 745 (35.3) | 210 (27) | 109 (30.4) |
| Conditioning intensity† | | | |
| Myeloablative | 1540 (73) | 551 (71) | 253 (70) |
| Reduced intensity | 570 (27) | 225 (29) | 106 (30) |
| Conditioning regimen† | | | |
| TBI | 973 (46.1) | 280 (36) | 94 (26) |
| No TBI | 1137 (53.9) | 497 (64) | 265 (74) |
| Outcome | | | |
| Grade 2-4 aGVHD | 973 (46.1) | 358 (46.1) | 115 (32) |
| Grade 3-4 aGVHD | 389 (18.4) | 168 (21.6) | 50 (14) |
| NRM | 405 (19.2) | 141 (18.1) | 86 (24) |

Data are n (%) unless otherwise noted.

ALL, acute lymphoblastic leukemia; AML, acute myeloid leukemia; MDS, myelodysplastic syndrome; TBI, total-body irradiation.

*To be comparable to multivariate analysis in the literature, for analysis of grade 2-4 aGVHD and grade 3-4 aGVHD, patients who died without grade 2-4 aGVHD or grade 3-4 aGVHD, respectively, within 100 d were excluded. After exclusion, the difference in the number of patients between the grade 2-4 aGVHD cohort and the grade 3-4 aGVHD cohort was <0.6% in cohorts 1 and 2, so we did not include separate patient characteristics for the grade 3-4 aGVHD cohort in the table.

†Tested in clinical models by Martínez-Laperche et al.1

‡There were 2052 donors in cohort 1 and 763 donors in cohort 2 for analysis of NRM, 1906 donors in cohort 1 and 698 donors in cohort 2 for analysis of grade 2-4 aGVHD, and 1894 donors in cohort 1 and 692 donors in cohort 2 for analysis of grade 3-4 aGVHD.

§Includes MDS patients in DISCOVeRY-BMT and both MDS and myelofibrosis patients in GETH.

‖Other includes non-Hodgkin lymphoma, Hodgkin disease, multiple myeloma, chronic myeloid leukemia, and aplastic anemia.

We selected 6 SNPs at P < .05, with 95% CIs that did not include 1, and tested them for association with aGVHD or NRM in DISCOVeRY-BMT. We assumed the same transmission models (dominant, additive, recessive, or codominant) for donors and recipients presented in the report by Martínez-Laperche et al1 and adjusted for significant clinical variables but not multiple comparisons. Donor GG/AG vs AA at rs3819024 was associated with NRM in DISCOVeRY-BMT cohort 1 at P < .05 (OR, 0.79; 95% CI, 0.64-0.99; P = .04) but not cohort 2 (OR, 0.89; 95% CI, 0.61-1.29; P = .53), although the effect direction was also risk reducing, as in the GETH data (Figure 1). The dominant models for rs16944, rs1143627, and rs2275913 trended toward a risk reduction in both cohorts in DISCOVeRY-BMT, with an OR of <1, whereas these genetic models showed an OR of >1 in the GETH cohort (Figure 1).

Figure 1.

Validation in DISCOVeRY-BMT of significant SNP associations in the Martínez-Laperche et al report. This figure shows the ORs, 95% CIs, and P values as reported by Martínez-Laperche et al1 (literature) and validation of these associations in DISCOVeRY-BMT cohort 1 and cohort 2. Martínez-Laperche et al reported 6 SNPs with P < .05 and 95% CIs that did not include OR = 1, which are shown along the y-axis. The results are grouped by those shown in the Martínez-Laperche et al report (literature) and DISCOVeRY-BMT (cohort 1 and cohort 2), as indicated in the gray boxes displayed along the top of the figure. Each circle, square, and triangle represents an OR from regression analysis, with the shapes corresponding to grade 2 to 4 aGVHD, grade 3 to 4 aGVHD, or NRM, respectively. The colors represent P values of the analyses: .05 > P ≥ .01 (blue) and P > .05 (red). rs2275913 and rs3819024 are donor variants; the remaining 4 SNPs are recipient associations. *SNPs with a 5% and **SNPs with a 10% difference in minor allele frequency (MAF) between DISCOVeRY-BMT and GETH, respectively; all other MAFs were comparable between the groups. For example, (recipient) SNP rs4711998 was associated with increased risk of grade 2 to 4 aGVHD in GETH (blue and to the right of OR = 1) but resides on OR = 1 and P > .05 in DISCOVeRY-BMT.

We assessed effect sizes detectable in DISCOVeRY-BMT and GETH for the 25 SNPs tested, which ranged in MAF from 0.4% to 45%. After correcting for testing in 22 (of 25) independent SNPs (α = 0.05/22 = 0.0023), DISCOVeRY-BMT was powered (1 − β = 0.80) to detect hazard ratios from 4.0 to 1.2 (NRM and grade 3 to 4 aGVHD) and 2.4 to 1.1 (grade 2 to 4 aGVHD) for MAFs from 0.4% to 45%, respectively.3 We were therefore powered to validate the effect sizes detected in the Martínez-Laperche et al1 report.3 In contrast, the GETH cohort was powered (1 − β = 0.80) to detect medium effect sizes (OR, 1.6) for a MAF of 45% in grade 2 to 4 aGVHD; however, for variants at the bottom end of the MAF range, the detectable OR was ∼40, with further reduced power for NRM and grade 3 to 4 aGVHD outcomes. Compounding this sample size challenge was the fact that GETH SNP data were analyzed for all outcomes using logistic regression. For aGVHD, these events occur early on, and this model may be reasonable. However, failure to analyze overall survival or NRM with survival models can result in information loss, and the use of survival models is preferable.5,8-11 
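The corrected significance threshold quoted above is a Bonferroni-style division of the familywise α by the number of independent tests. A minimal sketch of that arithmetic follows; the helper `passes_bonferroni` is a hypothetical name used only for illustration.

```python
FAMILYWISE_ALPHA = 0.05
N_INDEPENDENT_SNPS = 22  # of the 25 SNPs tested, as stated in the text

# Per-test threshold after dividing alpha across the independent tests.
per_test_alpha = FAMILYWISE_ALPHA / N_INDEPENDENT_SNPS
print(f"per-test alpha = {per_test_alpha:.4f}")  # prints 0.0023


def passes_bonferroni(p_value, alpha=per_test_alpha):
    """Return True when a P value is below the corrected threshold."""
    return p_value < alpha


# A nominal P = .04 does not meet the corrected threshold.
print(passes_bonferroni(0.04))
```

Under this threshold, an association that is nominally significant at P = .04 would not be considered significant after correction.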

The well-described LASSO and prediction modeling performed in the GETH cohorts provides us the opportunity to consider important issues in association testing and prediction modeling of transplantation outcomes using genetic variables.6,12 Because of problems with interpretations of the proposed LASSO models in the report, we did not attempt validation of the exact model in DISCOVeRY-BMT. Specifically, the same SNPs were included under multiple transmission assumptions for all LASSO models. For example, in the LASSO model for grade 3 to 4 aGVHD, 7 of 11 SNPs were included in 2 different transmission models, and 1 SNP was included in 3 transmission models (ie, rs8193036 and rs2430561 [recipients] and rs2275913 [donors] were included as recessive [2 copies of the minor allele impact risk] and additive [each additional copy of the minor allele impacts risk]). The LASSO models for all outcomes had various combinations of additive, dominant, recessive, and codominant models for the same SNP. In human population genetic studies, it is not biologically reasonable that the same SNPs are acting in multiple contradictory ways to change the risk of transplantation outcomes. Although it is common practice for genetic association and prediction studies to use additive genetic models,13,14 other appropriate analytic approaches have been developed to assess modes of transmission.15,16 
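To make the transmission models concrete, the sketch below shows one conventional way to code a biallelic SNP genotype (0, 1, or 2 copies of the minor allele) under each model. The function name and dictionary layout are illustrative assumptions and are not taken from either study.

```python
def encode_genotype(minor_allele_count):
    """Return the covariate(s) for one SNP under each transmission model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1, or 2")
    return {
        # Each additional copy of the minor allele shifts risk by the same amount.
        "additive": minor_allele_count,
        # At least 1 copy of the minor allele is enough to change risk.
        "dominant": int(minor_allele_count >= 1),
        # Only 2 copies of the minor allele change risk.
        "recessive": int(minor_allele_count == 2),
        # Heterozygotes and minor-allele homozygotes get separate indicators.
        "codominant": (int(minor_allele_count == 1), int(minor_allele_count == 2)),
    }


for count in (0, 1, 2):
    print(count, encode_genotype(count))

# The additive coding equals the sum of the dominant and recessive codings.
codings = [encode_genotype(g) for g in (0, 1, 2)]
linearly_dependent = all(
    c["additive"] == c["dominant"] + c["recessive"] for c in codings
)
print(linearly_dependent)
```

Because the additive coding is the sum of the dominant and recessive codings for every genotype, entering the same SNP under several of these codings yields linearly dependent predictors, and the individual coefficients are then difficult to interpret as distinct biological effects.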

To consider the 25 SNPs in aggregate, we constructed 3 separate LASSO models for grade 2 to 4 aGVHD, grade 3 to 4 aGVHD, and NRM under an additive genetic model. All 25 recipient and donor SNPs for cohorts 1 and 2 were possible variables (this approach most closely represents Table 4 in the Martínez-Laperche et al1 report); each cohort was divided into a training (85%) and test (15%) set. To select the variables with the smallest prediction errors, we built logistic LASSO regression using fivefold cross validation to find the best penalty parameter λ, and we repeated this cross validation 50 times to give a robust estimate of λ by taking the median. The final LASSO model selected the SNPs with the best λ. LASSO models for 2 outcomes selected 1 SNP; however, the coefficients were almost 0 (<5 × 10⁻¹⁵). Thus, for unrelated donor-recipient pairs, we concluded the 25 SNPs were not associated with either aGVHD or NRM and therefore did not pursue additional predictions. In addition, the predictive models specified in the report cannot be generalized to other transplantation patient cohorts or applied in a prospective setting, because high- and low-risk cut points were determined by the proportion of events in the GETH cohort. The successful stratification of risks for aGVHD and NRM outcomes in 1 cohort may not necessarily be carried over to another cohort when prediction is needed.17 
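The portability problem with cohort-derived cut points can be illustrated with the event proportions in Table 1. The sketch below assumes, purely for illustration, that the cut point equals the proportion of events in a cohort and is applied to a predicted probability; the exact rule used in the original report may differ, and the predicted risk of 0.40 is a made-up value.

```python
def stratify(predicted_risk, cut_point):
    """Label a patient high or low risk relative to a cohort-specific cut point."""
    return "high" if predicted_risk >= cut_point else "low"


# Proportions of grade 2 to 4 aGVHD events taken from Table 1.
EVENT_PROPORTION = {
    "GETH": 0.32,
    "DISCOVeRY-BMT cohort 1": 0.461,
}

# Hypothetical predicted probability; not derived from either study.
hypothetical_risk = 0.40

labels = {
    cohort: stratify(hypothetical_risk, cut_point)
    for cohort, cut_point in EVENT_PROPORTION.items()
}
for cohort, label in labels.items():
    print(f"{cohort}: {label} risk")
```

Under this assumption, the same predicted risk is labeled high risk against the GETH cut point and low risk against the DISCOVeRY-BMT cohort 1 cut point, which is why a cut point tied to one cohort's event rate does not transfer directly to another.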

Our inability to validate the univariate associations or find SNPs predictive of either aGVHD or NRM may have been driven by the differences in transplantation type (related sibling vs unrelated donor), distribution of disease, genomic ancestry, and/or event rates between DISCOVeRY-BMT and GETH. It is important to consider that these single SNPs were initially identified as important in small expression studies or were selected because they reside in gene promoter regions. However, recent large-scale functional studies of SNPs in thousands of samples can now be leveraged.18-20  For example, rs3819024, although selected as an IL17A donor variant, is correlated with the expression of PAQR8, not IL17A, and only in whole blood, although tested in >70 tissues.18-20  Therefore, collectively, these variants may not be informative for the genes of interest. Irrespective of this, we must consider that the DISCOVeRY-BMT results show these SNP findings are not generalizable to other transplantation populations, and to start building successful prediction models of transplantation outcomes, we need larger homogeneous studies across multiple patient populations.

Acknowledgments:

This work was supported by grants from the National Heart Lung and Blood Institute (NHLBI) (1R01HL102278) and National Cancer Institute (NCI) (1R03CA188733), National Institutes of Health (NIH) (L.E.S.-C. and T.H.). E.K. is supported by the Pelotonia Foundation Graduate Student Fellowship. S.P. is supported by the NCI (R01CA168814), National Institute of Child Health and Human Development (R01HD074587 and U54HD090215), and Leukemia & Lymphoma Society (1293-15). The Center for International Blood and Marrow Transplant Research is supported by public health service grant/cooperative agreement 5U24-CA076518 from the NCI, NHLBI, and National Institute of Allergy and Infectious Diseases; grant/cooperative agreement 5U10HL069294 from the NHLBI and NCI; contract HHSH250201200016C with the Health Resources and Services Administration; grants N00014-15-1-0848 and N00014-16-1-2020 from the Office of Naval Research; grants from Alexion, Amgen Inc. (corporate member), Astellas Pharma US, AstraZeneca, Be the Match Foundation, Bluebird Bio Inc. (corporate member), Bristol-Myers Squibb Oncology (corporate member), Celgene Corporation (corporate member), Cellular Dynamics International Inc., Chimerix Inc. (corporate member), Fred Hutchinson Cancer Research Center, Gamida Cell Ltd., Genentech Inc., Genzyme Corporation, Gilead Sciences Inc. (corporate member), Health Research Inc.–Roswell Park Cancer Institute, HistoGenetics Inc., Incyte Corporation, Janssen Scientific Affairs LLC, Jazz Pharmaceuticals Inc. (corporate member), Jeff Gordon Children’s Foundation, Leukemia & Lymphoma Society, Medac GmbH, MedImmune, Medical College of Wisconsin, Merck & Co Inc. (corporate member), Mesoblast, MesoScale Diagnostics Inc., Miltenyi Biotec Inc. (corporate member), National Marrow Donor Program, Neovii Biotech NA Inc., Novartis Pharmaceuticals Corporation, Onyx Pharmaceuticals, Optum Healthcare Solutions Inc., Otsuka America Pharmaceutical Inc., Otsuka Pharmaceutical Co Ltd.–Japan, Patient-Centered Outcomes Research Institute, Perkin Elmer Inc., Pfizer Inc., Sanofi US (corporate member), Seattle Genetics (corporate member), Spectrum Pharmaceuticals Inc. (corporate member), St. Baldrick’s Foundation, Sunesis Pharmaceuticals Inc. (corporate member), Swedish Orphan Biovitrum Inc., Takeda Oncology, Telomere Diagnostics Inc., University of Minnesota, and Wellpoint Inc. (corporate member); and an anonymous donation to the Medical College of Wisconsin.

Opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect those of the Pelotonia Fellowship Program or The Ohio State University. The views expressed in this article do not reflect the official policy or position of the National Institutes of Health, the Department of the Navy, the Department of Defense, the Health Resources and Services Administration, or any other agency of the US government.

Contribution: H.T. analyzed data and wrote the paper; E.K., A.A.R., J.W., L. Preus, Y.W., G.B., A.W., Q.Z., and T.W. analyzed the data; C.A.H., X.S., D.V.D.B., D.S., and L. Pooler performed genotyping and quality control; S.R.S., P.L.M., and M.C.P. helped design the study; T.H. and S.P. provided feedback on approach and analyses; L.E.S.-C. designed the study, analyzed the data, and wrote the paper; and all authors read the paper.

Conflict-of-interest disclosure: S.P. has a patent on “Methods of detection of graft-versus-host disease” (US 20130115232A1, WO2013066369A3) licensed to Viracor-IBT Laboratories. The remaining authors declare no competing financial interests.

Correspondence: Lara E. Sucheston-Campbell, The Ohio State University, 496 W 12th Ave, 604 Riffe Building, Columbus, OH 43210; e-mail: sucheston-campbell.1@osu.edu.

References

1. Martínez-Laperche C, Buces E, Aguilera-Morillo MC, et al; GVHD/Immunotherapy Committee of the Spanish Group for Hematopoietic Transplantation. A novel predictive approach for GVHD after allogeneic SCT based on clinical variables and cytokine gene polymorphisms. Blood Adv. 2018;2(14):1719-1737.
2. Igl BW, Konig IR, Ziegler A. What do we mean by “replication” and “validation” in genome-wide association studies? Hum Hered. 2009;67(1):66-68.
3. Sucheston-Campbell LE, Clay A, McCarthy PL, et al. Identification and utilization of donor and recipient genetic variants to predict survival after HCT: are we ready for primetime? Curr Hematol Malig Rep. 2015;10(1):45-58.
4. Zhu Q, Yan L, Liu Q, et al. Exome chip analyses identify genes affecting mortality after HLA-matched unrelated-donor blood and marrow transplantation. Blood. 2018;131(22):2490-2499.
5. Karaesmen E, Rizvi AA, Preus LM, et al. Replication and validation of genetic polymorphisms associated with survival after allogeneic blood or marrow transplant. Blood. 2017;130(13):1585-1596.
6. Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc B. 1996;58(1):267-288.
7. Machiela MJ, Chanock SJ. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics. 2015;31(21):3555-3557.
8. Aalen OO, Borgan Ø, Gjessing HK. Survival and Event History Analysis. New York, NY: Springer; 2008.
9. Peduzzi P, Holford T, Detre K, Chan YK. Comparison of the logistic and Cox regression models when outcome is determined in all patients after a fixed period of time. J Chronic Dis. 1987;40(8):761-767.
10. Klein J, Moeschberger M. Survival Analysis: Techniques for Censored and Truncated Data. New York, NY: Springer; 2006.
11. Rizvi AA, Karaesmen E, Morgan M, et al. gwasurvivr: an R package for genome-wide survival analysis. Bioinformatics. 2019;35(11):1968-1970.
12. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. New York, NY: Springer; 2001.
13. Ayers KL, Cordell HJ. SNP selection in genome-wide and candidate gene studies via penalized logistic regression. Genet Epidemiol. 2010;34(8):879-891.
14. Bush WS, Moore JH. Chapter 11: genome-wide association studies. PLOS Comput Biol. 2012;8(12):e1002822.
15. Lettre G, Lange C, Hirschhorn JN. Genetic model testing and statistical power in population-based association studies of quantitative traits. Genet Epidemiol. 2007;31(4):358-362.
16. Li Q, Yu K, Li Z, Zheng G. MAX-rank: a simple and robust genome-wide scan for case-control association studies. Hum Genet. 2008;123(6):617-623.
17. Kappen TH, Vergouwe Y, van Klei WA, van Wolfswinkel L, Kalkman CJ, Moons KG. Adaptation of clinical prediction models for application in local settings. Med Decis Making. 2012;32(3):E1-E10.
18. Lonsdale J, Thomas J, Salvatore M, et al; GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580-585.
19. Võsa U, Claringbould A, Westra H-J, et al. Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. https://www.biorxiv.org/content/biorxiv/early/2018/10/19/447367.full.pdf. Accessed 1 July 2019.
20. Staley JR, Blackshaw J, Kamat MA, et al. PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics. 2016;32(20):3207-3209.