Graft-versus-host disease (GVHD) remains a life-threatening complication of stem cell transplantation (SCT) for hematological malignancies. GVHD can occur in perfectly HLA-matched related donor (MRD) transplants and, conversely, can be absent in matched unrelated donor (MUD) or haploidentical SCT. Minor histocompatibility antigens (mHA) have been investigated for their contribution to GVHD in SCT, which is mediated by differential presentation of these antigens in recipient tissues to donor-derived T cells. Our group and others have shown that predicted mHA in bone marrow transplant donor-recipient pairs (DRP) number in the thousands, with MUD pairs having on average twice as many mHA as MRD pairs. However, repeated studies have shown that the absolute number of mHA does not correlate with GVHD; that is, more mHA do not lead to higher rates of GVHD. It is unknown whether the tissue distribution of mHA in SCT differs significantly between patients, and what role this may play in the biology of GVHD. Therefore, our group investigated whether the distribution of in silico-derived mHA correlates with GVHD occurrence. Whole exome sequencing (WES) of 77 HLA-matched SCT DRP was performed and revealed thousands of nucleotide polymorphisms unique to the recipient when compared with the donor. When translated into peptide fragments and queried for binding to HLA-A, -B, and -C in each DRP, these polymorphic peptides generated thousands of putative mHA per DRP in silico. Lists of the genes of origin for the mHA were then compiled for each DRP and sorted by the binding affinity of the polymorphic recipient peptides to HLA; presented binders (IC50<500, termed "PB") and strong binders (IC50<50, termed "SB") numbered in the hundreds and were considered further.
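The binder classification described above can be illustrated with a minimal Python sketch. It assumes the binding predictions are already available as (DRP identifier, gene of origin, predicted IC50) records; the record layout, the function name `genes_by_binding_class`, and the example gene and DRP labels are hypothetical and are not taken from the study pipeline. Only the two cutoffs (IC50<500 for PB, IC50<50 for SB) come from the text.

```python
from collections import defaultdict

PB_CUTOFF = 500.0  # presented binders: IC50 < 500 (cutoff from the text)
SB_CUTOFF = 50.0   # strong binders:    IC50 < 50  (cutoff from the text)


def genes_by_binding_class(predictions):
    """Group genes of origin per DRP into PB and SB sets.

    `predictions` is an iterable of (drp_id, gene, ic50) tuples.
    This layout is an assumption made for illustration only.
    """
    pb = defaultdict(set)
    sb = defaultdict(set)
    for drp_id, gene, ic50 in predictions:
        if ic50 < PB_CUTOFF:
            pb[drp_id].add(gene)
        if ic50 < SB_CUTOFF:
            # Every strong binder also satisfies the PB cutoff.
            sb[drp_id].add(gene)
    return dict(pb), dict(sb)


# Hypothetical records; gene names and IC50 values are made up.
example = [
    ("DRP01", "GENE_A", 12.0),
    ("DRP01", "GENE_B", 320.0),
    ("DRP01", "GENE_C", 4100.0),
    ("DRP02", "GENE_A", 49.9),
    ("DRP02", "GENE_D", 500.0),
]
pb_genes, sb_genes = genes_by_binding_class(example)
```

Because both cutoffs are strict inequalities, a peptide with IC50 exactly equal to 500 is not counted as a presented binder in this sketch.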

Normal gene expression data for 13 distinct tissues were obtained from the GTEx database v6, consisting of 54,354 transcripts averaged over hundreds of samples. The R package "pSI" was used to generate lists of genes that were disproportionately expressed in various tissues. A "specificity index" (SI) was calculated for each gene in each tissue by comparing its expression level in that tissue against its expression level in all other tissues, with cutoffs given by P value (Wells et al., 2015 Nucleic Acids Research). Using these SI gene lists as reference, SB mHA in the patient data set showed significant differences in predicted tissue expression of the polymorphic genes among the 77 patients. The probabilities of mHA being presented by specific tissues were seen to follow a normal distribution across patient samples, allowing further statistical analyses that assume normality. The intersections of the mHA lists (PB or SB) and the SI lists (normal tissue-specific expression) were calculated (Figure 1). Sixteen models were then analyzed for their ability to predict GVHD, and the best fit was identified using C-statistics. The SI list at a specificity cutoff of P=0.001 together with the proportion analysis had the best C-statistic (0.831) and was used for further investigation. Univariate and multivariate analyses revealed that a higher proportion, but not absolute number, of antigens expressed in colonic tissue was statistically correlated with GVHD in this pilot study (Table 1). In conclusion, this pilot study reveals that the tissue distribution of WES-derived mHA is not uniform across DRP. It also suggests that differences in mHA tissue distribution may contribute to GVHD. Specifically, a heavier burden of polymorphisms in genes differentially expressed in the colon, compared with other tissues, may make recipients more likely to experience GVHD.
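The proportion analysis and the C-statistic can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the models used in the study: the per-patient score here is simply the fraction of a patient's mHA genes that fall in a tissue-specific gene list, and the C-statistic is computed directly as the fraction of (GVHD, no GVHD) patient pairs in which the GVHD patient has the higher score, with ties counted as one half. The function names, gene labels, and patient data are all hypothetical.

```python
def tissue_proportion(mha_genes, tissue_genes):
    """Fraction of a patient's mHA genes found in a tissue-specific gene list."""
    mha_genes = set(mha_genes)
    if not mha_genes:
        return 0.0
    return len(mha_genes & set(tissue_genes)) / len(mha_genes)


def c_statistic(scores, outcomes):
    """Concordance (C-statistic) of a score for a binary outcome.

    Counts, over all (case, control) pairs, how often the case has the
    higher score; tied pairs contribute 0.5.
    """
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical colon-specific gene list and patient data (made up).
colon_specific = {"G1", "G2", "G3"}
patients = [
    ({"G1", "G2", "G8", "G9"}, True),
    ({"G1", "G7", "G8", "G9"}, False),
    ({"G1", "G2", "G3", "G9"}, True),
    ({"G6", "G7", "G8", "G9"}, False),
    ({"G2", "G7", "G8", "G9"}, True),
]
scores = [tissue_proportion(genes, colon_specific) for genes, _ in patients]
outcomes = [gvhd for _, gvhd in patients]
c_stat = c_statistic(scores, outcomes)
```

A C-statistic of 0.5 corresponds to a score that does not discriminate between outcomes, and 1.0 to perfect discrimination. Using the proportion rather than the raw intersection size makes the score independent of how many mHA a patient has in total.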


No relevant conflicts of interest to declare.
