Application of high-throughput DNA sequencing to the analysis of B- and T-lymphocyte antigen receptors has great potential for improving the monitoring of lymphoid malignancies, assessing immune reconstitution after hematopoietic cell transplantation, and characterizing the composition of lymphocyte repertoires. Current technology can define the number and frequency of immunoglobulin heavy, T-cell receptor (TCR)α, TCRβ, or TCRγ chains expressed in a population of lymphocytes; techniques for determining the number of antigen receptor heterodimers, such as TCRαβ pairs, expressed in the population are under development.
B and T lymphocytes are unique among somatic cells because much of their fundamental biology is determined by DNA sequences that are not encoded in the germline. The genes that encode B- and T-cell antigen receptors undergo rearrangement during lymphocyte development in a process that concatenates noncontiguous variable (V), diversity (D), and joining (J) gene segments to assemble sequences encoding functional receptors whose antigenic specificity is determined by 3 complementarity-determining regions (CDRs) (Figure 1; supplemental movies 1A and 1B).1-3 The V gene segments encode the CDR1 and CDR2 of both B-cell receptors (BCRs) and TCRs, whereas the V, D, and J segments collectively encode CDR3, which is the most critical determinant of antigenic specificity. Extremely diverse repertoires of B- and T-cell antigen receptors are created by exploiting both combinatorial diversity, made possible by the existence of many distinct V, D, and J gene segments in the 4 T-cell and 3 B-cell antigen receptor loci, and junctional diversity. The latter is created by the unique molecular mechanism that mediates V-D-J rearrangement and involves template-independent insertion and deletion of nucleotides at the V-D, D-J, and V-J splice junctions. The diversity of the BCR repertoire is further increased by somatic hypermutation (SHM) of previously rearranged BCR genes during affinity maturation after initial antigen encounter. SHM is not limited to the V-D-J region that encodes the CDR3 and occurs throughout the V gene segment, potentially affecting the sequence of CDR1 and CDR2, as well as the intervening framework regions.
The antigen receptor repertoires created by the processes of V-D-J rearrangement and SHM are far too diverse to be amenable to comprehensive exploration and definition by conventional capillary-based DNA sequencing. Over the past 5 years, high-throughput DNA sequencing has been applied to the analysis of BCR (refs 4-7) and TCR (refs 8-11) CDR3 region sequence repertoires, enabling analysis of the repertoires realized in any given individual with unprecedented depth, resolution, and accuracy. The CDR3 region in the vast majority of successfully rearranged TCRβ chains and immunoglobulin heavy (IGH) chains comprises no more than 66 bp (refs 8,9) or 90 bp (ref 7), encoding no more than 22 or 30 amino acids, respectively. This region is therefore ideally suited to comprehensive definition by current sequencing platforms that can generate ∼1 × 10⁹ reads of the requisite length in a single run. Although these platforms cannot yet capture the entire V gene segment in addition to the V-D-J region, which limits their utility for analysis of SHM in BCR genes, SHM can be studied using other platforms that provide longer read lengths, albeit with less depth per sequencing run (Figure 2).5,12 Both genomic DNA and complementary DNA from lymphocytes have been used as the template for sequencing, but genomic DNA is preferred for studies in which the relative frequency of specific BCR or TCR sequences is to be assessed. Molecular and computational strategies based on high-throughput DNA sequencing have made it possible to explore BCR and TCR sequence repertoires of any degree of complexity, and thereby to address biological questions that were never before amenable to direct experimental analysis.
Defining the characteristics of B- and T-cell repertoires
High-throughput sequencing of rearranged BCR and TCR genes in lymphocytes from peripheral blood has begun to define the essential characteristics of the B- and T-cell repertoires in healthy adults. Sequencing of the IGH locus in naive B cells from 4 individuals revealed an average of ∼86 000 unique CDR3 sequences per person.7 Up to 650 000 unique TCRβ CDR3 sequences were observed in naive CD8+ T cells from single individuals,10 and exhaustive TCRβ sequencing of pooled naive and memory T cells from a single healthy adult revealed >1 × 10⁶ unique sequences.13 Statistical methods developed for estimating total species diversity in ecological and microbial communities14,15 have been used to estimate total CDR3 sequence diversity, including the sequences that are directly observed as well as those that remain “unseen” due to the fact that such diverse repertoires cannot be sampled completely. Such efforts have suggested that the naive and memory CD8+ and CD4+ TCRβ chain CDR3 repertoires in healthy adults collectively contain at least 3 × 10⁶ to 4 × 10⁶ unique sequences,9 with at least as many unique TCRα chain CDR3 sequences.11 The IGH CDR3 repertoire is estimated to contain at least 10⁶ distinct sequences.5 The IGH and TCRβ repertoires are both characterized by an extremely large number of CDR3 sequences that are present at low frequencies.5,8-10 It has therefore proven difficult, despite the application of methods for estimating unseen species diversity, to determine the true diversity in individual BCR and TCR repertoires with great accuracy, and the published estimates are best viewed as lower bounds for repertoire diversity.
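As an illustration of how an "unseen species" estimate can be formed, the sketch below applies the bias-corrected Chao1 estimator, one widely used lower-bound richness estimator from ecology. It is offered only as an example of this class of methods; the estimators actually used in the cited studies may differ. The function name and the toy read counts are hypothetical.

```python
from collections import Counter


def chao1_lower_bound(clone_counts):
    """Bias-corrected Chao1 estimate of total richness.

    clone_counts: one read (or template) count per distinct CDR3
    sequence observed in the sample. Returns the observed richness
    plus an estimate of the number of unseen sequences.
    """
    counts = [c for c in clone_counts if c > 0]
    observed = len(counts)
    frequency_of_counts = Counter(counts)
    singletons = frequency_of_counts[1]  # sequences seen exactly once
    doubletons = frequency_of_counts[2]  # sequences seen exactly twice
    unseen = singletons * (singletons - 1) / (2 * (doubletons + 1))
    return observed + unseen


# Hypothetical toy sample: 9 distinct sequences, 4 of them singletons.
toy_counts = [50, 12, 3, 2, 2, 1, 1, 1, 1]
print(chao1_lower_bound(toy_counts))  # 9 observed + 2 unseen = 11.0
```

Because the estimate is driven by the rarest observed sequences, a repertoire dominated by low-frequency clones yields a large and uncertain correction, which is consistent with treating such figures as lower bounds.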
Although most of the deep-sequencing studies published to date have focused on the TRB and IGH loci, the loci that encode the α chains of αβ T cells and the γ chains of γδ T cells have also been explored. Analysis of TCRα CDR3 sequences in several subsets of αβ T cells from a single individual revealed 1.2- to 2.4-fold more unique TCRα than TCRβ CDR3 sequences in each of the subsets studied.11 Sequencing of the TRG locus has provided valuable insight into the structure of the γδ T-cell repertoire. Although the theoretical diversity of γδ TCRs rivals that of αβ TCRs, the antigen receptors expressed by peripheral blood γδ T cells demonstrate very limited TCR CDR3 sequence diversity. Indeed, deep sequencing of the rearranged TRG genes expressed in peripheral blood γδ T cells from 3 healthy adults revealed that 45% of the TCRγ CDR3 sequences from the 3 individuals were identical to a previously described sequence found in a “public” γδ TCR, found in most individuals, that is specifically reactive with nonpeptide prenyl pyrophosphate antigens.16,19
Deep sequencing of the TRG locus in αβ and γδ T cells from peripheral blood of healthy adults has also provided valuable insight into the process by which T cells are committed to the αβ or γδ lineages during thymic development. The vast majority of both αβ and γδ T cells in peripheral blood contain rearranged TRG genes, suggesting that rearrangement of the TRG locus occurs before αβ/γδ lineage commitment.17-19 In contrast, only a very small fraction (<4%) of γδ T cells appear to have rearranged TRB loci, suggesting that rearrangement of the TRB locus only occurs in T cells that have committed to the αβ lineage.19
Comparison of the TCR repertoires carried in specific T-cell subsets within single individuals is the focus of several current studies and has great potential for defining the relationships between subsets as well as illuminating the developmental pathways that lead to each subset. Analysis of TCRα and TCRβ diversity in the CD4+ regulatory T-cell (Treg) compartment in a single individual, for example, demonstrated higher TCR diversity in this subset than in any other major subset.11 This observation is consistent with the results of previous studies of both murine20 and human21 Tregs, performed before the advent of high-throughput sequencing, which demonstrated that CD4+ Treg TCR diversity is at least comparable to, if not in fact greater than, that observed in the conventional CD4+ compartment.
Reconstitution of B- and T-cell repertoires after HCT
Recent TCRβ and IGH sequencing studies have characterized reconstitution of the αβ T-cell and B-cell compartments in recipients of allogeneic hematopoietic cell transplantation (HCT). One study of αβ T-cell reconstitution in 10 allogeneic HCT recipients, in which a uniform sequencing depth of >1 × 10⁶ reads per sample was achieved, demonstrated reduced TCRβ complexity at 100 days posttransplant, with modest improvement noted by 1 year posttransplant.22 A different study of 28 allogeneic HCT recipients, in which the sequencing depth was nonuniform and 50- to 200-fold lower (<1.5 × 10⁴ reads per sample), evaluated reconstitution of TCRβ diversity in the CD8+ and CD4+ compartments of cord blood and T cell–depleted (TCD) HCT recipients.23 At 6 months posttransplant, the cord blood recipients demonstrated CD8+ and CD4+ TCRβ diversity comparable to healthy adults, but the TCD recipients had >10-fold lower diversity; by 1 year posttransplant, the CD4+ but not the CD8+ TCRβ diversity in the TCD recipients had improved. Serial sequencing of the IGH locus with a uniform number of 15 000 reads per sample in 6 CLL patients after allogeneic HCT revealed that reconstitution of IGH diversity to levels observed in the HCT donor typically required more than 1 year.12 Although repertoire diversity was a primary end point in all 3 of these studies, they did not address the important issues of sampling depth, the existence of “unseen” CDR3 sequence diversity, or possible confounding by intersample variation in absolute T- or B-lymphocyte count. Nonetheless, these 3 pilot studies have provided valuable initial data on the reconstitution of lymphocyte repertoire diversity after allogeneic HCT and have laid the foundation for more comprehensive future studies.
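One simple way to reduce the influence of unequal sampling depth when comparing samples is to subsample every sample to a common number of reads before counting unique sequences (rarefaction). The sketch below is a minimal illustration under that assumption, not the procedure used in the studies above; the function name, clone labels, and counts are hypothetical.

```python
import random


def unique_at_depth(clone_counts, depth, seed=0):
    """Count distinct clones seen after subsampling to a fixed depth.

    clone_counts: mapping of clone identifier -> read count.
    depth: number of reads to draw without replacement.
    """
    # Expand the counts into one entry per read. This is fine for a
    # toy example but memory-hungry for samples with millions of reads.
    reads = [clone for clone, n in clone_counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    return len(set(rng.sample(reads, depth)))


# Hypothetical samples sequenced to very different depths.
deep_sample = {"c1": 600, "c2": 300, "c3": 60, "c4": 30, "c5": 10}
shallow_sample = {"c1": 12, "c2": 6, "c6": 2}

common_depth = min(sum(deep_sample.values()), sum(shallow_sample.values()))
for name, sample in [("deep", deep_sample), ("shallow", shallow_sample)]:
    print(name, unique_at_depth(sample, common_depth))
```

Subsampling equalizes read depth only. It does not correct for differences in absolute lymphocyte count between samples or for diversity that was never sampled.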
Tracking malignant or therapeutically administered lymphocytes
High-throughput sequencing can be exploited to detect the rearranged CDR3 sequences carried in malignant B and T cells with unprecedented sensitivity and specificity. IGH4,5,12,24-27 and TCRβ/γ28 sequencing is already being used to identify the malignant clone or clones in patients with B- and T-lymphoid malignancies, respectively, at the time of diagnosis, and to track them during and after therapy. Deep sequencing of antigen-receptor loci has multiple advantages when compared with multiparameter flow cytometry or custom-designed patient- and clone-specific polymerase chain reaction (PCR) assays for the monitoring of lymphoid malignancies. Deep sequencing demands less time and labor, has superior sensitivity,25,26,28 and can simultaneously track all of the clones that comprise the malignant population. Its utility for monitoring disease burden in patients with chronic lymphocytic leukemia,4,5,12,24 pediatric B-lineage acute lymphoblastic leukemia (ALL),25,26 and T-lineage ALL28 has been compellingly demonstrated, and it will undoubtedly have utility for monitoring other lymphoid malignancies. Serial sequencing of the IGH locus in pediatric B-ALL has revealed surprisingly dynamic evolution of the locus in some patients,25 demonstrating that this technology may also generate valuable insights into the biology of B- and T-cell cancers.
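The core bookkeeping behind sequence-based tracking can be sketched in a few lines: the rearranged sequences identified at diagnosis are looked up in each later sample and expressed as a fraction of all reads. This is a minimal sketch with hypothetical clone labels, time points, and counts. Real pipelines must also handle sequencing error and clonal evolution, and a read fraction is not by itself a measure of disease burden among all cells.

```python
def clone_frequency(sample_counts, tracked_sequences):
    """Return the read fraction of each tracked sequence in one sample.

    sample_counts: mapping of CDR3 sequence -> read count.
    tracked_sequences: sequences identified at diagnosis.
    """
    total = sum(sample_counts.values())
    if total == 0:
        return {seq: 0.0 for seq in tracked_sequences}
    return {seq: sample_counts.get(seq, 0) / total for seq in tracked_sequences}


# Hypothetical serial samples; "clone_1" and "clone_2" stand in for the
# CDR3 sequences of the malignant clones found at diagnosis.
tracked = ["clone_1", "clone_2"]
serial_samples = {
    "diagnosis": {"clone_1": 9000, "clone_2": 500, "other": 500},
    "day_29": {"clone_1": 4, "other": 99996},
}
for time_point, counts in serial_samples.items():
    print(time_point, clone_frequency(counts, tracked))
```

Because every tracked sequence is looked up in the same data set, any number of clones can be followed simultaneously without designing a separate assay for each.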
Deep sequencing of antigen receptor genes has numerous other potential applications. It will likely become the technique of choice for tracking the fate of therapeutically administered lymphocytes, as its sensitivity, precision, and capacity for monitoring multiple clones simultaneously are far superior to those achievable with clone-specific PCR or tetramer-based assays. Comprehensive profiling of specific lymphocyte populations, such as tumor-infiltrating lymphocytes or lymphocytic infiltrates associated with infections, graft rejection, or autoimmune disease, is also readily achieved with deep sequencing.
Limitations of antigen receptor sequencing
Current strategies for analysis of BCR and TCR repertoires using high-throughput sequencing have notable limitations. All protocols use multiplex PCR to amplify the rearranged variable regions of antigen receptor loci to provide sufficient template DNA for sequencing, which introduces bias due to more efficient amplification of some templates compared with others.9 Such bias can be addressed molecularly and computationally if its magnitude can be accurately determined. Analysis and interpretation of antigen receptor sequence data also pose a number of lingering computational challenges, one of the most difficult of which is inferring the most likely sequence of molecular events that produced each antigen receptor sequence. Addressing these challenges has provided the rationale for the nascent field of computational immunology.
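If the relative amplification efficiency of each template class is assumed to be known, a computational correction can be as simple as dividing each observed count by its efficiency and renormalizing. The sketch below illustrates only that arithmetic; the per-V-segment factors, segment labels, and function name are hypothetical, and how such factors are measured is outside its scope.

```python
def correct_amplification_bias(observed_counts, amplification_factor):
    """Rescale read counts by known relative amplification efficiencies.

    observed_counts: mapping of V segment -> observed read count.
    amplification_factor: mapping of V segment -> relative efficiency
        (assumed known; 2.0 means amplified twice as well as 1.0).
    Returns bias-adjusted relative frequencies that sum to 1.
    """
    adjusted = {}
    for segment, count in observed_counts.items():
        factor = amplification_factor[segment]
        if factor <= 0:
            raise ValueError(f"non-positive factor for {segment}")
        adjusted[segment] = count / factor
    total = sum(adjusted.values())
    if total == 0:
        return adjusted
    return {segment: value / total for segment, value in adjusted.items()}


# Hypothetical example: V_a amplifies twice as efficiently as V_b.
observed = {"V_a": 200, "V_b": 100}
factors = {"V_a": 2.0, "V_b": 1.0}
print(correct_amplification_bias(observed, factors))
# {'V_a': 0.5, 'V_b': 0.5}
```

The correction is only as good as the estimated factors, which is why accurate determination of the bias is the limiting step.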
Conclusions and future research directions
Application of high-throughput sequencing to the analysis of B- and T-cell antigen receptor repertoires is elucidating the fundamental structure of lymphocyte repertoires and seems poised to transform the diagnosis and monitoring of B- and T-lymphoid malignancies. The scientific and clinical utility of this innovative technology will strongly depend on the accuracy, precision, and reproducibility with which it can define both the number of unique antigen receptor sequences in a given clinical specimen and their relative frequency. Future studies that aim to compare the relative frequency of specific BCR or TCR sequences in 2 or more samples must take into account the many potentially confounding factors that could influence such comparisons, prominent among which are the proportion of lymphocyte DNA in the samples, the sequencing depth, and the underlying clonal structure of the samples. Studies that aim to determine the diversity of antigen receptors expressed in lymphocyte populations should explicitly take into account the existence of “unseen” diversity and the possibility that its magnitude may differ significantly from sample to sample.
The antigen receptors in B cells, αβ T cells, and γδ T cells comprise heterodimers whose 2 constituent chains are generated by independent rearrangement events at distinct chromosomal loci. The true diversity of BCRs or TCRs expressed in an individual is thus defined by the number of distinct pairs of chains that occur, which is likely to be higher than the simple number of distinct IGH or TCRβ monomers that are generated. The development of molecular and computational strategies for estimating the number and frequency of distinct heavy/light, αβ, and γδ pairs in the B- and T-cell repertoires is therefore an important research priority. Techniques for simultaneous sequencing of the variable regions of the heavy/light or αβ gene pairs expressed in single lymphocytes have been developed,29,30 and extension of these techniques to the analysis of millions of lymphocytes is likely achievable.
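The distinction between monomer diversity and pair diversity can be made concrete with single-cell data in which both chains are known for each cell. The sketch below counts distinct α chains, distinct β chains, and distinct αβ pairs; the chain labels and the function name are hypothetical.

```python
def chain_and_pair_diversity(cells):
    """Count distinct alpha chains, beta chains, and alpha-beta pairs.

    cells: iterable of (alpha_sequence, beta_sequence) tuples, one per
    single lymphocyte.
    """
    cells = list(cells)  # allow generators; the data are scanned 3 times
    distinct_alpha = {alpha for alpha, _ in cells}
    distinct_beta = {beta for _, beta in cells}
    distinct_pairs = set(cells)
    return len(distinct_alpha), len(distinct_beta), len(distinct_pairs)


# Hypothetical toy data: 4 cells, the last a clonal copy of the first.
toy_cells = [("a1", "b1"), ("a2", "b1"), ("a1", "b2"), ("a1", "b1")]
print(chain_and_pair_diversity(toy_cells))  # (2, 2, 3)
```

In this toy data set 2 distinct α chains and 2 distinct β chains form 3 distinct pairs, so counting either chain alone understates the number of distinct receptors.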
Future applications of deep sequencing to antigen-receptor repertoires will hopefully address a broad range of critical questions in hematology and immunology. Perhaps most importantly, are specific characteristics or dimensions of the B- or T-cell repertoires significantly correlated with clinical immunity and pathogen resistance? Can one or more of these characteristics serve as a reliable surrogate for immune reconstitution after HCT? Can serial analysis of the B- or T-cell repertoire in an individual reliably identify changes that, in turn, are associated with specific clinical events or interventions, such as infections or immunizations? The current pace of research in this field suggests that answers to these questions will soon be at hand.
The online version of this article contains a data supplement.
This work was supported by grants from the National Institutes of Health (CA015704-37, AI033484, and HL110907).
Contribution: E.H.W. wrote the manuscript; and E.H.W., F.A.M., and J.C. read and revised the manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Edus H. Warren, Program in Immunology, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, D3-100, P.O. Box 19024, Seattle, WA 98109-1024; e-mail: firstname.lastname@example.org.