Several recent studies have identified sets of CLL patients whose leukemic cells express highly similar IgV genes, both in V, D, and J segment and light chain use and in the architecture and amino acid composition of the heavy chain third complementarity determining region (CDR3). These findings support the notion that antigen may drive development or perpetuation of the leukemic cells. To effectively mine the available Ig sequence databases, we developed a novel sequence similarity and clustering algorithm that incorporates mechanistic knowledge of IgV gene recombination. CDR3 sequences were initially aligned using the conventional BLOSUM62-based scoring matrix, then rescored based on the junctional annotations. In this way, the method gives greater weight to sequence similarity created by the VDJ recombination process than to simple germline homology. This modified scoring matrix allows the creation of more robust sequence “clusters” and is not restricted to sequences composed of the same germline genes. Preliminary analysis with this method of a collection of more than 1000 CLL IgV gene sequences identified over 40 clusters, including all that have been previously described. Interestingly, a species abundance estimator indicates that additional clusters of IgV genes in CLL remain to be discovered. Additionally, we performed a focused evaluation of over 100 VH4-34-containing sequences expressed by CLL cells of unrelated patients. The VH4-34 gene is among the most commonly used IgVH genes expressed in CLL, has a curious biphasic distribution of mutational frequencies among different CLL cases, and can encode antibodies with specificity for the linear polylactosamine carbohydrate antigen “i” found on neonatal red blood cells. Prior studies by the Chiorazzi group identified a subgroup of patients with CLL cells that expressed remarkably similar mutated VH4-34 genes encoding isotype-switched Ig.
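The rescoring step can be illustrated with a minimal sketch. It is not the authors' implementation: the region labels (`V`, `N`, `D`, `J`), the weights in `REGION_WEIGHTS`, the gap penalty, and the function names are all hypothetical, and a simple identity score stands in for a BLOSUM62 lookup. The sketch assumes the two CDR3 sequences are already aligned and that each position carries a junctional annotation.

```python
from typing import Callable

# Hypothetical weights: residues created by junctional diversity (N additions)
# count more than residues templated by germline V, D, or J segments.
REGION_WEIGHTS = {"V": 0.5, "D": 1.0, "J": 0.5, "N": 2.0}


def identity_score(a: str, b: str) -> float:
    """Stand-in substitution score; a real run would look up BLOSUM62 here."""
    return 1.0 if a == b else -1.0


def rescore(
    seq_a: str,
    seq_b: str,
    regions_a: str,
    regions_b: str,
    sub_score: Callable[[str, str], float] = identity_score,
    gap_penalty: float = -2.0,
) -> float:
    """Rescore a pre-aligned CDR3 pair using per-position region labels.

    Sequences use '-' for gaps; each region string has one label per
    alignment column (the label at a gap column is ignored).
    """
    if not (len(seq_a) == len(seq_b) == len(regions_a) == len(regions_b)):
        raise ValueError("aligned sequences and annotations must be equal length")

    total = 0.0
    for a, b, ra, rb in zip(seq_a, seq_b, regions_a, regions_b):
        if a == "-" or b == "-":
            total += gap_penalty
            continue
        # Average the two region weights so that a junctional residue in
        # either sequence raises the importance of this column.
        weight = (REGION_WEIGHTS[ra] + REGION_WEIGHTS[rb]) / 2
        total += weight * sub_score(a, b)
    return total


# Two toy pairs with the same annotation: the first shares its junctional
# residues, the second shares only germline-templated residues.
shared_junction = rescore("ARGYW", "ARGYF", "VVNNJ", "VVNNJ")
shared_germline = rescore("ARGYW", "ARKLW", "VVNNJ", "VVNNJ")
```

With these illustrative weights, the pair that shares junctional residues scores well above the pair that shares only germline-templated residues, which is the behavior the rescoring is meant to produce.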
Using the method described above, we identified seven additional clusters of CLL cases that expressed highly related VH4-34-encoded Ig, including three clusters containing sequences with a mutated VH4-34. The mutations seen within these clusters have a spectrum distinct from that of other VH4-34 sequences from CLL, normal plasma cells, or marginal zone B cells. Furthermore, within each cluster there were conserved replacement mutations not commonly found in the other VH4-34 sequences. These conserved mutations indicate specific selective pressures subsequent to IgV gene rearrangement and imply that the expressed Ig were selected for binding to several distinct antigens or epitopes. However, redundant antigen specificities encoded by different sequence clusters cannot be excluded, and the number of driving antigens may still be quite restricted. Comparative analysis of normal sequence collections is ongoing.
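The species abundance estimate mentioned earlier can be sketched as follows. The abstract does not name the estimator used, so the bias-corrected Chao1 estimator here is an assumption, as are the function name and the toy cluster sizes. Each cluster is treated as a "species" and each sequence as an individual, with unclustered sequences counted as clusters of size one.

```python
from collections import Counter
from typing import Iterable


def chao1(cluster_sizes: Iterable[int]) -> float:
    """Bias-corrected Chao1 estimate of the total number of clusters.

    S_est = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the
    numbers of clusters observed exactly once and exactly twice.
    """
    sizes = [s for s in cluster_sizes if s > 0]
    counts = Counter(sizes)
    observed = len(sizes)   # S_obs: clusters actually seen
    singletons = counts[1]  # F1
    doubletons = counts[2]  # F2
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Toy data (not from the study): four singletons, two pairs, two larger clusters.
estimate = chao1([1, 1, 1, 1, 2, 2, 3, 5])
```

An estimate above the observed cluster count, as in this toy case, is the kind of result that would suggest further clusters remain undiscovered.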