Abstract

Molecular profiling strategies generally apply either transcriptomic or proteomic technologies to dissect aberrant expression patterns between normal and diseased samples. Although studies in yeast (S. cerevisiae) demonstrate excellent concordance between gene and protein abundance, the lack of integrated comparisons in human systems prompted us to stringently dissect these relationships in a cellular model system (platelets) devoid of active transcription. Highly purified platelets from four healthy donors were pooled for proteomic analyses, which were completed in duplicate using tryptic digestion of 100 mg cytoplasmic fractions, followed by cation exchange liquid chromatography coupled to tandem mass spectrometry (μLC-MS/MS). Spectral (peptide) counts were used as a semi-quantitative measure of protein abundance among normalized MS data sets, with excellent concordance between platelet runs (Spearman rank correlation r = 0.87, p < 0.0001). Microarray data were derived from five platelet apheresis donors by hybridization to the Affymetrix HU133a gene chip, and relative transcript abundance was established by rank-ordering the unique, normalized set of non-redundant mRNAs (N = 1240). Two comprehensive bioinformatics approaches using either platelet-restricted or genome-wide relational databases were additive in identifying 690 unique platelet proteins, representing an increase of 230% or 131% over either approach alone. Of the identified proteins, 72% had a corresponding mRNA transcript, although a smaller fraction of mRNAs (41%) had a corresponding protein. The calculated codon adaptation index (CAI) for the 156 highest- and the 156 lowest-expressed platelet transcripts predicted a strong correlation with protein abundance.
Spearman correlations on data rank-ordered by either protein or transcript abundance demonstrated a maximal correlation coefficient of 0.44 (p = 0.034) for an 18-member protein subset (higher protein abundance), although the 20 most abundant platelet transcripts had an overall correlation that was considerably higher (r = 0.84, p < 0.0001). Thus, the most highly expressed platelet transcripts are powerful predictors of the corresponding protein abundance. More detailed gene/protein abundance comparisons were completed using a purpose-built relational database containing quantitative profiles specifically linked to RefSeq accession numbers. While the ranges of molecular weights (5.1–280 kDa) and predicted tryptic fragments (9–326) for these gene/protein pairs were quite broad, there was a direct linear correlation using in silico mathematical models (Pearson correlation r = 0.96, p < 0.0001). Interestingly, both Pearson and Spearman correlation coefficients were low in the absence of tryptic correction, although they became statistically significant after tryptic normalization (Pearson correlation r = 0.31, p = 0.019; Spearman correlation r = 0.27, p = 0.0014). In summary, these data establish that

  1. an integrated analytical platform incorporating both transcriptomic and proteomic databases considerably enhances the efficiency of platelet protein detection, and

  2. when analyzed using highly rigorous and novel methods that normalize for tryptic fragment number, there is a significant correlation between gene and protein abundance that is unrelated to ongoing transcriptional activity.
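
The effect of tryptic normalization on rank correlation can be illustrated with a minimal sketch. The abstract does not state the exact normalization formula, so dividing each protein's spectral count by its predicted number of tryptic fragments is an assumption made here for illustration only. The function names and the five-protein data set are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def tryptic_normalize(spectral_counts, tryptic_fragments):
    """Assumed normalization: spectral count per predicted tryptic fragment."""
    return [c / f for c, f in zip(spectral_counts, tryptic_fragments)]


# Hypothetical toy data for five gene/protein pairs (not from the study).
transcript_abundance = [1, 2, 3, 4, 5]
spectral_counts = [50, 20, 60, 20, 40]
tryptic_fragments = [50, 10, 20, 5, 8]

raw_r = spearman(spectral_counts, transcript_abundance)
normalized_counts = tryptic_normalize(spectral_counts, tryptic_fragments)
normalized_r = spearman(normalized_counts, transcript_abundance)

print(f"Spearman r without tryptic correction: {raw_r:.2f}")
print(f"Spearman r after tryptic normalization: {normalized_r:.2f}")
```

In this toy example, proteins with many predicted tryptic fragments accumulate spectral counts out of proportion to their abundance, so the uncorrected counts track transcript rank poorly, while the per-fragment values track it closely. Real data sets would be far noisier, and a p-value would normally be computed alongside each coefficient.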

Author notes

Disclosure: No relevant conflicts of interest to declare.