Abstract

c-MYC is an important proto-oncogene. Its actions are mediated by sequence-specific binding of the c-MYC protein to genomic DNA. While many c-MYC recognition sites can be identified in c-MYC responsive genes, many others are associated with genes showing no c-MYC response. It is not yet known how the cell determines which of the many c-MYC recognition sites are biologically active and directly bind c-MYC protein to regulate gene expression. We have developed a computational model that predicts c-MYC binding and functional activation as distinct processes. Our model integrates four types of evidence to predict functional c-MYC targets: genomic sequence, MYC binding, gene expression, and gene function annotations. First, a Bayesian network classifier uses several types of sequence information, including the likelihood of genomic DNA methylation as estimated by a computational model, to predict which c-MYC recognition sites are likely to exhibit high-occupancy binding in chromatin immunoprecipitation studies. In the second step, the DNA binding probability of MYC is combined with gene expression information from 9 independent microarray datasets covering multiple tissues and with gene function annotations from Gene Ontology to predict c-MYC targets. The predictions were compared with the c-MYC targets in the public MYC target database [www.myccancergene.org], which collects c-MYC targets identified in the biomedical literature. In total, we predicted 599 likely c-MYC target genes in the human genome, of which 73 have been reported to be both bound and regulated by MYC, 83 are bound by MYC in vivo, and another 93 are MYC regulated. The approach thus successfully identified many known c-MYC targets and suggested many novel sites, including many that are remote from the transcription start site. Our findings suggest that any study based on a single high-throughput dataset is likely to be insufficient to identify c-MYC genomic targets. Using multiple gene expression datasets helps to improve sensitivity, and integrating different data sources helps to improve specificity.
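The second step of the model, in which a binding probability is combined with expression and annotation evidence, can be illustrated with a minimal sketch. The abstract does not state how the evidence is combined, so the code below assumes a naive Bayes style update in which each source contributes an independent likelihood ratio. The function names, the likelihood-ratio inputs, and the example values are all hypothetical and are not taken from the study.

```python
import math


def logit(p):
    """Convert a probability (0 < p < 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_evidence(p_binding, expression_lrs, go_lr=1.0):
    """Hypothetical evidence combination (assumption, not the authors' method).

    p_binding      -- step-1 probability of high-occupancy MYC binding
    expression_lrs -- one likelihood ratio per microarray dataset
                      (>1 supports MYC regulation, <1 argues against it)
    go_lr          -- likelihood ratio from Gene Ontology annotation

    Sources are treated as conditionally independent, so their
    log likelihood ratios simply add to the binding log-odds.
    """
    log_odds = logit(p_binding)
    for lr in expression_lrs:
        log_odds += math.log(lr)
    log_odds += math.log(go_lr)
    return sigmoid(log_odds)


if __name__ == "__main__":
    # Illustrative values only: moderate binding evidence, two
    # supportive expression datasets, one neutral dataset.
    posterior = combine_evidence(0.4, [2.0, 3.0, 1.0], go_lr=1.5)
    print(f"posterior probability of a functional target: {posterior:.3f}")
```

Under this assumption, each additional supportive expression dataset raises the posterior, which is one way to read the statement that multiple datasets improve sensitivity, while requiring agreement across independent evidence types limits false positives.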

Summary of c-MYC Targets Prediction

Microarray Dataset: Data Source (Citation) | Tissue | Predicted Targets | Binding & Regulation Reported | Only Binding Reported | Only Regulation Reported
PMID: 15778709 B Cell 421 61 60 56 
PMID: 12086878 Prostate Cancer 428 56 65 76 
PMID: 14722351 Prostate Cancer 50 13 
PMID: 15254046 Prostate Cancer 66 19 14 
PMID: 12747878 Breast Cancer 17 
PMID: 11707567 Lung Cancer 295 51 42 59 
PMID: 15820940 CML 
PMID: 12704389 ALL 222 45 32 46 
PMID: 11731795 ALL / MLL / AML 22 
Total   599 73 83 93 
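
The Total row can be summarized with a short calculation. This sketch assumes the three reported categories are mutually exclusive, as the column names suggest, so they can be summed. The variable names are illustrative.

```python
# Counts taken from the Total row of the summary table.
predicted = 599        # predicted c-MYC target genes
both = 73              # binding and regulation previously reported
binding_only = 83      # only binding previously reported
regulation_only = 93   # only regulation previously reported

# Assumption: the three categories do not overlap.
supported = both + binding_only + regulation_only
novel = predicted - supported
fraction_supported = supported / predicted

print(f"previously reported: {supported} of {predicted} "
      f"({fraction_supported:.1%}); without prior report: {novel}")
```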

Disclosures: This work was supported in part by grants R01 LM008106, R01 CA85368, U54 DA021519 from NIH.
