Enhancers are short DNA regions that increase transcription efficiency by recruiting transcription factors. They are involved in many biological processes, and studies have shown that enhancers take part in the activation of oncogenes, which in normal cells are generally not expressed or are expressed only at very low levels. The identification of enhancers can therefore provide a basis for designing targeted antitumor drugs. Because enhancers can act independently of their distance and orientation relative to their target genes, locating them accurately is difficult. Recently, with the development of high-throughput ChIP-seq technologies, several computational methods have been developed to predict enhancers. However, most of these methods rely on p300 binding sites and/or DNase I hypersensitive sites (DHSs) to select positive training samples, an imprecise criterion that leads to unsatisfactory prediction performance. Moreover, no previous work in the literature has predicted enhancers in tissues across different developmental stages.
In this research, the authors propose a method based on support vector machines (SVMs) for investigating enhancer prediction on cell lines and tissues from EnhancerAtlas. Rather than developing a completely new prediction approach, they focus on how enhancer prediction performs on different cell types and tissues. Their aim is to examine differences in prediction performance across multiple cell types and tissues at different developmental stages, which may shed new light on the properties and functions of enhancers. Concretely, they selected the frequently used cell types H1, GM12878, K562 and HUVEC, and the tissues heart, fetal heart, lung and fetal lung. Positive enhancer samples were obtained from EnhancerAtlas, which helps ensure the quality of the training set. The authors then computed features from DNA sequence and histone modification data: 4-mer frequencies, pseudo k-tuple nucleotide composition (PseKNC), GC content, and histone modification signals. Both individual features and combined feature sets were used. Finally, they applied SVMs to build prediction models for the cell types and tissues, and compared feature combinations to determine which contributes most to prediction performance, measured by accuracy, recall and AUC.
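The sequence-feature and classifier steps described above can be sketched roughly as follows, assuming Python with only the standard library. This is not the authors' implementation: all function names (`four_mer_frequencies`, `gc_content`, `sequence_features`, `train_linear_svm`, `decision_score`) and the hyperparameters `lam` and `epochs` are hypothetical, PseKNC and histone modification features are omitted, and a linear SVM trained with the Pegasos stochastic subgradient method is swapped in for whatever SVM implementation and kernel the study actually used. The toy sequences are illustrative only.

```python
import random
from itertools import product

# All 256 DNA 4-mers in a fixed order, so every sequence maps to a
# feature vector with the same layout.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def four_mer_frequencies(seq):
    """Normalized 4-mer counts; windows containing non-ACGT bases are skipped."""
    seq = seq.upper()
    counts = [0.0] * len(KMERS)
    for i in range(len(seq) - 3):
        index = KMER_INDEX.get(seq[i:i + 4])
        if index is not None:
            counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def gc_content(seq):
    """Fraction of G/C among the unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def sequence_features(seq):
    # Hypothetical layout: 256 4-mer frequencies followed by GC content.
    # PseKNC and histone modification signals are omitted in this sketch.
    return four_mer_frequencies(seq) + [gc_content(seq)]


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors; y holds +1 (enhancer) or -1 (non-enhancer).
    A constant 1.0 is appended to every vector so the last weight acts as a bias.
    """
    rng = random.Random(seed)
    data = [x + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Regularization step: shrink all weights toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient step, taken only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision_score(w, x):
    """Linear SVM score; a positive value means 'predicted enhancer'."""
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))


# Toy usage: GC-rich "enhancers" versus AT-rich "non-enhancers" (illustrative only).
toy_seqs = ["GCGCGCGCGCGC", "GGCCGGCCGGCC", "ATATATATATAT", "AATTAATTAATT"]
toy_X = [sequence_features(s) for s in toy_seqs]
toy_w = train_linear_svm(toy_X, [1, 1, -1, -1])
print([1 if decision_score(toy_w, x) > 0 else -1 for x in toy_X])  # [1, 1, -1, -1]
```

On real data, `lam` would typically be tuned by cross-validation, and the 4-mer and GC features would be concatenated with the other feature groups before training; a kernel SVM library would also be the more usual choice than this hand-written linear trainer.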
The results show that: 1) the features and models used in this research achieve good performance on most cell lines and tissues; in particular, the AUC reaches 0.9690 on heart and 0.9543 on lung tissue. 2) The method performs much better on tissues than on cell types. 3) For the same tissue, enhancers are easier to predict at the adult stage than at the fetal stage, a new biological finding that bears on enhancer prediction and on understanding the properties and functions of enhancers. The authors hypothesize that enhancer characteristics are more pronounced in developed tissues, a hypothesis that still needs to be verified by biologists.
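Accuracy and recall depend on a thresholded prediction, whereas AUC measures how well the classifier's scores rank enhancers above non-enhancers regardless of threshold. A minimal stdlib-only sketch of the three metrics follows; the function names are hypothetical, and the labels and scores in the usage lines are made-up toy values, not data from the study. scikit-learn's `accuracy_score`, `recall_score` and `roc_auc_score` compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """True positives divided by all actual positives (sensitivity)."""
    actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in actual) / len(actual) if actual else 0.0


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count 1/2).

    This pairwise form equals the area under the ROC curve. It is O(P*N),
    which is fine for a sketch but slow for genome-scale test sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage with made-up SVM scores; threshold 0 turns scores into labels.
y_true = [1, 1, 1, -1, -1, -1]
scores = [1.8, 0.9, -0.3, 0.4, -0.7, -1.5]
y_pred = [1 if s > 0 else -1 for s in scores]
print(accuracy(y_true, y_pred))  # 4 of 6 correct, about 0.667
print(recall(y_true, y_pred))    # 2 of 3 positives found, about 0.667
print(roc_auc(y_true, scores))   # 8 of 9 pairs ranked correctly, about 0.889
```

The toy example shows why the metrics can disagree: a single positive scored just below zero lowers accuracy and recall noticeably, while AUC only penalizes the one negative that outranks it.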
For more information, please visit: http://www.