Abstract
DNase I hypersensitive sites (DHSs) are active chromatin regions that are highly sensitive to DNase I. These regions contain various cis-regulatory elements, including promoters, enhancers and silencers. Accurate identification of DHSs helps researchers better understand the transcriptional machinery of DNA and deepens knowledge of functional DNA elements in non-coding sequences. Many methods based on traditional experiments and machine learning have been developed to identify DHSs. However, their low prediction accuracy and limited robustness restrict their application in genetics research. This paper proposes iDHS-FFLG, a novel deep learning approach that combines feature fusion with a local–global feature extraction network to identify DHSs in mouse. First, multiple binary features of nucleotides are fused to better express sequence information. Then, a network consisting of a convolutional neural network (CNN), bidirectional long short-term memory (BiLSTM) and a self-attention mechanism is designed to extract local features and global contextual associations. Finally, a prediction module distinguishes DHSs from non-DHSs. The results of several experiments demonstrate the superior performance of iDHS-FFLG compared with the latest methods.
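The feature fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the binary features that are fused, so the two encodings used here (one-hot and nucleotide chemical property, NCP), the function name `fuse_binary_features`, and the all-zero handling of unknown bases are assumptions made for illustration only, not the authors' implementation.

```python
# Illustrative sketch only: the two binary encodings below are assumptions,
# not necessarily the features fused by iDHS-FFLG.

# One-hot encoding: one bit per nucleotide identity.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# Nucleotide chemical property (NCP) encoding:
# (ring structure, functional group, hydrogen bond strength).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def fuse_binary_features(sequence):
    """Return one fused binary vector (one-hot + NCP) per nucleotide."""
    fused = []
    for base in sequence.upper():
        # Bases outside A/C/G/T (e.g. N) are encoded as all zeros (assumption).
        one_hot = ONE_HOT.get(base, (0, 0, 0, 0))
        ncp = NCP.get(base, (0, 0, 0))
        # Fusion here is plain concatenation along the feature axis.
        fused.append(list(one_hot + ncp))
    return fused


if __name__ == "__main__":
    for row in fuse_binary_features("ACGTN"):
        print(row)
```

Each sequence of length L becomes an L x 7 binary matrix in this sketch; a matrix of this kind is the sort of input that a CNN, BiLSTM and self-attention stack could consume.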
Data Availability
The source code and datasets for this study are accessible at https://github.com/zhlSunLab/iDHS-FFLG.
Funding
This work was supported by the National Natural Science Foundation of China (No. 61972002).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Wang, LS., Sun, ZL. iDHS-FFLG: Identifying DNase I Hypersensitive Sites by Feature Fusion and Local–Global Feature Extraction Network. Interdiscip Sci Comput Life Sci 15, 155–170 (2023). https://doi.org/10.1007/s12539-022-00538-8
DOI: https://doi.org/10.1007/s12539-022-00538-8