iDHS-FFLG: Identifying DNase I Hypersensitive Sites by Feature Fusion and Local–Global Feature Extraction Network

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

DNase I hypersensitive sites (DHSs) are active regions of chromatin that are highly sensitive to cleavage by DNase I. These regions contain various cis-regulatory elements, including promoters, enhancers and silencers. Accurate identification of DHSs helps researchers better understand the transcriptional machinery and deepens knowledge of functional DNA elements in non-coding sequences. Researchers have developed many methods, based on traditional experiments and on machine learning, to identify DHSs. However, their limited prediction accuracy and robustness restrict their application in genetics research. In this paper, we propose iDHS-FFLG, a novel deep learning approach that identifies DHSs in the mouse genome by combining feature fusion with a local–global feature extraction network. First, multiple binary features of nucleotides are fused to better represent sequence information. Then, a network combining a convolutional neural network (CNN), a bidirectional long short-term memory (BiLSTM) network and a self-attention mechanism extracts local features and global contextual associations. Finally, a prediction module distinguishes DHSs from non-DHSs. The results of several experiments show that iDHS-FFLG outperforms the latest existing methods.
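The abstract does not state which binary nucleotide features are fused. As a minimal sketch, assuming the fusion is a per-position concatenation of a 4-bit one-hot code and the 3-bit nucleotide chemical property (NCP) code often used in DNA site predictors, the encoding step could look like this. The names `fuse_binary_features`, `ONE_HOT` and `NCP` are hypothetical and are not taken from the iDHS-FFLG code.

```python
# Hypothetical sketch: the exact binary features fused by iDHS-FFLG are not
# given in the abstract; one-hot + nucleotide chemical property (NCP) are assumed.

# 4-bit one-hot code, one bit per nucleotide
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

# 3-bit NCP code: (purine ring structure, amino functional group, weak hydrogen bonding)
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def fuse_binary_features(seq: str) -> list[list[int]]:
    """Return an L x 7 matrix: per-position concatenation of one-hot and NCP bits.

    Unknown bases such as 'N' are encoded as all zeros (an assumption).
    """
    matrix = []
    for base in seq.upper():
        one_hot = ONE_HOT.get(base, (0, 0, 0, 0))
        ncp = NCP.get(base, (0, 0, 0))
        # Tuple concatenation fuses the two binary codes into one 7-bit row
        matrix.append(list(one_hot + ncp))
    return matrix


if __name__ == "__main__":
    for row in fuse_binary_features("ACGTN"):
        print(row)
```

Each row of the resulting L × 7 matrix describes one nucleotide, so the matrix can be fed to a 1-D convolution with seven input channels. Concatenation is only one possible fusion strategy; the authors' released code (see Data Availability) is the authority on what iDHS-FFLG actually does.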

Data Availability

The source code and datasets for this study are accessible at https://github.com/zhlSunLab/iDHS-FFLG.

Funding

This work was supported by the National Natural Science Foundation of China (No. 61972002).

Author information

Corresponding author

Correspondence to Zhan-Li Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, LS., Sun, ZL. iDHS-FFLG: Identifying DNase I Hypersensitive Sites by Feature Fusion and Local–Global Feature Extraction Network. Interdiscip Sci Comput Life Sci 15, 155–170 (2023). https://doi.org/10.1007/s12539-022-00538-8

  • DOI: https://doi.org/10.1007/s12539-022-00538-8
