{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,12]],"date-time":"2026-04-12T14:07:56Z","timestamp":1776002876663,"version":"3.50.1"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,9,25]],"date-time":"2021-09-25T00:00:00Z","timestamp":1632528000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,9,25]],"date-time":"2021-09-25T00:00:00Z","timestamp":1632528000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Accurate identification of Transcriptional Regulator binding locations is essential for analysis of genomic regions, including Cis Regulatory Elements. The customary NGS approaches, predominantly ChIP-Seq, can be obscured by data anomalies and biases which are difficult to detect without supervision.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Here, we develop a method to leverage the usual combinations between many experimental series to mark such atypical peaks. We use deep learning to perform a lossy compression of the genomic regions\u2019 representations with multiview convolutions. Using artificial data, we show that our method correctly identifies groups of correlating series and evaluates CRE according to group completeness. It is then applied to the ReMap database\u2019s large volume of curated ChIP-seq data. We show that peaks lacking known biological correlators are singled out and less confirmed in real data. We propose normalization approaches useful in interpreting black-box models.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>Our approach detects peaks that are less corroborated than average. It can be extended to other similar problems, and can be interpreted to identify correlation groups. It is implemented in an open-source tool called atyPeak.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-021-04359-2","type":"journal-article","created":{"date-parts":[[2021,9,25]],"date-time":"2021-09-25T19:02:53Z","timestamp":1632596573000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders"],"prefix":"10.1186","volume":"22","author":[{"given":"Quentin","family":"Ferr\u00e9","sequence":"first","affiliation":[]},{"given":"Jeanne","family":"Ch\u00e8neby","sequence":"additional","affiliation":[]},{"given":"Denis","family":"Puthier","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4057-774X","authenticated-orcid":false,"given":"C\u00e9cile","family":"Capponi","sequence":"additional","affiliation":[]},{"given":"Beno\u00eet","family":"Ballester","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,9,25]]},"reference":[{"key":"4359_CR1","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1038\/nature11247","volume":"489","author":"ENCODE Project Consortium","year":"2012","unstructured":"ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57\u201374.","journal-title":"Nature"},{"key":"4359_CR2","doi-asserted-by":"publisher","first-page":"D991","DOI":"10.1093\/nar\/gks1193","volume":"41","author":"T Barrett","year":"2013","unstructured":"Barrett T, et al. NCBI GEO: archive for functional genomics data sets\u2014update. Nucleic Acids Res. 2013;41:D991\u20135.","journal-title":"Nucleic Acids Res"},{"key":"4359_CR3","doi-asserted-by":"publisher","first-page":"D1002","DOI":"10.1093\/nar\/gkq1040","volume":"39","author":"H Parkinson","year":"2011","unstructured":"Parkinson H, et al. ArrayExpress update\u2014an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic Acids Res. 2011;39:D1002\u20134.","journal-title":"Nucleic Acids Res"},{"key":"4359_CR4","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1186\/gb-2003-5-1-201","volume":"5","author":"ML Bulyk","year":"2004","unstructured":"Bulyk ML. Computational prediction of transcription-factor binding site locations. Genome Biol. 2004;5:201.","journal-title":"Genome Biol"},{"key":"4359_CR5","doi-asserted-by":"publisher","first-page":"1497","DOI":"10.1126\/science.1141319","volume":"316","author":"DS Johnson","year":"2007","unstructured":"Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007;316:1497\u2013502.","journal-title":"Science"},{"key":"4359_CR6","doi-asserted-by":"publisher","first-page":"650","DOI":"10.1016\/j.cell.2018.01.029","volume":"172","author":"SA Lambert","year":"2018","unstructured":"Lambert SA, et al. The human transcription factors. Cell. 2018;172:650\u201365.","journal-title":"Cell"},{"key":"4359_CR7","doi-asserted-by":"publisher","first-page":"2843","DOI":"10.1093\/bioinformatics\/btu356","volume":"30","author":"H Li","year":"2014","unstructured":"Li H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinform Oxf Engl. 2014;30:2843\u201351.","journal-title":"Bioinform Oxf Engl"},{"key":"4359_CR8","doi-asserted-by":"publisher","first-page":"1813","DOI":"10.1101\/gr.136184.111","volume":"22","author":"SG Landt","year":"2012","unstructured":"Landt SG, et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012;22:1813\u201331.","journal-title":"Genome Res"},{"key":"4359_CR9","doi-asserted-by":"publisher","first-page":"918","DOI":"10.1038\/ni.2117","volume":"12","author":"BL Kidder","year":"2011","unstructured":"Kidder BL, Hu G, Zhao K. ChIP-Seq: technical considerations for obtaining high-quality data. Nat Immunol. 2011;12:918\u201322.","journal-title":"Nat Immunol"},{"key":"4359_CR10","doi-asserted-by":"publisher","first-page":"e11471","DOI":"10.1371\/journal.pone.0011471","volume":"5","author":"EG Wilbanks","year":"2010","unstructured":"Wilbanks EG, Facciotti MT. Evaluation of algorithm performance in ChIP-Seq peak detection. PLoS ONE. 2010;5:e11471.","journal-title":"PLoS ONE"},{"key":"4359_CR11","doi-asserted-by":"publisher","first-page":"6959","DOI":"10.1093\/nar\/gkv637","volume":"43","author":"D Jain","year":"2015","unstructured":"Jain D, Baldi S, Zabel A, Straub T, Becker PB. Active promoters give rise to false positive \u2018phantom peaks\u2019 in ChIP-seq experiments. Nucleic Acids Res. 2015;43:6959\u201368.","journal-title":"Nucleic Acids Res"},{"key":"4359_CR12","doi-asserted-by":"publisher","first-page":"18602","DOI":"10.1073\/pnas.1316064110","volume":"110","author":"L Teytelman","year":"2013","unstructured":"Teytelman L, Thurtle DM, Rine J, van Oudenaarden A. Highly expressed loci are vulnerable to misleading ChIP localization of multiple unrelated proteins. Proc Natl Acad Sci U S A. 2013;110:18602\u20137.","journal-title":"Proc Natl Acad Sci U S A"},{"key":"4359_CR13","doi-asserted-by":"publisher","DOI":"10.1101\/260687","author":"JG Chitpin","year":"2018","unstructured":"Chitpin JG, Awdeh A, Perkins TJ. RECAP reveals the true statistical significance of ChIP-seq peak calls. bioRxiv. 2018. https:\/\/doi.org\/10.1101\/260687.","journal-title":"bioRxiv"},{"key":"4359_CR14","doi-asserted-by":"publisher","first-page":"i225","DOI":"10.1093\/bioinformatics\/btx243","volume":"33","author":"PW Koh","year":"2017","unstructured":"Koh PW, Pierson E, Kundaje A. Denoising genome-wide histone ChIP-seq with convolutional neural networks. Bioinformatics. 2017;33:i225.","journal-title":"Bioinformatics"},{"key":"4359_CR15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-37186-2","volume":"9","author":"HM Amemiya","year":"2019","unstructured":"Amemiya HM, Kundaje A, Boyle AP. The ENCODE blacklist: identification of problematic regions of the genome. Sci Rep. 2019;9:1\u20135.","journal-title":"Sci Rep"},{"key":"4359_CR16","first-page":"1752","volume":"5","author":"Q Li","year":"2011","unstructured":"Li Q, Brown JB, Huang H, Bickel PJ. Measuring reproducibility of high-throughput experiments. Ann Appl Stat. 2011;5:1752\u201379.","journal-title":"Ann Appl Stat"},{"key":"4359_CR17","doi-asserted-by":"publisher","first-page":"952","DOI":"10.1038\/ncb3573","volume":"19","author":"LLP Hanssen","year":"2017","unstructured":"Hanssen LLP, et al. Tissue-specific CTCF-cohesin-mediated chromatin architecture delimits enhancer interactions and function in vivo. Nat Cell Biol. 2017;19:952\u201361.","journal-title":"Nat Cell Biol"},{"key":"4359_CR18","doi-asserted-by":"publisher","first-page":"967","DOI":"10.1093\/bib\/bbv101","volume":"17","author":"D Kleftogiannis","year":"2016","unstructured":"Kleftogiannis D, Kalnis P, Bajic VB. Progress and challenges in bioinformatics approaches for enhancer identification. Brief Bioinform. 2016;17:967\u201379.","journal-title":"Brief Bioinform"},{"key":"4359_CR19","doi-asserted-by":"publisher","first-page":"607","DOI":"10.1093\/bioinformatics\/bts009","volume":"28","author":"MD Chikina","year":"2012","unstructured":"Chikina MD, Troyanskaya OG. An effective statistical evaluation of ChIPseq dataset similarity. Bioinformatics. 2012;28:607\u201313.","journal-title":"Bioinformatics"},{"key":"4359_CR20","doi-asserted-by":"publisher","first-page":"15:1","DOI":"10.1145\/1541880.1541882","volume":"41","author":"V Chandola","year":"2009","unstructured":"Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv. 2009;41:15:1-15:58.","journal-title":"ACM Comput Surv"},{"key":"4359_CR21","doi-asserted-by":"publisher","first-page":"D267","DOI":"10.1093\/nar\/gkx1092","volume":"46","author":"J Ch\u00e8neby","year":"2018","unstructured":"Ch\u00e8neby J, Gheorghe M, Artufel M, Mathelier A, Ballester B. ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP-seq experiments. Nucleic Acids Res. 2018;46:D267\u201375.","journal-title":"Nucleic Acids Res"},{"key":"4359_CR22","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1038\/s41576-019-0122-6","volume":"20","author":"G Eraslan","year":"2019","unstructured":"Eraslan G, Avsec \u017d, Gagneur J, Theis FJ. Deep learning: new computational modelling techniques for genomics. Nat Rev Genet. 2019;20:389\u2013403.","journal-title":"Nat Rev Genet"},{"key":"4359_CR23","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbw113","author":"Y Li","year":"2016","unstructured":"Li Y, Wu F-X, Ngom A. A review on machine learning principles for multi-view biological data integration. Brief Bioinform. 2016. https:\/\/doi.org\/10.1093\/bib\/bbw113.","journal-title":"Brief Bioinform"},{"key":"4359_CR24","doi-asserted-by":"publisher","first-page":"441","DOI":"10.1016\/j.cancergen.2013.11.005","volume":"206","author":"R Daber","year":"2013","unstructured":"Daber R, Sukhadia S, Morrissette JJD. Understanding the limitations of next generation sequencing informatics, an approach to clinical pipeline validation using artificial data sets. Cancer Genet. 2013;206:441\u20138.","journal-title":"Cancer Genet"},{"key":"4359_CR25","doi-asserted-by":"publisher","first-page":"e24","DOI":"10.1093\/nar\/gkt1105","volume":"42","author":"L Teng","year":"2014","unstructured":"Teng L, He B, Gao P, Gao L, Tan K. Discover context-specific combinatorial transcription factor interactions by integrating diverse ChIP-Seq data sets. Nucleic Acids Res. 2014;42:e24\u2013e24.","journal-title":"Nucleic Acids Res"},{"key":"4359_CR26","first-page":"D180","volume":"48","author":"J Ch\u00e8neby","year":"2020","unstructured":"Ch\u00e8neby J, et al. ReMap 2020: a database of regulatory regions from an integrative analysis of human and arabidopsis DNA-binding sequencing experiments. Nucleic Acids Res. 2020;48:D180\u20138.","journal-title":"Nucleic Acids Res"},{"key":"4359_CR27","doi-asserted-by":"publisher","first-page":"6256","DOI":"10.1093\/nar\/gku281","volume":"42","author":"NL Sharma","year":"2014","unstructured":"Sharma NL, et al. The ETS family member GABP\u03b1 modulates androgen receptor signalling and mediates an aggressive phenotype in prostate cancer. Nucleic Acids Res. 2014;42:6256\u201369.","journal-title":"Nucleic Acids Res"},{"key":"4359_CR28","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1016\/j.molcel.2010.01.026","volume":"37","author":"C Lin","year":"2010","unstructured":"Lin C, et al. AFF4, a component of the ELL\/P-TEFb elongation complex and a shared subunit of MLL chimeras, can link transcription elongation to Leukemia. Mol Cell. 2010;37:429\u201337.","journal-title":"Mol Cell"},{"key":"4359_CR29","doi-asserted-by":"publisher","first-page":"6238","DOI":"10.1074\/jbc.M112.429605","volume":"288","author":"S Lin","year":"2013","unstructured":"Lin S, et al. Proteomic and functional analyses reveal the role of chromatin reader SFMBT1 in regulating epigenetic silencing and the myogenic gene program. J Biol Chem. 2013;288:6238\u201347.","journal-title":"J Biol Chem"},{"key":"4359_CR30","doi-asserted-by":"publisher","first-page":"9230","DOI":"10.1093\/nar\/gkt712","volume":"41","author":"C Chen","year":"2013","unstructured":"Chen C, Zhang S, Zhang X-S. Discovery of cell-type specific regulatory elements in the human genome using differential chromatin modification analysis. Nucleic Acids Res. 2013;41:9230\u201342.","journal-title":"Nucleic Acids Res"},{"key":"4359_CR31","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1007\/11558484_48","volume-title":"Advanced concepts for intelligent vision systems","author":"N Ponomarenko","year":"2005","unstructured":"Ponomarenko N, Lukin V, Zriakhov M, Egiazarian K, Astola J. Lossy compression of images with additive noise. In: Blanc-Talon J, Philips W, Popescu D, Scheunders P, editors. Advanced concepts for intelligent vision systems. Berlin: Springer; 2005. p. 381\u20136."},{"key":"4359_CR32","unstructured":"Theis L, Shi W, Cunningham A, Husz\u00e1r F. Lossy image compression with compressive autoencoders. ArXiv170300395 Cs Stat (2017)"},{"key":"4359_CR33","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1007\/978-3-030-10925-7_11","volume-title":"Machine learning and knowledge discovery in databases","author":"R Chalapathy","year":"2019","unstructured":"Chalapathy R, Toth E, Chawla S. Group anomaly detection using deep generative models. In: Berlingerio M, Bonchi F, G\u00e4rtner T, Hurley N, Ifrim G, editors. Machine learning and knowledge discovery in databases, vol. 11051. Berlin: Springer; 2019. p. 173\u201389."},{"key":"4359_CR34","doi-asserted-by":"publisher","unstructured":"Vincent P, Larochelle H, Bengio Y, Manzagol P-A. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on machine learning: ICML \u201908 1096\u20131103. ACM Press (2008). https:\/\/doi.org\/10.1145\/1390156.1390294.","DOI":"10.1145\/1390156.1390294"},{"key":"4359_CR35","doi-asserted-by":"crossref","first-page":"2686378","DOI":"10.1155\/2019\/2686378","volume":"2019","author":"X Xu","year":"2019","unstructured":"Xu X, Liu H, Yao M. Recent progress of anomaly detection. Complexity. 2019;2019:2686378.","journal-title":"Complexity"},{"key":"4359_CR36","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1002\/wics.1347","volume":"7","author":"S Ranshous","year":"2015","unstructured":"Ranshous S, et al. Anomaly detection in dynamic networks: a survey. WIREs Comput Stat. 2015;7:223\u201347.","journal-title":"WIREs Comput Stat"},{"key":"4359_CR37","doi-asserted-by":"publisher","first-page":"626","DOI":"10.1007\/s10618-014-0365-y","volume":"29","author":"L Akoglu","year":"2015","unstructured":"Akoglu L, Tong H, Koutra D. Graph based anomaly detection and description: a survey. Data Min Knowl Discov. 2015;29:626\u201388.","journal-title":"Data Min Knowl Discov"},{"key":"4359_CR38","doi-asserted-by":"publisher","unstructured":"Zheng L, Li Z, Li J, Li Z, Gao J. AddGraph: anomaly detection in dynamic graph using attention-based temporal GCN. In: Proceedings of the twenty-eighth international joint conference on artificial intelligence, pp 4419\u20134425. International Joint Conferences on Artificial Intelligence Organization (2019). https:\/\/doi.org\/10.24963\/ijcai.2019\/614.","DOI":"10.24963\/ijcai.2019\/614"},{"key":"4359_CR39","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1093\/bioinformatics\/bty513","volume":"35","author":"J Fang","year":"2019","unstructured":"Fang J. Tightly integrated genomic and epigenomic data mining using tensor decomposition. Bioinformatics. 2019;35:112\u20138.","journal-title":"Bioinformatics"},{"key":"4359_CR40","doi-asserted-by":"crossref","unstructured":"Jaritz M, de Charette R, Wirbel E, Perrotton X, Nashashibi F. Sparse and dense data with CNNs: depth completion and semantic segmentation. ArXiv180800769 Cs (2018).","DOI":"10.1109\/3DV.2018.00017"},{"key":"4359_CR41","unstructured":"keras-team\/keras. Keras (2020)."},{"key":"4359_CR42","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1109\/MCSE.2011.37","volume":"13","author":"S van der Walt","year":"2011","unstructured":"van der Walt S, Colbert SC, Varoquaux G. The NumPy array: a structure for efficient numerical computation. Comput Sci Eng. 2011;13:22\u201330.","journal-title":"Comput Sci Eng"},{"key":"4359_CR43","doi-asserted-by":"crossref","unstructured":"Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. ArXiv160306937 Cs (2016).","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"4359_CR44","doi-asserted-by":"publisher","DOI":"10.1101\/032821","author":"D Quang","year":"2015","unstructured":"Quang D, Xie X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Cell. 2015. https:\/\/doi.org\/10.1101\/032821.","journal-title":"Cell"},{"key":"4359_CR45","unstructured":"Mishra N, Rohaninejad M, Chen X, Abbeel P. A simple neural attentive meta-learner. ArXiv170703141 Cs Stat (2018)."},{"key":"4359_CR46","unstructured":"Lehtinen J et al. Noise2Noise: learning image restoration without clean data. ArXiv180304189 Cs Stat (2018)."},{"key":"4359_CR47","unstructured":"Kingma DP, Ba J. Adam: a method for stochastic optimization. ArXiv14126980 Cs (2014)."},{"key":"4359_CR48","unstructured":"Malhotra P et al. LSTM-based encoder-decoder for multi-sensor anomaly detection. ArXiv160700148 Cs Stat (2016)."},{"key":"4359_CR49","unstructured":"Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. ArXiv13126034 Cs (2014)."},{"key":"4359_CR50","doi-asserted-by":"publisher","unstructured":"Ordway-West E, Parveen P, Henslee A. Autoencoder evaluation and hyper-parameter tuning in an unsupervised setting. In: 2018 IEEE international congress on big data (BigData congress), pp 205\u2013209 (2018). https:\/\/doi.org\/10.1109\/BigDataCongress.2018.00034.","DOI":"10.1109\/BigDataCongress.2018.00034"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04359-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-021-04359-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04359-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,8]],"date-time":"2024-09-08T20:28:22Z","timestamp":1725827302000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-04359-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,25]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["4359"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-04359-2","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,25]]},"assertion":[{"value":"15 December 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 June 2021","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 August 2021","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 September 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"460"}}