{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T07:01:48Z","timestamp":1780383708312,"version":"3.54.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2019,10,9]],"date-time":"2019-10-09T00:00:00Z","timestamp":1570579200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2016YFA0502303"],"award-info":[{"award-number":["2016YFA0502303"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Key Basic Research Project of China","award":["2015CB910303"],"award-info":[{"award-number":["2015CB910303"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["31871342"],"award-info":[{"award-number":["31871342"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2016YFC0901603"],"award-info":[{"award-number":["2016YFC0901603"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"name":"China 863 Program","award":["2015AA020108"],"award-info":[{"award-number":["2015AA020108"]}]},{"name":"Beijing Advanced Innovation Center for Genomics (ICG) and the State Key Laboratory of Protein and Plant Gene Research, Peking University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    
<jats:p>Convolutional neural networks (CNNs) have outperformed conventional methods in modeling the sequence specificity of DNA\u2013protein binding. While previous studies have built a connection between CNNs and probabilistic models, simple models of CNNs cannot achieve sufficient accuracy on this problem. Recently, some methods of neural networks have increased performance using complex neural networks whose results cannot be directly interpreted. However, it is difficult to combine probabilistic models and CNNs effectively to improve DNA\u2013protein binding predictions.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this article, we present a novel global pooling method: expectation pooling for predicting DNA\u2013protein binding. Our pooling method stems naturally from the expectation maximization algorithm, and its benefits can be interpreted both statistically and via deep learning theory. Through experiments, we demonstrate that our pooling method improves the prediction performance of DNA\u2013protein binding. Our interpretable pooling method combines probabilistic ideas with global pooling by taking the expectations of inputs without increasing the number of parameters. We also analyze the hyperparameters in our method and propose optional structures to help fit different datasets. 
We explore how to effectively utilize these novel pooling methods and show that combining statistical methods with deep learning is highly beneficial, which is promising and meaningful for future studies in this field.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>All code is public in https:\/\/github.com\/gao-lab\/ePooling.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz768","type":"journal-article","created":{"date-parts":[[2019,10,8]],"date-time":"2019-10-08T04:19:25Z","timestamp":1570508365000},"page":"1405-1412","source":"Crossref","is-referenced-by-count":38,"title":["Expectation pooling: an effective and interpretable pooling method for predicting DNA\u2013protein binding"],"prefix":"10.1093","volume":"36","author":[{"given":"Xiao","family":"Luo","sequence":"first","affiliation":[{"name":"School of Mathematical Sciences"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xinming","family":"Tu","sequence":"additional","affiliation":[{"name":"Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and the State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yang","family":"Ding","sequence":"additional","affiliation":[{"name":"Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and the State Key Laboratory of Protein and Plant Gene Research at School of Life 
Sciences"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6470-8815","authenticated-orcid":false,"given":"Ge","family":"Gao","sequence":"additional","affiliation":[{"name":"Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and the State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9143-1898","authenticated-orcid":false,"given":"Minghua","family":"Deng","sequence":"additional","affiliation":[{"name":"School of Mathematical Sciences"},{"name":"Center for Quantitative Biology, Peking University , Beijing 100871, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2019,10,9]]},"reference":[{"key":"2023060910274421600_btz768-B1","doi-asserted-by":"crossref","first-page":"831.","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat. 
Biotechnol"},{"key":"2023060910274421600_btz768-B2","doi-asserted-by":"crossref","first-page":"W369","DOI":"10.1093\/nar\/gkl198","article-title":"Meme: discovering and analyzing DNA and protein sequence motifs","volume":"34","author":"Bailey","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023060910274421600_btz768-B3","first-page":"1185","volume-title":"Advances in Neural Information Processing Systems","author":"Boureau","year":"2008"},{"key":"2023060910274421600_btz768-B4","first-page":"111","volume-title":"Proceedings of the 27th International Conference on Machine Learning (ICML-10)","author":"Boureau","year":"2010"},{"key":"2023060910274421600_btz768-B5","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1089\/10665270252935430","article-title":"Finding motifs using random projections","volume":"9","author":"Buhler","year":"2002","journal-title":"J. Comput. Biol"},{"key":"2023060910274421600_btz768-B6","doi-asserted-by":"crossref","first-page":"1837","DOI":"10.1093\/bioinformatics\/bty893","article-title":"Simple tricks of convolutional neural network architectures improve DNA\u2013protein binding prediction","volume":"35","author":"Cao","year":"2018","journal-title":"Bioinformatics"},{"key":"2023060910274421600_btz768-B7","doi-asserted-by":"crossref","first-page":"20.","DOI":"10.1038\/538020a","article-title":"Can we open the black box of AI?","volume":"538","author":"Castelvecchi","year":"2016","journal-title":"Nat. 
News"},{"key":"2023060910274421600_btz768-B8","author":"Chollet","year":"2015"},{"key":"2023060910274421600_btz768-B9","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1145\/1143844.1143874","volume-title":"Proceedings of the 23rd International Conference on Machine Learning","author":"Davis","year":"2006"},{"key":"2023060910274421600_btz768-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Series B Methodol"},{"key":"2023060910274421600_btz768-B11","article-title":"An exact transformation for cnn kernel enables accurate sequence motif identification and leads to a potentially full probabilistic interpretation of cnn","author":"Ding","year":"2019"},{"key":"2023060910274421600_btz768-B12","first-page":"1","article-title":"ROC graphs: notes and practical considerations for researchers","volume":"31","author":"Fawcett","year":"2004","journal-title":"Mach. 
Learn"},{"key":"2023060910274421600_btz768-B13","volume-title":"The Elements of Statistical Learning","author":"Friedman","year":"2001"},{"key":"2023060910274421600_btz768-B14","author":"Graham","year":"2014"},{"key":"2023060910274421600_btz768-B15","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1007\/978-3-662-44848-9_34","volume-title":"Joint European Conference on Machine Learning and Knowledge Discovery in Databases","author":"Gulcehre","year":"2014"},{"key":"2023060910274421600_btz768-B16","doi-asserted-by":"crossref","first-page":"R24.","DOI":"10.1186\/gb-2007-8-2-r24","article-title":"Quantifying similarity between motifs","volume":"8","author":"Gupta","year":"2007","journal-title":"Genome Biol"},{"key":"2023060910274421600_btz768-B17","volume-title":"European Conference on Computer Vision","author":"He","year":"2014"},{"key":"2023060910274421600_btz768-B18","first-page":"770","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"He","year":"2016"},{"key":"2023060910274421600_btz768-B19","author":"Huang","year":"2018"},{"key":"2023060910274421600_btz768-B20","doi-asserted-by":"crossref","first-page":"2146","DOI":"10.1109\/ICCV.2009.5459469","volume-title":"Computer Vision, 2009 IEEE 12th International Conference on","author":"Jarrett","year":"2009"},{"key":"2023060910274421600_btz768-B21","author":"Kingma","year":"2014"},{"key":"2023060910274421600_btz768-B22","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1002\/prot.340070105","article-title":"An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences","volume":"7","author":"Lawrence","year":"1990","journal-title":"Proteins"},{"key":"2023060910274421600_btz768-B23","first-page":"396","article-title":"Handwritten digit recognition with a back-propagation network","author":"LeCun","year":"1990","journal-title":"Advances in Neural Information Processing 
Systems"},{"key":"2023060910274421600_btz768-B24","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"2023060910274421600_btz768-B25","first-page":"464","article-title":"Generalizing pooling functions in convolutional neural networks: mixed, gated, and tree","author":"Lee","year":"2016","journal-title":"Artificial Intelligence and Statistics"},{"key":"2023060910274421600_btz768-B26","author":"Lin","year":"2013"},{"key":"2023060910274421600_btz768-B27","first-page":"990","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Lu","year":"2015"},{"key":"2023060910274421600_btz768-B28","first-page":"281","volume-title":"Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability","author":"MacQueen","year":"1967"},{"key":"2023060910274421600_btz768-B29","doi-asserted-by":"crossref","first-page":"3990","DOI":"10.1093\/bioinformatics\/bty404","article-title":"SSMART: sequence-structure motif identification for RNA-binding proteins","volume":"34","author":"Munteanu","year":"2018","journal-title":"Bioinformatics"},{"key":"2023060910274421600_btz768-B30","doi-asserted-by":"crossref","first-page":"3427.","DOI":"10.1093\/bioinformatics\/bty364","article-title":"Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks","volume":"34","author":"Pan","year":"2018","journal-title":"Bioinformatics"},{"key":"2023060910274421600_btz768-B31","doi-asserted-by":"crossref","first-page":"511.","DOI":"10.1186\/s12864-018-4889-1","article-title":"Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks","volume":"19","author":"Pan","year":"2018","journal-title":"BMC 
Genomics"},{"key":"2023060910274421600_btz768-B32","author":"Radford","year":"2015"},{"key":"2023060910274421600_btz768-B33","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res"},{"key":"2023060910274421600_btz768-B34","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1093\/bioinformatics\/16.1.16","article-title":"DNA binding sites: representation and discovery","volume":"16","author":"Stormo","year":"2000","journal-title":"Bioinformatics"},{"key":"2023060910274421600_btz768-B35","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Series B Methodol"},{"key":"2023060910274421600_btz768-B36","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1093\/nar\/24.1.238","article-title":"Transfac: a database on transcription factors and their DNA binding sites","volume":"24","author":"Wingender","year":"1996","journal-title":"Nucleic Acids Res"},{"key":"2023060910274421600_btz768-B37","first-page":"1179","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Xie","year":"2015"},{"key":"2023060910274421600_btz768-B38","author":"Zeiler","year":"2013"},{"key":"2023060910274421600_btz768-B39","doi-asserted-by":"crossref","first-page":"i121","DOI":"10.1093\/bioinformatics\/btw255","article-title":"Convolutional neural network architectures for predicting DNA\u2013protein binding","volume":"32","author":"Zeng","year":"2016","journal-title":"Bioinformatics"},{"key":"2023060910274421600_btz768-B40","first-page":"4970","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern 
Recognition","author":"Zhai","year":"2017"},{"key":"2023060910274421600_btz768-B41","doi-asserted-by":"crossref","first-page":"R137.","DOI":"10.1186\/gb-2008-9-9-r137","article-title":"Model-based analysis of ChIP-seq (MACS)","volume":"9","author":"Zhang","year":"2008","journal-title":"Genome Biol"},{"key":"2023060910274421600_btz768-B42","doi-asserted-by":"crossref","first-page":"931.","DOI":"10.1038\/nmeth.3547","article-title":"Predicting effects of noncoding variants with deep learning-based sequence model","volume":"12","author":"Zhou","year":"2015","journal-title":"Nat. Methods"},{"key":"2023060910274421600_btz768-B43","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1038\/s41588-018-0295-5","article-title":"A primer on deep learning in genomics","volume":"51","author":"Zou","year":"2019","journal-title":"Nat. Genet"},{"key":"2023060910274421600_btz768-B44","doi-asserted-by":"crossref","first-page":"4180.","DOI":"10.1093\/bioinformatics\/bty497","article-title":"SpliceRover: interpretable convolutional neural networks for improved splice site 
prediction","volume":"34","author":"Zuallaert","year":"2018","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz768\/30313914\/btz768.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/5\/1405\/50553011\/bioinformatics_36_5_1405.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/5\/1405\/50553011\/bioinformatics_36_5_1405.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,24]],"date-time":"2024-07-24T14:21:31Z","timestamp":1721830891000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/5\/1405\/5584233"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2019,10,9]]},"references-count":44,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2020,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz768","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/658427","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,3]]},"published":{"date-parts":[[2019,10,9]]}}}