{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,15]],"date-time":"2025-11-15T10:20:25Z","timestamp":1763202025705,"version":"3.37.3"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2017,7,12]],"date-time":"2017-07-12T00:00:00Z","timestamp":1499817600000},"content-version":"vor","delay-in-days":11,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"publisher","award":["61672382","61402334","61520106006","31571364","61532008","61472280","61472173","61572447","61373098"],"award-info":[{"award-number":["61672382","61402334","61520106006","31571364","61532008","61472280","61472173","61572447","61373098"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the huge amount of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have been proven to be promising for harnessing the discovery power of accumulated huge amount of high-throughput binding data. However, they have to sacrifice accuracy for speed and could fail to fully utilize the information of the input sequences.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Meanwhile, preliminary results also show that CDAUC may also be useful for improving the interpretability of convolutional kernels generated by the emerging deep learning approaches for predicting TF sequences specificities.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>CDAUC is available at: https:\/\/drive.google.com\/drive\/folders\/0BxOW5MtIZbJjNFpCeHlBVWJHeW8.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx255","type":"journal-article","created":{"date-parts":[[2017,4,17]],"date-time":"2017-04-17T19:11:50Z","timestamp":1492456310000},"page":"i243-i251","source":"Crossref","is-referenced-by-count":29,"title":["Direct AUC optimization of regulatory motifs"],"prefix":"10.1093","volume":"33","author":[{"given":"Lin","family":"Zhu","sequence":"first","affiliation":[{"name":"Institute of Machine Learning and Systems Biology, Department of College of Electronics and Information Engineering, Tongji University , Shanghai, China"}]},{"given":"Hong-Bo","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Machine Learning and Systems Biology, Department of College of Electronics and Information Engineering, Tongji University , Shanghai, China"}]},{"given":"De-Shuang","family":"Huang","sequence":"additional","affiliation":[{"name":"Institute of Machine Learning and Systems Biology, Department of College of Electronics and Information Engineering, Tongji University , Shanghai, China"}]}],"member":"286","published-online":{"date-parts":[[2017,7,12]]},"reference":[{"key":"2023051506492004500_btx255-B1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1090\/conm\/223\/03131","article-title":"Geometric range searching and its relatives","volume":"223","author":"Agarwal","year":"1999","journal-title":"Contemp. Math"},{"key":"2023051506492004500_btx255-B2","doi-asserted-by":"crossref","first-page":"925","DOI":"10.1186\/1471-2164-15-925","article-title":"SeAMotE: a method for high-throughput motif discovery in nucleic acid sequences","volume":"15","author":"Agostini","year":"2014","journal-title":"BMC Genomics"},{"key":"2023051506492004500_btx255-B3","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023051506492004500_btx255-B4","doi-asserted-by":"crossref","first-page":"1653","DOI":"10.1093\/bioinformatics\/btr261","article-title":"DREME: motif discovery in transcription factor ChIP-seq data","volume":"27","author":"Bailey","year":"2011","journal-title":"Bioinformatics"},{"key":"2023051506492004500_btx255-B5","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1093\/nar\/gks433","article-title":"Inferring direct DNA binding from ChIP-seq","volume":"40","author":"Bailey","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023051506492004500_btx255-B6","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1145\/1143844.1143874","article-title":"The relationship between Precision-Recall and ROC curves","author":"Davis","year":"2006","journal-title":"ICML"},{"key":"2023051506492004500_btx255-B7","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-662-04245-8","volume-title":"Computational geometry","author":"De Berg","year":"2000"},{"key":"2023051506492004500_btx255-B8","doi-asserted-by":"crossref","first-page":"1268","DOI":"10.1101\/gr.184671.114","article-title":"A widespread role of the motif environment in transcription factor binding across diverse protein families","volume":"25","author":"Dror","year":"2015","journal-title":"Genome Res"},{"key":"2023051506492004500_btx255-B9","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkr367","article-title":"HMMER web server: interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023051506492004500_btx255-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.artint.2016.03.003","article-title":"One-pass AUC optimization","volume":"236","author":"Gao","year":"2016","journal-title":"Artif. Intell"},{"key":"2023051506492004500_btx255-B11","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btx115","article-title":"Computational modeling of in vivo and in vitro protein-DNA interactions by multiple instance learning","author":"Gao","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051506492004500_btx255-B12","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1371\/journal.pcbi.1003711","article-title":"Enhanced regulatory sequence prediction using gapped k-mer features","volume":"10","author":"Ghandi","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023051506492004500_btx255-B13","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1093\/nar\/gkt831","article-title":"A general approach for discriminative de novo motif discovery from high-throughput data","volume":"41","author":"Grau","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023051506492004500_btx255-B14","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","article-title":"Learning from Imbalanced Data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Trans. Knowledge Data Eng"},{"key":"2023051506492004500_btx255-B15","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1016\/j.molcel.2010.05.004","article-title":"Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities","volume":"38","author":"Heinz","year":"2010","journal-title":"Mol. Cell"},{"key":"2023051506492004500_btx255-B16","first-page":"1064","article-title":"Fast coordinate descent methods with variable selection for non-negative matrix factorization","author":"Hsieh","year":"2011","journal-title":"KDD"},{"key":"2023051506492004500_btx255-B17","doi-asserted-by":"crossref","first-page":"1561","DOI":"10.1093\/bioinformatics\/btv017","article-title":"Repulsive parallel MCMC algorithm for discovering diverse motifs from large sequence sets","volume":"31","author":"Ikebata","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051506492004500_btx255-B18","doi-asserted-by":"crossref","first-page":"990","DOI":"10.1101\/gr.200535.115","article-title":"Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks","volume":"26","author":"Kelley","year":"2016","journal-title":"Genome Res"},{"key":"2023051506492004500_btx255-B19","doi-asserted-by":"crossref","first-page":"i310","DOI":"10.1093\/bioinformatics\/btu286","article-title":"Stochastic EM-based TFBS motif discovery with MITSU","volume":"30","author":"Kilpatrick","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051506492004500_btx255-B20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2164-15-752","article-title":"Differential motif enrichment analysis of paired ChIP-seq experiments","volume":"15","author":"Lesluyes","year":"2014","journal-title":"BMC Genomics"},{"key":"2023051506492004500_btx255-B21","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1093\/bioinformatics\/btm080","article-title":"GAPWM: a genetic algorithm method for optimizing a position weight matrix","volume":"23","author":"Li","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051506492004500_btx255-B22","first-page":"1158","article-title":"Fast motif discovery in short sequences","author":"Liu","year":"2016","journal-title":"ICDE"},{"key":"2023051506492004500_btx255-B23","doi-asserted-by":"crossref","first-page":"12995","DOI":"10.1093\/nar\/gku1083","article-title":"Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models","volume":"42","author":"Maaskola","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023051506492004500_btx255-B24","doi-asserted-by":"crossref","first-page":"2826","DOI":"10.1093\/bioinformatics\/btq546","article-title":"Identification of Context-Dependent Motifs by Contrasting ChIP Binding Data","volume":"26","author":"Mason","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051506492004500_btx255-B25","article-title":"Motif enrichment analysis: a unified framework and an evaluation on ChIP data","volume":"11, 165","author":"McLeay","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023051506492004500_btx255-B26","first-page":"516","article-title":"A structural SVM based approach for optimizing partial AUC","author":"Narasimhan","year":"2013","journal-title":"ICML."},{"key":"2023051506492004500_btx255-B27","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1093\/nar\/gku117","article-title":"A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data","volume":"42","author":"Orenstein","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023051506492004500_btx255-B28","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1093\/bioinformatics\/btt748","article-title":"Discriminative motif optimization based on perceptron training","volume":"30","author":"Patel","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051506492004500_btx255-B29","doi-asserted-by":"crossref","first-page":"21.","DOI":"10.1371\/journal.pcbi.1004271","article-title":"SeqGL identifies context-dependent binding signals in genome-wide regulatory element maps","volume":"11","author":"Setty","year":"2015","journal-title":"PLoS Comput. Biol"},{"key":"2023051506492004500_btx255-B30","doi-asserted-by":"crossref","first-page":"6055","DOI":"10.1093\/nar\/gkw521","article-title":"Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences","volume":"44","author":"Siebert","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023051506492004500_btx255-B31","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1371\/journal.pone.0047836","article-title":"The limits of de novo DNA motif discovery","volume":"7","author":"Simcha","year":"2012","journal-title":"PLoS One"},{"key":"2023051506492004500_btx255-B32","doi-asserted-by":"crossref","first-page":"1965","DOI":"10.1093\/bioinformatics\/btu163","article-title":"Improving MEME via a two-tiered significance analysis","volume":"30","author":"Tanaka","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051506492004500_btx255-B33","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1371\/journal.pcbi.1000562","article-title":"Discovery of regulatory elements is improved by a discriminatory approach","volume":"5","author":"Valen","year":"2009","journal-title":"PLoS Comput. Biol"},{"key":"2023051506492004500_btx255-B34","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1101\/gr.139105.112","article-title":"Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors","volume":"22","author":"Wang","year":"2012","journal-title":"Genome Res"},{"key":"2023051506492004500_btx255-B35","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1038\/nbt.2486","article-title":"Evaluation of methods for modeling transcription factor sequence specificity","volume":"31","author":"Weirauch","year":"2013","journal-title":"Nat. Biotechnol"},{"key":"2023051506492004500_btx255-B36","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1093\/bioinformatics\/btt615","article-title":"Discriminative motif analysis of high-throughput dataset","volume":"30","author":"Yao","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051506492004500_btx255-B37","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1093\/bioinformatics\/btw255","article-title":"Convolutional neural network architectures for predicting DNA-protein binding","volume":"32","author":"Zeng","year":"2016","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/i243\/50314781\/bioinformatics_33_14_i243.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/i243\/50314781\/bioinformatics_33_14_i243.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T06:50:04Z","timestamp":1684133404000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/14\/i243\/3953970"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,7]]},"references-count":37,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2017,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx255","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2017,7]]},"published":{"date-parts":[[2017,7]]}}}