{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T13:46:56Z","timestamp":1762004816374,"version":"3.37.3"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2016,9,7]],"date-time":"2016-09-07T00:00:00Z","timestamp":1473206400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01GM114341"],"award-info":[{"award-number":["R01GM114341"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"USDA","award":["2012-67013-19361","R775499"],"award-info":[{"award-number":["2012-67013-19361","R775499"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>With the rapid emergence of technologies for locating cis-regulatory modules (CRMs) genome-wide, the next pressing challenge is to assign precise functions to each CRM, i.e. to determine the spatiotemporal domains or cell-types where it drives expression. A popular approach to this task is to model the typical k-mer composition of a set of CRMs known to drive a common expression pattern, and assign that pattern to other CRMs exhibiting a similar k-mer composition. 
This approach does not rely on prior knowledge of transcription factors relevant to the CRM or their binding motifs, and is thus more widely applicable than motif-based methods for predicting CRM activity, but is also prone to false positive predictions.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present a novel strategy to improve the above-mentioned approach: to predict if a CRM drives a specific gene expression pattern, assess not only how similar the CRM is to other CRMs with similar activity but also to CRMs with distinct activities. We use a state-of-the-art statistical method to quantify a CRM\u2019s sequence similarity to many different training sets of CRMs, and employ a classification algorithm to integrate these similarity scores into a single prediction of the CRM\u2019s activity. This strategy is shown to significantly improve CRM activity prediction over current approaches.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>Our implementation of the new method, called IMMBoost, is freely available as source code, at https:\/\/github.com\/weiyangedward\/IMMBoost.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btw552","type":"journal-article","created":{"date-parts":[[2016,9,9]],"date-time":"2016-09-09T00:21:39Z","timestamp":1473380499000},"page":"1-7","source":"Crossref","is-referenced-by-count":36,"title":["A novel method for predicting activity of cis-regulatory modules, based on a diverse training 
set"],"prefix":"10.1093","volume":"33","author":[{"given":"Wei","family":"Yang","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Illinois, Urbana-Champaign, Urbana, IL, USA"}]},{"given":"Saurabh","family":"Sinha","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois, Urbana-Champaign, Urbana, IL, USA"}]}],"member":"286","published-online":{"date-parts":[[2016,9,7]]},"reference":[{"key":"2023020204203987900_btw552-B1","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/B978-0-12-386499-4.00005-7","volume-title":"Current Topics in Developmental Biology, Transcriptional Switches during Development","author":"Aerts","year":"2012"},{"key":"2023020204203987900_btw552-B2","doi-asserted-by":"crossref","first-page":"878","DOI":"10.1242\/dev.101709","article-title":"Machine learning classification of cell-specific cardiac enhancers uncovers developmental subnetworks regulating progenitor cell division and cell fate specification","volume":"141","author":"Ahmad","year":"2014","journal-title":"Development"},{"key":"2023020204203987900_btw552-B3","doi-asserted-by":"crossref","first-page":"1723","DOI":"10.1101\/gr.127712.111","article-title":"Sequence and chromatin determinants of cell-type\u2013specific transcription factor binding","volume":"22","author":"Arvey","year":"2012","journal-title":"Genome Res"},{"key":"2023020204203987900_btw552-B4","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/nar\/27.2.573","article-title":"Tandem repeats finder: a program to analyze DNA sequences","volume":"27","author":"Benson","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023020204203987900_btw552-B5","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1038\/nbt1010-1045","article-title":"The NIH Roadmap Epigenomics Mapping Consortium","volume":"28","author":"Bernstein","year":"2010","journal-title":"Nat. 
Biotechnol"},{"key":"2023020204203987900_btw552-B6","doi-asserted-by":"crossref","first-page":"3998","DOI":"10.1093\/nar\/gkv195","article-title":"Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism","volume":"43","author":"Blatti","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020204203987900_btw552-B7","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1016\/j.cell.2007.12.014","article-title":"High-resolution mapping and characterization of open chromatin across the genome","volume":"132","author":"Boyle","year":"2008","journal-title":"Cell"},{"key":"2023020204203987900_btw552-B8","doi-asserted-by":"crossref","first-page":"1213","DOI":"10.1038\/nmeth.2688","article-title":"Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position","volume":"10","author":"Buenrostro","year":"2013","journal-title":"Nat. Methods"},{"key":"2023020204203987900_btw552-B9","doi-asserted-by":"crossref","first-page":"27:1","DOI":"10.1145\/1961189.1961199","article-title":"LIBSVM. A library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans. Intell. Syst. Technol"},{"volume-title":"Genomic Regulatory Systems: In Development and Evolution","year":"2001","author":"Davidson","key":"2023020204203987900_btw552-B10"},{"key":"2023020204203987900_btw552-B11","doi-asserted-by":"crossref","first-page":"e1003677","DOI":"10.1371\/journal.pcbi.1003677","article-title":"Integrating diverse datasets improves developmental enhancer prediction","volume":"10","author":"Erwin","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023020204203987900_btw552-B12","first-page":"1871","article-title":"LIBLINEAR: a library for large linear classification","volume":"9","author":"Fan","year":"2008","journal-title":"J Mach. Learn. 
Res"},{"key":"2023020204203987900_btw552-B13","doi-asserted-by":"crossref","first-page":"3666","DOI":"10.1093\/nar\/gkg540","article-title":"Cluster-Buster: finding dense clusters of motifs in DNA sequences","volume":"31","author":"Frith","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023020204203987900_btw552-B14","doi-asserted-by":"crossref","first-page":"e1003711","DOI":"10.1371\/journal.pcbi.1003711","article-title":"Enhanced regulatory sequence prediction using gapped k-mer features","volume":"10","author":"Ghandi","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023020204203987900_btw552-B15","doi-asserted-by":"crossref","first-page":"877","DOI":"10.1101\/gr.5533506","article-title":"FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin","volume":"17","author":"Giresi","year":"2007","journal-title":"Genome Res"},{"key":"2023020204203987900_btw552-B16","doi-asserted-by":"crossref","first-page":"568","DOI":"10.1016\/j.devcel.2009.09.002","article-title":"Motif-blind, genome-wide discovery of cis-regulatory modules in Drosophila and mouse","volume":"17","author":"Kantorovitz","year":"2009","journal-title":"Dev. Cell"},{"key":"2023020204203987900_btw552-B17","doi-asserted-by":"crossref","first-page":"2301","DOI":"10.1093\/gbe\/evu184","article-title":"Evidence for deep regulatory similarities in early developmental programs across highly diverged insects","volume":"6","author":"Kazemian","year":"2014","journal-title":"Genome Biol. 
Evol"},{"key":"2023020204203987900_btw552-B18","doi-asserted-by":"crossref","first-page":"9463","DOI":"10.1093\/nar\/gkr621","article-title":"Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison","volume":"39","author":"Kazemian","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020204203987900_btw552-B19","doi-asserted-by":"crossref","first-page":"e6\u2013e6","DOI":"10.1093\/nar\/gku1058","article-title":"DEEP: a general computational framework for predicting enhancers","volume":"43","author":"Kleftogiannis","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020204203987900_btw552-B20","doi-asserted-by":"crossref","first-page":"2167","DOI":"10.1101\/gr.121905.111","article-title":"Discriminative prediction of mammalian enhancers from DNA sequence","volume":"21","author":"Lee","year":"2011","journal-title":"Genome Res"},{"key":"2023020204203987900_btw552-B21","first-page":"18","article-title":"Classification and regression by randomForest","volume":"2","author":"Liaw","year":"2002","journal-title":"R News"},{"key":"2023020204203987900_btw552-B22","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1101\/gr.098657.109","article-title":"Genome-wide discovery of human heart enhancers","volume":"20","author":"Narlikar","year":"2010","journal-title":"Genome Res"},{"key":"2023020204203987900_btw552-B23","first-page":"519","article-title":"Modulefinder: a tool for computational discovery of cis regulatory modules","author":"Philippakis","year":"2005","journal-title":"Pac. Symp. 
Biocomput"},{"key":"2023020204203987900_btw552-B24","doi-asserted-by":"crossref","first-page":"e90","DOI":"10.1093\/nar\/gks237","article-title":"Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection","volume":"40","author":"Sun","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023020204203987900_btw552-B25","doi-asserted-by":"crossref","first-page":"854","DOI":"10.1038\/nature07730","article-title":"ChIP-seq accurately predicts tissue-specific activity of enhancers","volume":"457","author":"Visel","year":"2009","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/1\/1\/49037381\/bioinformatics_33_1_1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/1\/1\/49037381\/bioinformatics_33_1_1.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T04:21:28Z","timestamp":1675311688000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/1\/1\/2525664"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2016,9,7]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2017,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw552","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2017,1,1]]},"published":{"date-parts":[[2016,9,7]]}}}