{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T17:13:35Z","timestamp":1777050815236,"version":"3.51.4"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1011491","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2023,12,4]],"date-time":"2023-12-04T00:00:00Z","timestamp":1701648000000}}],"reference-count":60,"publisher":"Public Library of Science (PLoS)","issue":"11","license":[{"start":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T00:00:00Z","timestamp":1700438400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010269","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["106955\/Z\/15\/Z"],"award-info":[{"award-number":["106955\/Z\/15\/Z"]}],"id":[{"id":"10.13039\/100010269","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000265","name":"Medical Research Council","doi-asserted-by":"publisher","award":["MC_UP_1102\/1"],"award-info":[{"award-number":["MC_UP_1102\/1"]}],"id":[{"id":"10.13039\/501100000265","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Core promoters are stretches of DNA at the beginning of genes that contain information that facilitates the binding of transcription initiation complexes. Different functional subsets of genes have core promoters with distinct architectures and characteristic motifs. Some of these motifs inform the selection of transcription start sites (TSS). By discovering motifs with fixed distances from known TSS positions, we could in principle classify promoters into different functional groups. Due to the variability and overlap of architectures, promoter classification is a difficult task that requires new approaches. In this study, we present a new method based on non-negative matrix factorisation (NMF) and the associated software called seqArchR that clusters promoter sequences based on their motifs at near-fixed distances from a reference point, such as TSS. When combined with experimental data from CAGE, seqArchR can efficiently identify TSS-directing motifs, including known ones like TATA, DPE, and nucleosome positioning signal, as well as novel lineage-specific motifs and the function of genes associated with them. By using seqArchR on developmental time courses, we reveal how relative use of promoter architectures changes over time with stage-specific expression. seqArchR is a powerful tool for initial genome-wide classification and functional characterisation of promoters. Its use cases are more general: it can also be used to discover any motifs at near-fixed distances from a reference point, even if they are present in only a small subset of sequences.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1011491","type":"journal-article","created":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T13:44:01Z","timestamp":1700487841000},"page":"e1011491","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":4,"title":["Identifying promoter sequence architectures via a chunking-based algorithm using non-negative matrix factorisation"],"prefix":"10.1371","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3163-4447","authenticated-orcid":true,"given":"Sarvesh","family":"Nikumbh","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1114-1509","authenticated-orcid":true,"given":"Boris","family":"Lenhard","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2023,11,20]]},"reference":[{"issue":"20","key":"pcbi.1011491.ref001","doi-asserted-by":"crossref","first-page":"2583","DOI":"10.1101\/gad.1026202","article-title":"The RNA polymerase II core promoter: a key component in the regulation of gene expression","volume":"16","author":"JE Butler","year":"2002","journal-title":"Genes & development"},{"issue":"1","key":"pcbi.1011491.ref002","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1002\/wdev.21","article-title":"Perspectives on the RNA polymerase II core promoter","volume":"1","author":"JT Kadonaga","year":"2012","journal-title":"Wiley Interdisciplinary Reviews: Developmental Biology"},{"issue":"6","key":"pcbi.1011491.ref003","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1038\/ng1789","article-title":"Genome-wide analysis of mammalian promoter architecture and evolution","volume":"38","author":"P Carninci","year":"2006","journal-title":"Nature genetics"},{"issue":"2","key":"pcbi.1011491.ref004","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/j.ydbio.2009.08.009","article-title":"Regulation of gene expression via the core promoter and the basal transcriptional machinery","volume":"339","author":"T Juven-Gershon","year":"2010","journal-title":"Developmental biology"},{"issue":"7492","key":"pcbi.1011491.ref005","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1038\/nature12974","article-title":"Two independent transcription initiation codes overlap on vertebrate core promoters","volume":"507","author":"V Haberle","year":"2014","journal-title":"Nature"},{"issue":"10","key":"pcbi.1011491.ref006","doi-asserted-by":"crossref","first-page":"e1005144","DOI":"10.1371\/journal.pcbi.1005144","article-title":"Influence of rotational nucleosome positioning on transcription start site selection in animal promoters","volume":"12","author":"R Dreos","year":"2016","journal-title":"PLoS computational biology"},{"key":"pcbi.1011491.ref007","first-page":"11","volume-title":"Seminars in cell & developmental biology","author":"V Haberle","year":"2016"},{"issue":"20","key":"pcbi.1011491.ref008","doi-asserted-by":"crossref","first-page":"12388","DOI":"10.1093\/nar\/gku924","article-title":"Multiple novel promoter-architectures revealed by decoding the hidden heterogeneity within the genome","volume":"42","author":"L Narlikar","year":"2014","journal-title":"Nucleic acids research"},{"issue":"6755","key":"pcbi.1011491.ref009","doi-asserted-by":"crossref","first-page":"788","DOI":"10.1038\/44565","article-title":"Learning the parts of objects by non-negative matrix factorization","volume":"401","author":"DD Lee","year":"1999","journal-title":"Nature"},{"issue":"8","key":"pcbi.1011491.ref010","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2006-7-8-r78","article-title":"Transcriptional and structural impact of TATA-initiation site spacing in mammalian core promoters","volume":"7","author":"J Ponjavic","year":"2006","journal-title":"Genome biology"},{"issue":"5","key":"pcbi.1011491.ref011","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1093\/bioinformatics\/btv645","article-title":"No Promoter Left Behind (NPLB): learn de novo promoter architectures from genome-wide transcription start sites","volume":"32","author":"S Mitra","year":"2016","journal-title":"Bioinformatics"},{"issue":"12","key":"pcbi.1011491.ref012","doi-asserted-by":"crossref","first-page":"4164","DOI":"10.1073\/pnas.0308531101","article-title":"Metagenes and molecular pattern discovery using matrix factorization","volume":"101","author":"JP Brunet","year":"2004","journal-title":"Proceedings of the national academy of sciences"},{"issue":"23","key":"pcbi.1011491.ref013","doi-asserted-by":"crossref","first-page":"2684","DOI":"10.1093\/bioinformatics\/btn526","article-title":"Position-dependent motif characterization using non-negative matrix factorization","volume":"24","author":"LN Hutchins","year":"2008","journal-title":"Bioinformatics"},{"issue":"10","key":"pcbi.1011491.ref014","doi-asserted-by":"crossref","first-page":"790","DOI":"10.1016\/j.tig.2018.07.003","article-title":"Enter the matrix: factorization uncovers knowledge from omics","volume":"34","author":"GL Stein-O\u2019Brien","year":"2018","journal-title":"Trends in Genetics"},{"issue":"1","key":"pcbi.1011491.ref015","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1093\/nar\/gks950","article-title":"MuMoD: a Bayesian approach to detect multiple modes of protein\u2013DNA binding from genome-wide ChIP data","volume":"41","author":"L Narlikar","year":"2012","journal-title":"Nucleic Acids Research"},{"issue":"7","key":"pcbi.1011491.ref016","doi-asserted-by":"crossref","first-page":"1209","DOI":"10.1101\/gr.159384.113","article-title":"Comparative validation of the D. melanogaster modENCODE transcriptome annotation","volume":"24","author":"ZX Chen","year":"2014","journal-title":"Genome research"},{"issue":"4","key":"pcbi.1011491.ref017","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1038\/ng.3791","article-title":"Promoter shape varies across populations and affects promoter evolution and expression noise","volume":"49","author":"IE Schor","year":"2017","journal-title":"Nature genetics"},{"issue":"11","key":"pcbi.1011491.ref018","doi-asserted-by":"crossref","first-page":"1938","DOI":"10.1101\/gr.153692.112","article-title":"Dynamic regulation of the transcription initiation landscape at single nucleotide resolution during vertebrate embryogenesis","volume":"23","author":"C Nepal","year":"2013","journal-title":"Genome research"},{"issue":"7414","key":"pcbi.1011491.ref019","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"Consortium The ENCODE Project","year":"2012","journal-title":"Nature"},{"issue":"D1","key":"pcbi.1011491.ref020","doi-asserted-by":"crossref","first-page":"D794","DOI":"10.1093\/nar\/gkx1081","article-title":"The Encyclopedia of DNA elements (ENCODE): data portal update","volume":"46","author":"CA Davis","year":"2018","journal-title":"Nucleic acids research"},{"issue":"5","key":"pcbi.1011491.ref021","doi-asserted-by":"crossref","first-page":"1445","DOI":"10.1093\/nar\/gki282","article-title":"NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence","volume":"33","author":"TA Down","year":"2005","journal-title":"Nucleic acids research"},{"issue":"19","key":"pcbi.1011491.ref022","doi-asserted-by":"crossref","first-page":"6305","DOI":"10.1093\/nar\/gkp682","article-title":"CpG-depleted promoters harbor tissue-specific transcription factor binding signals\u2014implications for motif overrepresentation analyses","volume":"37","author":"HG Roider","year":"2009","journal-title":"Nucleic Acids Research"},{"issue":"12","key":"pcbi.1011491.ref023","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2002-3-12-research0087","article-title":"Computational analysis of core promoters in the Drosophila genome","volume":"3","author":"U Ohler","year":"2002","journal-title":"Genome biology"},{"issue":"20","key":"pcbi.1011491.ref024","doi-asserted-by":"crossref","first-page":"5943","DOI":"10.1093\/nar\/gkl608","article-title":"Identification of core promoter modules in Drosophila and their application in accurate transcription start site prediction","volume":"34","author":"U Ohler","year":"2006","journal-title":"Nucleic Acids Research"},{"issue":"1","key":"pcbi.1011491.ref025","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-019-13687-0","article-title":"Dual-initiation promoters with intertwined canonical and TCT\/TOP transcription start sites diversify transcript processing","volume":"11","author":"C Nepal","year":"2020","journal-title":"Nature communications"},{"issue":"7","key":"pcbi.1011491.ref026","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2006-7-7-r53","article-title":"Comparative genomics of Drosophila and human core promoters","volume":"7","author":"PC FitzGerald","year":"2006","journal-title":"Genome biology"},{"issue":"6","key":"pcbi.1011491.ref027","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1101\/gad.10.6.711","article-title":"Drosophila TFIID binds to a conserved downstream basal promoter element that is present in many TATA-box-deficient promoters","volume":"10","author":"TW Burke","year":"1996","journal-title":"Genes & development"},{"issue":"18","key":"pcbi.1011491.ref028","doi-asserted-by":"crossref","first-page":"2013","DOI":"10.1101\/gad.1951110","article-title":"The TCT motif, a key component of an RNA polymerase II transcription system for the translational machinery","volume":"24","author":"TJ Parry","year":"2010","journal-title":"Genes & development"},{"issue":"5","key":"pcbi.1011491.ref029","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1101\/gr.113381.110","article-title":"Core promoter T-blocks correlate with gene expression levels in C. elegans","volume":"21","author":"V Grishkevich","year":"2011","journal-title":"Genome research"},{"issue":"7","key":"pcbi.1011491.ref030","doi-asserted-by":"crossref","first-page":"1510","DOI":"10.1016\/j.bpj.2012.08.030","article-title":"TATA Binding Proteins Can Recognize Nontraditional DNA Sequences","volume":"103","author":"S Ahn","year":"2012","journal-title":"Biophysical Journal"},{"issue":"7193","key":"pcbi.1011491.ref031","doi-asserted-by":"crossref","first-page":"358","DOI":"10.1038\/nature06929","article-title":"Nucleosome organization in the Drosophila genome","volume":"453","author":"TN Mavrich","year":"2008","journal-title":"Nature"},{"issue":"6122","key":"pcbi.1011491.ref032","doi-asserted-by":"crossref","first-page":"950","DOI":"10.1126\/science.1229386","article-title":"Precise maps of RNA polymerase reveal how promoters direct initiation and pausing","volume":"339","author":"H Kwak","year":"2013","journal-title":"Science"},{"issue":"12","key":"pcbi.1011491.ref033","doi-asserted-by":"crossref","first-page":"1898","DOI":"10.1101\/gr.6669607","article-title":"Genomic regulatory blocks underlie extensive microsynteny conservation in insects","volume":"17","author":"PG Engstr\u00f6m","year":"2007","journal-title":"Genome research"},{"issue":"2","key":"pcbi.1011491.ref034","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1016\/j.dci.2014.05.003","article-title":"Functional characterization of mannose-binding lectin in zebrafish: Implication for a lectin-dependent complement system in early embryos","volume":"46","author":"L Yang","year":"2014","journal-title":"Developmental & Comparative Immunology"},{"issue":"7","key":"pcbi.1011491.ref035","doi-asserted-by":"crossref","first-page":"1037","DOI":"10.1038\/s41588-022-01089-w","article-title":"Multiomic atlas with functional stratification and developmental dynamics of zebrafish cis-regulatory elements","volume":"54","author":"D Baranasic","year":"2022","journal-title":"Nature genetics"},{"issue":"2","key":"pcbi.1011491.ref036","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/j.devcel.2022.12.007","article-title":"The miR-430 locus with extreme promoter density forms a transcription body during the minor wave of zygotic genome activation","volume":"58","author":"Y Hadzhiev","year":"2023","journal-title":"Developmental Cell"},{"issue":"5770","key":"pcbi.1011491.ref037","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1126\/science.1122689","article-title":"Zebrafish MiR-430 Promotes Deadenylation and Clearance of Maternal mRNAs","volume":"312","author":"AJ Giraldez","year":"2006","journal-title":"Science"},{"issue":"6078","key":"pcbi.1011491.ref038","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1126\/science.1215704","article-title":"Ribosome Profiling Shows That miR-430 Reduces Translation Before Causing mRNA Decay in Zebrafish","volume":"336","author":"AA Bazzini","year":"2012","journal-title":"Science"},{"issue":"4","key":"pcbi.1011491.ref039","doi-asserted-by":"crossref","first-page":"160009","DOI":"10.1098\/rsob.160009","article-title":"Structure and evolutionary history of a large family of NLR proteins in the zebrafish","volume":"6","author":"K Howe","year":"2016","journal-title":"Open biology"},{"issue":"7493","key":"pcbi.1011491.ref040","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1038\/nature12787","article-title":"An atlas of active enhancers across human cell types and tissues","volume":"507","author":"R Andersson","year":"2014","journal-title":"Nature"},{"issue":"5","key":"pcbi.1011491.ref041","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1093\/bioinformatics\/bti042","article-title":"Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification","volume":"21","author":"I Yanai","year":"2005","journal-title":"Bioinformatics"},{"issue":"3","key":"pcbi.1011491.ref042","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1101\/gr.214202","article-title":"The human ribosomal protein genes: sequencing and comparative analysis of 73 genes","volume":"12","author":"M Yoshihama","year":"2002","journal-title":"Genome research"},{"issue":"1","key":"pcbi.1011491.ref043","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2164-7-37","article-title":"Characteristics and clustering of human ribosomal protein genes","volume":"7","author":"K Ishii","year":"2006","journal-title":"BMC genomics"},{"issue":"1","key":"pcbi.1011491.ref044","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/j.ygeno.2011.03.009","article-title":"Over-represented localized sequence motifs in ribosomal protein gene promoters of basal metazoans","volume":"98","author":"D Perina","year":"2011","journal-title":"Genomics"},{"key":"pcbi.1011491.ref045","article-title":"TF-MoDISco v0.4.4.2-alpha: Technical Note","author":"A Shrikumar","year":"2018","journal-title":"CoRR"},{"issue":"10","key":"pcbi.1011491.ref046","doi-asserted-by":"crossref","first-page":"569","DOI":"10.1016\/j.tig.2013.05.010","article-title":"Human housekeeping genes, revisited","volume":"29","author":"E Eisenberg","year":"2013","journal-title":"TRENDS in Genetics"},{"issue":"3","key":"pcbi.1011491.ref047","doi-asserted-by":"crossref","first-page":"3313","DOI":"10.18632\/aging.202648","article-title":"Ageing transcriptome meta-analysis reveals similarities and differences between key mammalian tissues","volume":"13","author":"D Palmer","year":"2021","journal-title":"Aging (Albany NY)"},{"issue":"3","key":"pcbi.1011491.ref048","doi-asserted-by":"crossref","first-page":"e9722","DOI":"10.1371\/journal.pone.0009722","article-title":"Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix","volume":"5","author":"R Siddharthan","year":"2010","journal-title":"PloS one"},{"issue":"9","key":"pcbi.1011491.ref049","doi-asserted-by":"crossref","first-page":"e1003214","DOI":"10.1371\/journal.pcbi.1003214","article-title":"The next generation of transcription factor binding site prediction","volume":"9","author":"A Mathelier","year":"2013","journal-title":"PLoS computational biology"},{"issue":"0","key":"pcbi.1011491.ref050","article-title":"Biostrings: Efficient manipulation of biological strings","volume":"2","author":"H Pag\u00e8s","year":"2017","journal-title":"R package version"},{"issue":"16","key":"pcbi.1011491.ref051","doi-asserted-by":"crossref","first-page":"4290","DOI":"10.1073\/pnas.1521171113","article-title":"Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks","volume":"113","author":"S Wu","year":"2016","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"2","key":"pcbi.1011491.ref052","first-page":"564","article-title":"Bi-cross-validation of the SVD and the nonnegative matrix factorization","volume":"3","author":"AB Owen","year":"2009","journal-title":"The annals of applied statistics"},{"issue":"2","key":"pcbi.1011491.ref053","first-page":"1","article-title":"Patterns of joint involvement in juvenile idiopathic arthritis and prediction of disease course: A prospective study with multilayer non-negative matrix factorization","volume":"16","author":"SWM Eng","year":"2019","journal-title":"PLOS Medicine"},{"key":"pcbi.1011491.ref054","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The elements of statistical learning: data mining, inference, and prediction","author":"T Hastie","year":"2009"},{"issue":"3","key":"pcbi.1011491.ref055","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1587\/transfun.E92.A.708","article-title":"Fast local algorithms for large scale nonnegative matrix and tensor factorizations","volume":"92","author":"A Cichocki","year":"2009","journal-title":"IEICE transactions on fundamentals of electronics, communications and computer sciences"},{"issue":"4","key":"pcbi.1011491.ref056","doi-asserted-by":"crossref","first-page":"1350","DOI":"10.1016\/j.patcog.2007.09.010","article-title":"SVD based initialization: A head start for nonnegative matrix factorization","volume":"41","author":"C Boutsidis","year":"2008","journal-title":"Pattern recognition"},{"issue":"13","key":"pcbi.1011491.ref057","doi-asserted-by":"crossref","first-page":"e119","DOI":"10.1093\/nar\/gkx314","article-title":"RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections","volume":"45","author":"JA Castro-Mondragon","year":"2017","journal-title":"Nucleic Acids Research"},{"issue":"20","key":"pcbi.1011491.ref058","doi-asserted-by":"crossref","first-page":"6097","DOI":"10.1093\/nar\/18.20.6097","article-title":"Sequence logos: a new way to display consensus sequences","volume":"18","author":"TD Schneider","year":"1990","journal-title":"Nucleic acids research"},{"key":"pcbi.1011491.ref059","unstructured":"Nikumbh S. snikumbh\/archR: archR_v0.1.8; 2021. Available from: https:\/\/doi.org\/10.5281\/zenodo.5055408."},{"key":"pcbi.1011491.ref060","unstructured":"FitzJohn R. remake: Make-like build management, reimagined for R;. Available from: https:\/\/github.com\/richfitz\/remake."}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1011491","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2023,12,4]],"date-time":"2023-12-04T00:00:00Z","timestamp":1701648000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011491","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,4]],"date-time":"2023-12-04T13:46:11Z","timestamp":1701697571000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011491"}},"subtitle":[],"editor":[{"given":"Denis","family":"Thieffry","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2023,11,20]]},"references-count":60,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2023,11,20]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1011491","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.03.02.530868","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,20]]}}}