{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T20:03:25Z","timestamp":1775160205941,"version":"3.50.1"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T00:00:00Z","timestamp":1761091200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Molecular Pathology"},{"name":"Boehringer Ingelheim GmbH and the Austrian Research Promotion Agency","award":["FFG, FO999902549"],"award-info":[{"award-number":["FFG, FO999902549"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Summary<\/jats:title>\n                    <jats:p>Pairwise sequence similarity is a core operation in genomic analysis, yet most attention has been given to sequences made up of discrete characters. With the growing prevalence of machine learning, calculating similarities for sequences of continuous representations, e.g. frequency-based position-weight matrices (PWMs) and attribution-based contribution-weight matrices, is taking on newfound importance. Tomtom has previously been proposed as an algorithm for identifying pairs of PWMs whose similarity is statistically significant, but the implementation remains inefficient for both real-time and large-scale analysis. Accordingly, we have re-implemented Tomtom as a numba-accelerated Python function that is natively multi-threaded, avoids cache misses, more efficiently caches intermediate values, and uses approximations at compute bottlenecks. Here, we provide a detailed description of the original Tomtom method and present results demonstrating that our re-implementation can achieve over a 1000-fold speedup compared with the original tool on reasonable tasks.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Our implementation of Tomtom is freely available as a Python package at https:\/\/github.com\/jmschrei\/memesuite-lite, which can be downloaded via pip install memelite or at https:\/\/zenodo.org\/records\/17008952.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf577","type":"journal-article","created":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T12:26:18Z","timestamp":1760531178000},"source":"Crossref","is-referenced-by-count":3,"title":["Tomtom-lite: accelerating Tomtom enables large-scale and real-time motif similarity scoring"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4230-6625","authenticated-orcid":false,"given":"Jacob","family":"Schreiber","sequence":"first","affiliation":[{"name":"Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC) , Vienna 1030,","place":["Austria"]},{"name":"Department of Genomics and Computational Biology, UMass Chan Medical School , Worcester, MA 01655,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,10,22]]},"reference":[{"key":"2025111616061047900_btaf577-B1","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1038\/s41592-021-01252-x","article-title":"Effective gene expression prediction from sequence by integrating long-range interactions","volume":"18","author":"Avsec","year":"2021","journal-title":"Nat Methods"},{"key":"2025111616061047900_btaf577-B2","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/s41588-021-00782-6","article-title":"Base-resolution models of transcription-factor binding reveal soft motif syntax","volume":"53","author":"Avsec","year":"2021","journal-title":"Nat Genet"},{"key":"2025111616061047900_btaf577-B3","doi-asserted-by":"crossref","first-page":"2834","DOI":"10.1093\/bioinformatics\/btab203","article-title":"STREME: accurate and versatile sequence motif discovery","volume":"37","author":"Bailey","year":"2021","journal-title":"Bioinformatics"},{"key":"2025111616061047900_btaf577-B4","first-page":"28","article-title":"Fitting a mixture model by expectation maximization to discover motifs in biopolymers","volume":"2","author":"Bailey","year":"1994","journal-title":"Proc Int Conf Intell Syst Mol Biol"},{"key":"2025111616061047900_btaf577-B5","first-page":"115","article-title":"Scoring pairwise genomic sequence alignments","author":"Chiaromonte","year":"2002","journal-title":"Pac Symp Biocomput"},{"key":"2025111616061047900_btaf577-B6","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1038\/s41592-024-02523-z","article-title":"Nucleotide transformer: building and evaluating robust foundation models for human genomics","volume":"22","author":"Dalla-Torre","year":"2025","journal-title":"Nat Methods"},{"key":"2025111616061047900_btaf577-B7","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1038\/s41576-019-0122-6","article-title":"Deep learning: new computational modelling techniques for genomics","volume":"20","author":"Eraslan","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2025111616061047900_btaf577-B8","doi-asserted-by":"crossref","first-page":"1017","DOI":"10.1093\/bioinformatics\/btr064","article-title":"FIMO: scanning for occurrences of a given motif","volume":"27","author":"Grant","year":"2011","journal-title":"Bioinformatics"},{"key":"2025111616061047900_btaf577-B9","doi-asserted-by":"crossref","first-page":"R24","DOI":"10.1186\/gb-2007-8-2-r24","article-title":"Quantifying similarity between motifs","volume":"8","author":"Gupta","year":"2007","journal-title":"Genome Biol"},{"key":"2025111616061047900_btaf577-B10","author":"Haque","year":"2009"},{"key":"2025111616061047900_btaf577-B11","doi-asserted-by":"crossref","first-page":"102887","DOI":"10.1016\/j.copbio.2022.102887","article-title":"Deep learning in regulatory genomics: from identification to design","volume":"79","author":"Hu","year":"2023","journal-title":"Curr Opin Biotechnol"},{"key":"2025111616061047900_btaf577-B12","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1016\/j.cell.2012.12.009","article-title":"DNA-binding specificities of human transcription factors","volume":"152","author":"Jolma","year":"2013","journal-title":"Cell"},{"key":"2025111616061047900_btaf577-B13","first-page":"87","author":"Kluyver","year":"2016"},{"key":"2025111616061047900_btaf577-B14","doi-asserted-by":"crossref","first-page":"3181","DOI":"10.1093\/bioinformatics\/btp554","article-title":"MOODS: fast search for position weight matrix matches in DNA sequences","volume":"25","author":"Korhonen","year":"2009","journal-title":"Bioinformatics"},{"key":"2025111616061047900_btaf577-B15","author":"Lam","year":"2015"},{"key":"2025111616061047900_btaf577-B16","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1038\/nrg3920","article-title":"Machine learning applications in genetics and genomics","volume":"16","author":"Libbrecht","year":"2015","journal-title":"Nat Rev Genet"},{"key":"2025111616061047900_btaf577-B17","first-page":"4768","author":"Lundberg","year":"2017"},{"key":"2025111616061047900_btaf577-B18","doi-asserted-by":"crossref","first-page":"e50","DOI":"10.1093\/nar\/gkr1135","article-title":"A highly efficient and effective motif discovery method for ChIP-seq\/ChIP-chip data using positional information","volume":"40","author":"Ma","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2025111616061047900_btaf577-B19","first-page":"43177","article-title":"HyenaDNA: long-range genomic sequence modeling at single nucleotide resolution","author":"Nguyen","year":"2023"},{"key":"2025111616061047900_btaf577-B20","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1038\/s41576-022-00532-2","article-title":"Obtaining genetics insights from deep learning via explainable artificial intelligence","volume":"24","author":"Novakovsky","year":"2023","journal-title":"Nat Rev Genet"},{"key":"2025111616061047900_btaf577-B21","author":"Pampari","year":"2025"},{"key":"2025111616061047900_btaf577-B22","doi-asserted-by":"crossref","first-page":"D174","DOI":"10.1093\/nar\/gkad1059","article-title":"JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles","volume":"52","author":"Rauluseviciute","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025111616061047900_btaf577-B23","author":"Schreiber","year":"2025"},{"key":"2025111616061047900_btaf577-B24","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/j.cbpa.2021.04.008","article-title":"Machine learning for profile prediction in genomics","volume":"65","author":"Schreiber","year":"2021","journal-title":"Curr Opin Chem Biol"},{"key":"2025111616061047900_btaf577-B25","first-page":"3145","author":"Shrikumar","year":"2017"},{"key":"2025111616061047900_btaf577-B26","author":"Shrikumar","year":"2018"},{"key":"2025111616061047900_btaf577-B27","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J Mol Biol"},{"key":"2025111616061047900_btaf577-B28","doi-asserted-by":"crossref","first-page":"1603","DOI":"10.1093\/bioinformatics\/btr257","article-title":"Improved similarity scores for comparing motifs","volume":"27","author":"Tanaka","year":"2011","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf577\/64857516\/btaf577.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/11\/btaf577\/64857516\/btaf577.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/11\/btaf577\/64857516\/btaf577.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,16]],"date-time":"2025-11-16T21:06:20Z","timestamp":1763327180000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf577\/8297099"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,10,22]]},"references-count":28,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf577","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,11]]},"published":{"date-parts":[[2025,10,22]]},"article-number":"btaf577"}}