{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T03:56:11Z","timestamp":1776052571437,"version":"3.50.1"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2023,6,16]],"date-time":"2023-06-16T00:00:00Z","timestamp":1686873600000},"content-version":"vor","delay-in-days":15,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020 research and innovation programme","award":["872539"],"award-info":[{"award-number":["872539"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses but is limited by technology mainly due to STRs surpassing the used read length. Nanopore sequencing, as one of long-read sequencing technologies, produces very long reads, thus offering more possibilities to study and analyze STRs. Basecalling of nanopore reads is however particularly unreliable in repeating regions, and therefore direct analysis from raw nanopore data is required.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we present WarpSTR, a novel method for characterizing both simple and complex tandem repeats directly from raw nanopore signals using a finite-state automaton and a search algorithm analogous to dynamic time warping. By applying this approach to determine the lengths of 241 STRs, we demonstrate that our approach decreases the mean absolute error of the STR length estimate compared to basecalling and STRique.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>WarpSTR is freely available at https:\/\/github.com\/fmfi-compbio\/warpstr<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad388","type":"journal-article","created":{"date-parts":[[2023,6,15]],"date-time":"2023-06-15T00:51:56Z","timestamp":1686790316000},"source":"Crossref","is-referenced-by-count":13,"title":["WarpSTR: determining tandem repeat lengths using raw nanopore signals"],"prefix":"10.1093","volume":"39","author":[{"given":"Jozef","family":"Sitar\u010d\u00edk","sequence":"first","affiliation":[{"name":"Comenius University Science Park , Bratislava 841 04, Slovakia"},{"name":"Geneton Ltd , Bratislava 841 04, Slovakia"},{"name":"Slovak Centre of Scientific and Technical Information , Bratislava 811 04, Slovakia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3898-3447","authenticated-orcid":false,"given":"Tom\u00e1\u0161","family":"Vina\u0159","sequence":"additional","affiliation":[{"name":"Faculty of Mathematics, Physics, and Informatics, Comenius University , Bratislava 842 48, Slovakia"}]},{"given":"Bro\u0148a","family":"Brejov\u00e1","sequence":"additional","affiliation":[{"name":"Faculty of Mathematics, Physics, and Informatics, Comenius University , Bratislava 842 48, Slovakia"}]},{"given":"Werner","family":"Krampl","sequence":"additional","affiliation":[{"name":"Comenius University Science Park , Bratislava 841 04, Slovakia"},{"name":"Geneton Ltd , Bratislava 841 04, Slovakia"},{"name":"Department of Molecular Biology, Faculty of Natural Sciences, Comenius University , Bratislava 841 04, Slovakia"}]},{"given":"Jaroslav","family":"Budi\u0161","sequence":"additional","affiliation":[{"name":"Comenius University Science Park , Bratislava 841 04, Slovakia"},{"name":"Geneton Ltd , Bratislava 841 04, Slovakia"},{"name":"Slovak Centre of Scientific and Technical Information , Bratislava 811 04, Slovakia"}]},{"given":"J\u00e1n","family":"Radv\u00e1nszky","sequence":"additional","affiliation":[{"name":"Comenius University Science Park , Bratislava 841 04, Slovakia"},{"name":"Geneton Ltd , Bratislava 841 04, Slovakia"},{"name":"Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences , Bratislava 845 05, Slovakia"}]},{"given":"M\u00e1ria","family":"Luck\u00e1","sequence":"additional","affiliation":[{"name":"Slovak Centre of Scientific and Technical Information , Bratislava 811 04, Slovakia"},{"name":"KInIT\u2014Kempelen Institute of Intelligent Technologies , Bottova, 7939\/2A , Bratislava 811 09, Slovakia"}]}],"member":"286","published-online":{"date-parts":[[2023,6,16]]},"reference":[{"key":"2023062908345177400_btad388-B1","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1093\/hmg\/3.1.65","article-title":"A CCG repeat polymorphism adjacent to the CAG repeat in the Huntington disease gene: implications for diagnostic accuracy and predictive testing","volume":"3","author":"Andrew","year":"1994","journal-title":"Hum Mol Genet"},{"key":"2023062908345177400_btad388-B2","doi-asserted-by":"crossref","first-page":"736","DOI":"10.12688\/f1000research.13980.1","article-title":"Recent advances in the detection of repeat expansions with short-read next-generation sequencing","volume":"7","author":"Bahlo","year":"2018","journal-title":"F1000Res"},{"key":"2023062908345177400_btad388-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TAC.1959.1104847","article-title":"On adaptive control processes","volume":"4","author":"Bellman","year":"1959","journal-title":"IRE Trans Automat Contr"},{"key":"2023062908345177400_btad388-B4","doi-asserted-by":"crossref","first-page":"1310","DOI":"10.1093\/bioinformatics\/bty791","article-title":"Dante: genotyping of known complex and expanded short tandem repeats","volume":"35","author":"Budi\u0161","year":"2019","journal-title":"Bioinformatics"},{"key":"2023062908345177400_btad388-B5","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1186\/s13059-018-1505-2","article-title":"STRetch: detecting and discovering pathogenic short tandem repeat expansions","volume":"19","author":"Dashnow","year":"2018","journal-title":"Genome Biol"},{"key":"2023062908345177400_btad388-B6","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1186\/s13059-019-1856-3","article-title":"NanoSatellite: accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION","volume":"20","author":"De Roeck","year":"2019","journal-title":"Genome Biol"},{"key":"2023062908345177400_btad388-B7","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1016\/j.ajhg.2021.03.011","article-title":"30 years of repeat expansion disorders: what have we learned and what are the remaining challenges?","volume":"108","author":"Depienne","year":"2021","journal-title":"Am J Hum Genet"},{"key":"2023062908345177400_btad388-B8","doi-asserted-by":"crossref","first-page":"4754","DOI":"10.1093\/bioinformatics\/btz431","article-title":"ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions","volume":"35","author":"Dolzhenko","year":"2019","journal-title":"Bioinformatics"},{"key":"2023062908345177400_btad388-B9","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1002\/ana.410320517","article-title":"Severity of X-linked recessive bulbospinal neuronopathy correlates with size of the tandem CAG repeat in androgen receptor gene","volume":"32","author":"Doyu","year":"1992","journal-title":"Ann Neurol"},{"key":"2023062908345177400_btad388-B10","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1186\/s13059-015-0670-9","article-title":"Split-alignment of genomes finds orthologies more accurately","volume":"16","author":"Frith","year":"2015","journal-title":"Genome Biol"},{"key":"2023062908345177400_btad388-B11","doi-asserted-by":"crossref","first-page":"D80","DOI":"10.1093\/nar\/gkl1013","article-title":"TRDB\u2014the tandem repeats database","volume":"35","author":"Gelfand","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023062908345177400_btad388-B12","doi-asserted-by":"crossref","first-page":"1478","DOI":"10.1038\/s41587-019-0293-x","article-title":"Analysis of short tandem repeat expansions and their methylation state with nanopore sequencing","volume":"37","author":"Giesselmann","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2023062908345177400_btad388-B13","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/j.gde.2017.01.012","article-title":"A genomic view of short tandem repeats","volume":"44","author":"Gymrek","year":"2017","journal-title":"Curr Opin Genet Dev"},{"key":"2023062908345177400_btad388-B14","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1038\/ng.3461","article-title":"Abundant contribution of short tandem repeats to gene expression variation in humans","volume":"48","author":"Gymrek","year":"2016","journal-title":"Nat Genet"},{"key":"2023062908345177400_btad388-B15","doi-asserted-by":"crossref","first-page":"200","DOI":"10.12688\/f1000research.22639.1","article-title":"Accuracy of short tandem repeats genotyping tools in whole exome sequencing data","volume":"9","author":"Halman","year":"2020","journal-title":"F1000Res"},{"key":"2023062908345177400_btad388-B16","doi-asserted-by":"crossref","first-page":"i722","DOI":"10.1093\/bioinformatics\/bty555","article-title":"An accurate and rapid continuous wavelet dynamic time warping algorithm for end-to-end mapping in ultra-long nanopore sequencing","volume":"34","author":"Han","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062908345177400_btad388-B17","doi-asserted-by":"crossref","first-page":"1333","DOI":"10.1093\/bioinformatics\/btz742","article-title":"Novel algorithms for efficient subsequence searching and mapping in nanopore raw signals towards targeted sequencing","volume":"36","author":"Han","year":"2020","journal-title":"Bioinformatics"},{"key":"2023062908345177400_btad388-B18","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1038\/nbt.4060","article-title":"Nanopore sequencing and assembly of a human genome with ultra-long reads","volume":"36","author":"Jain","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2023062908345177400_btad388-B19","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1186\/s13073-017-0456-7","article-title":"Interrogating the \u201cunsequenceable\u201d genomic trinucleotide repeat disorders by long-read sequencing","volume":"9","author":"Liu","year":"2017","journal-title":"Genome Med"},{"key":"2023062908345177400_btad388-B20","doi-asserted-by":"crossref","first-page":"542","DOI":"10.1186\/s12859-020-03876-w","article-title":"Genome-wide detection of short tandem repeat expansions by long-read sequencing","volume":"21","author":"Liu","year":"2020","journal-title":"BMC Bioinformatics"},{"key":"2023062908345177400_btad388-B21","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1038\/nmeth.3930","article-title":"Real-time selective sequencing using nanopore technology","volume":"13","author":"Loose","year":"2016","journal-title":"Nat Methods"},{"key":"2023062908345177400_btad388-B22","doi-asserted-by":"crossref","first-page":"1201","DOI":"10.1007\/s00439-019-02064-y","article-title":"Long-read sequencing in deciphering human genetics to a greater depth","volume":"138","author":"Midha","year":"2019","journal-title":"Hum Genet"},{"key":"2023062908345177400_btad388-B23","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1186\/s13059-019-1667-6","article-title":"Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads","volume":"20","author":"Mitsuhashi","year":"2019","journal-title":"Genome Biol"},{"key":"2023062908345177400_btad388-B24","doi-asserted-by":"crossref","first-page":"1365","DOI":"10.1002\/ajmg.a.32987","article-title":"Highly unstable sequence interruptions of the CTG repeat in the myotonic dystrophy gene","volume":"149A","author":"Musova","year":"2009","journal-title":"Am J Med Genet A"},{"key":"2023062908345177400_btad388-B25","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learning Res"},{"key":"2023062908345177400_btad388-B26","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1089\/gtmb.2010.0073","article-title":"The expanding world of myotonic dystrophies: how can they be detected?","volume":"14","author":"Radvansky","year":"2010","journal-title":"Genet Test Mol Biomarkers"},{"key":"2023062908345177400_btad388-B27","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1097\/PDM.0b013e3181efe290","article-title":"Effect of unexpected sequence interruptions to conventional PCR and repeat primed PCR in myotonic dystrophy type 1 testing","volume":"20","author":"Radvansky","year":"2011","journal-title":"Diagn Mol Pathol"},{"key":"2023062908345177400_btad388-B28","doi-asserted-by":"crossref","first-page":"3934","DOI":"10.3390\/jcm10173934","article-title":"Characterisation of non-pathogenic premutation-range myotonic dystrophy type 2 alleles","volume":"10","author":"Radvanszky","year":"2021","journal-title":"JCM"},{"key":"2023062908345177400_btad388-B29","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1038\/s41576-018-0003-4","article-title":"Piercing the dark matter: bioinformatics of long-range sequencing and mapping","volume":"19","author":"Sedlazeck","year":"2018","journal-title":"Nat Rev Genet"},{"key":"2023062908345177400_btad388-B30","doi-asserted-by":"crossref","first-page":"3491","DOI":"10.1093\/bioinformatics\/btu437","article-title":"Resolving complex tandem repeats with long reads","volume":"30","author":"Ummat","year":"2014","journal-title":"Bioinformatics"},{"key":"2023062908345177400_btad388-B31","doi-asserted-by":"crossref","first-page":"100128","DOI":"10.1016\/j.xgen.2022.100128","article-title":"Benchmarking challenging small variants with linked and long reads","volume":"2","author":"Wagner","year":"2022","journal-title":"Cell Genomics"},{"key":"2023062908345177400_btad388-B32","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1186\/s13059-019-1727-y","article-title":"Performance of neural network basecalling tools for oxford nanopore sequencing","volume":"20","author":"Wick","year":"2019","journal-title":"Genome Biol"},{"key":"2023062908345177400_btad388-B33","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1038\/nmeth.4267","article-title":"Genome-wide profiling of heritable and de novo str variations","volume":"14","author":"Willems","year":"2017","journal-title":"Nat Methods"},{"key":"2023062908345177400_btad388-B34","doi-asserted-by":"crossref","first-page":"i477","DOI":"10.1093\/bioinformatics\/btab264","article-title":"Real-time mapping of nanopore raw signals","volume":"37","author":"Zhang","year":"2021","journal-title":"Bioinformatics"},{"key":"2023062908345177400_btad388-B35","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1038\/s41587-019-0074-6","article-title":"An open resource for accurately benchmarking small variant and reference calls","volume":"37","author":"Zook","year":"2019","journal-title":"Nat Biotechnol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad388\/50628662\/btad388.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/6\/btad388\/50738104\/btad388.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/6\/btad388\/50738104\/btad388.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,29]],"date-time":"2023-06-29T04:35:40Z","timestamp":1688013340000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad388\/7199589"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,6,1]]},"references-count":35,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad388","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.11.05.515275","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,6,1]]},"published":{"date-parts":[[2023,6,1]]},"article-number":"btad388"}}