{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T14:43:00Z","timestamp":1767192180275,"version":"3.41.2"},"reference-count":18,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T00:00:00Z","timestamp":1741910400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Singapore International Graduate Award from the Agency for Science, Technology and Research"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Nanopore sequencing by Oxford Nanopore Technologies (ONT) enables direct analysis of DNA and RNA by capturing raw electrical signals. Different nanopore chemistries have varied k-mer lengths, current levels, and standard deviations, which are stored in \u201ck-mer models.\u201d In cases where official models are lacking or unsuitable for specific sequencing conditions, tailored k-mer models are crucial to ensure precise signal-to-sequence alignment, analysis and interpretation. The process of transforming raw signal data into nucleotide sequences, known as basecalling, is a fundamental step in nanopore sequencing.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this study, we leverage the move table produced by ONT\u2019s basecalling software to create a lightweight de novo k-mer model for RNA004 chemistry. We demonstrate the validity of our custom k-mer model by using it to guide signal-to-sequence alignment analysis, achieving high alignment rates (97.48%) compared to larger default models. 
Additionally, our 5-mer model exhibits similar performance to the default 9-mer models in another analysis, such as detection of m6A RNA modifications. We provide our method, termed Poregen, as a generalizable approach for creation of custom, de novo k-mer models for nanopore signal data analysis.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Poregen is an open source package under an MIT license: https:\/\/github.com\/hiruna72\/poregen.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf111","type":"journal-article","created":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T14:15:39Z","timestamp":1741961739000},"source":"Crossref","is-referenced-by-count":3,"title":["Leveraging basecaller\u2019s move table to generate a lightweight k-mer model for nanopore sequencing analysis"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5812-1046","authenticated-orcid":false,"given":"Hiruna","family":"Samarakoon","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, University of New South Wales , Sydney, NSW 2052,","place":["Australia"]},{"name":"Genomics and Inherited Disease Program, Garvan Institute of Medical Research , Sydney, NSW 2010,","place":["Australia"]},{"name":"Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children\u2019s Research Institute , Sydney, NSW 2010,","place":["Australia"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1774-261X","authenticated-orcid":false,"given":"Yuk Kei","family":"Wan","sequence":"additional","affiliation":[{"name":"Genome Institute of Singapore, A*STAR , Singapore 138672,","place":["Singapore"]},{"name":"Yong Loo Lin School of Medicine, National University of Singapore , Singapore 
117597,","place":["Singapore"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0435-9080","authenticated-orcid":false,"given":"Sri","family":"Parameswaran","sequence":"additional","affiliation":[{"name":"School of Electrical and Information Engineering, University of Sydney , Sydney, NSW 2008,","place":["Australia"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0825-4991","authenticated-orcid":false,"given":"Jonathan","family":"G\u00f6ke","sequence":"additional","affiliation":[{"name":"Genome Institute of Singapore, A*STAR , Singapore 138672,","place":["Singapore"]},{"name":"Department of Statistics and Data Science, National University of Singapore , Singapore 117546,","place":["Singapore"]},{"name":"National Cancer Center of Singapore , Singapore 168583,","place":["Singapore"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9034-9905","authenticated-orcid":false,"given":"Hasindu","family":"Gamaarachchi","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of New South Wales , Sydney, NSW 2052,","place":["Australia"]},{"name":"Genomics and Inherited Disease Program, Garvan Institute of Medical Research , Sydney, NSW 2010,","place":["Australia"]},{"name":"Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children\u2019s Research Institute , Sydney, NSW 2010,","place":["Australia"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3861-0472","authenticated-orcid":false,"given":"Ira W","family":"Deveson","sequence":"additional","affiliation":[{"name":"Genomics and Inherited Disease Program, Garvan Institute of Medical Research , Sydney, NSW 2010,","place":["Australia"]},{"name":"Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children\u2019s Research Institute , Sydney, NSW 2010,","place":["Australia"]},{"name":"St Vincent\u2019s Clinical School, Faculty of Medicine, University of New South Wales , Sydney, NSW 
2052,","place":["Australia"]}]}],"member":"286","published-online":{"date-parts":[[2025,3,14]]},"reference":[{"volume-title":"Nat Methods","year":"2025","first-page":"1","key":"2025041602165890600_btaf111-B1"},{"key":"2025041602165890600_btaf111-B2","doi-asserted-by":"crossref","first-page":"6545","DOI":"10.1038\/s41467-021-26929-x","article-title":"Towards inferring nanopore sequencing ionic currents from nucleotide chemical structures","volume":"12","author":"Ding","year":"2021","journal-title":"Nat Commun"},{"key":"2025041602165890600_btaf111-B3","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/s41587-021-01147-4","article-title":"Fast nanopore sequencing data analysis with SLOW5","volume":"40","author":"Gamaarachchi","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2025041602165890600_btaf111-B4","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1186\/s12859-020-03697-x","article-title":"GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis","volume":"21","author":"Gamaarachchi","year":"2020","journal-title":"BMC Bioinformatics"},{"year":"2006","author":"Graves","first-page":"369","key":"2025041602165890600_btaf111-B5"},{"key":"2025041602165890600_btaf111-B6","doi-asserted-by":"crossref","first-page":"1590","DOI":"10.1038\/s41592-022-01666-1","article-title":"Detection of m6A from direct RNA sequencing using a multiple instance learning framework","volume":"19","author":"Hendra","year":"2022","journal-title":"Nat Methods"},{"key":"2025041602165890600_btaf111-B7","doi-asserted-by":"crossref","first-page":"14862","DOI":"10.1002\/ange.202013462","article-title":"Biological nanopore approach for single-molecule protein sequencing","volume":"133","author":"Hu","year":"2021","journal-title":"Angew Chem"},{"key":"2025041602165890600_btaf111-B8","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1038\/s41592-022-01633-w","article-title":"Advances in nanopore direct RNA 
sequencing","volume":"19","author":"Jain","year":"2022","journal-title":"Nat Methods"},{"year":"2024","author":"Kovaka","key":"2025041602165890600_btaf111-B9"},{"key":"2025041602165890600_btaf111-B10","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1186\/s13059-023-02910-3","article-title":"Flexible and efficient handling of nanopore sequencing signal data with slow5tools","volume":"24","author":"Samarakoon","year":"2023","journal-title":"Genome Biol"},{"key":"2025041602165890600_btaf111-B11","doi-asserted-by":"crossref","first-page":"btae501","DOI":"10.1093\/bioinformatics\/btae501","article-title":"Interactive visualization of nanopore sequencing signal data with Squigualiser","volume":"40","author":"Samarakoon","year":"2024","journal-title":"Bioinformatics"},{"key":"2025041602165890600_btaf111-B12","doi-asserted-by":"crossref","first-page":"giad046","DOI":"10.1093\/gigascience\/giad046","article-title":"Efficient real-time selective genome sequencing on resource-constrained devices","volume":"12","author":"Shih","year":"2023","journal-title":"GigaScience"},{"key":"2025041602165890600_btaf111-B13","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1038\/nmeth.4184","article-title":"Detecting DNA cytosine methylation using nanopore sequencing","volume":"14","author":"Simpson","year":"2017","journal-title":"Nat Methods"},{"key":"2025041602165890600_btaf111-B14","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1038\/s41587-021-01108-x","article-title":"Nanopore sequencing technology, bioinformatics and applications","volume":"39","author":"Wang","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2025041602165890600_btaf111-B15","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1186\/s13059-019-1727-y","article-title":"Performance of neural network basecalling tools for Oxford Nanopore sequencing","volume":"20","author":"Wick","year":"2019","journal-title":"Genome 
Biol"},{"year":"2017","author":"Zhan","first-page":"583","key":"2025041602165890600_btaf111-B16"},{"key":"2025041602165890600_btaf111-B17","doi-asserted-by":"crossref","first-page":"i477","DOI":"10.1093\/bioinformatics\/btab264","article-title":"Real-time mapping of nanopore raw signals","volume":"37","author":"Zhang","year":"2021","journal-title":"Bioinformatics"},{"key":"2025041602165890600_btaf111-B18","doi-asserted-by":"crossref","first-page":"1949","DOI":"10.3390\/plants12101949","article-title":"6mA DNA methylation on genes in plants is associated with gene complexity, expression and duplication","volume":"12","author":"Zhang","year":"2023","journal-title":"Plants"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf111\/62415038\/btaf111.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf111\/62415038\/btaf111.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf111\/62415038\/btaf111.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,16]],"date-time":"2025-04-16T06:17:06Z","timestamp":1744784226000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf111\/8078598"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,3,14]]},"references-count":18,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,3,29]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf111","relation":{},"ISSN":["1367-4811"],"issn-type
":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2025,4]]},"published":{"date-parts":[[2025,3,14]]},"article-number":"btaf111"}}