{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T00:18:59Z","timestamp":1775002739982,"version":"3.50.1"},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2025,1,8]],"date-time":"2025-01-08T00:00:00Z","timestamp":1736294400000},"content-version":"vor","delay-in-days":13,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,12,26]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>As nanopore technology reaches ever higher throughput and accuracy, it becomes an increasingly viable candidate for reading out DNA data storage. Nanopore sequencing offers considerable flexibility by allowing long reads, real-time signal analysis, and the ability to read both DNA and RNA. We need flexible and efficient designs that match nanopore\u2019s capabilities, but relatively few designs have been explored and many have significant inefficiency in read density, error rate, or compute time. To address these problems, we designed a new single-read per-strand decoder that achieves low byte error rates, offers high throughput, scales to long reads, and works well for both DNA and RNA molecules. We achieve these results through a novel soft decoding algorithm that can be effectively parallelized on a GPU. Our faster decoder allows us to study a wider range of system designs.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We demonstrate our approach on HEDGES, a state-of-the-art DNA-constrained convolutional code. We implement one hard decoder that runs serially and two soft decoders that run on GPUs. Our evaluation for each decoder is applied to the same population of nanopore reads collected from a synthesized library of strands. These same strands are synthesized with a T7 promoter to enable RNA transcription and decoding. Our results show that the hard decoder has a byte error rate over 25%, while the prior state of the art soft decoder can achieve error rates of 2.25%. However, that design also suffers a low throughput of 183\u2009s\/read. Our new Alignment Matrix Trellis soft decoder improves throughput by 257\u00d7 with the trade-off of a higher byte error rate of 3.52% compared to the state of the art. Furthermore, we use the faster speed of our algorithm to explore more design options. We show that read densities of 0.33\u2009bits\/base can be achieved, which is 4\u00d7 larger than prior MSA-based decoders. We also compare RNA to DNA, and find that RNA has 85% as many error-free reads when compared to DNA.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Source code for our soft decoder and data used to generate figures is available publicly in the Github repository https:\/\/github.com\/dna-storage\/hedges-soft-decoder (10.5281\/zenodo.11454877). All raw FAST5\/FASTQ data are available at 10.5281\/zenodo.11985454 and 10.5281\/zenodo.12014515.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf006","type":"journal-article","created":{"date-parts":[[2025,1,6]],"date-time":"2025-01-06T23:21:04Z","timestamp":1736205664000},"source":"Crossref","is-referenced-by-count":4,"title":["Nanopore decoding with speed and versatility for data storage"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5200-5909","authenticated-orcid":false,"given":"Kevin D","family":"Volkel","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, North Carolina State University , Raleigh, NC, 27606,","place":["United States"]}]},{"given":"Paul W","family":"Hook","sequence":"additional","affiliation":[{"name":"Department of Biomedical Engineering, Johns Hopkins University , Baltimore, MD, 21218,","place":["United States"]}]},{"given":"Albert","family":"Keung","sequence":"additional","affiliation":[{"name":"Department of Chemical and Biomolecular Engineering, North Carolina State University , Raleigh, NC, 27695,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2083-6027","authenticated-orcid":false,"given":"Winston","family":"Timp","sequence":"additional","affiliation":[{"name":"Department of Biomedical Engineering, Johns Hopkins University , Baltimore, MD, 21218,","place":["United States"]}]},{"given":"James M","family":"Tuck","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, North Carolina State University , Raleigh, NC, 27606,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,1,8]]},"reference":[{"key":"2025012310381420000_btaf006-B1","doi-asserted-by":"crossref","first-page":"5345","DOI":"10.1038\/s41467-020-19148-3","article-title":"Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction","volume":"11","author":"Antkowiak","year":"2020","journal-title":"Nat Commun"},{"key":"2025012310381420000_btaf006-B2","first-page":"8822","author":"Chandak"},{"key":"2025012310381420000_btaf006-B3","doi-asserted-by":"crossref","first-page":"nwab028","DOI":"10.1093\/nsr\/nwab028","article-title":"An artificial chromosome for data storage","volume":"8","author":"Chen","year":"2021","journal-title":"Natl Sci Rev"},{"key":"2025012310381420000_btaf006-B4","doi-asserted-by":"crossref","first-page":"6582","DOI":"10.1038\/s41598-019-43105-w","article-title":"High information capacity DNA-based data storage with augmented encoding characters using degenerate bases","volume":"9","author":"Choi","year":"2019","journal-title":"Sci Rep"},{"key":"2025012310381420000_btaf006-B5","doi-asserted-by":"crossref","first-page":"2552","DOI":"10.1002\/anie.201411378","article-title":"Robust chemical preservation of digital information on DNA in silica with error-correcting codes","volume":"54","author":"Grass","year":"2015","journal-title":"Angew Chem Int Ed Engl"},{"key":"2025012310381420000_btaf006-B6","first-page":"369","author":"Graves","year":"2006"},{"key":"2025012310381420000_btaf006-B7","first-page":"52075","author":"Hamoum","year":"2023"},{"key":"2025012310381420000_btaf006-B8","first-page":"1","author":"Hamoum"},{"key":"2025012310381420000_btaf006-B9","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1038\/s41587-020-0731-9","article-title":"Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED","volume":"39","author":"Kovaka","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2025012310381420000_btaf006-B10","first-page":"267","volume-title":"Speech and Computer, Lecture Notes in Computer Science","author":"K\u00fcrzinger"},{"key":"2025012310381420000_btaf006-B11","first-page":"1","author":"Lenz"},{"key":"2025012310381420000_btaf006-B12","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1038\/nmeth.3930","article-title":"Real-time selective sequencing using nanopore technology","volume":"13","author":"Loose","year":"2016","journal-title":"Nat Methods"},{"key":"2025012310381420000_btaf006-B13","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1186\/s12859-022-04686-y","article-title":"RODAN: a fully convolutional architecture for basecalling nanopore RNA sequencing data","volume":"23","author":"Neumann","year":"2022","journal-title":"BMC Bioinformatics"},{"key":"2025012310381420000_btaf006-B14","doi-asserted-by":"crossref","first-page":"eabi6714","DOI":"10.1126\/sciadv.abi6714","article-title":"Scaling DNA data storage with nanoscale electrode wells","volume":"7","author":"Nguyen","year":"2021","journal-title":"Sci Adv"},{"key":"2025012310381420000_btaf006-B15","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1038\/nbt.4079","article-title":"Random access in large-scale DNA data storage","volume":"36","author":"Organick","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2025012310381420000_btaf006-B16","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1186\/s13059-023-02903-2","article-title":"Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling","volume":"24","author":"Pag\u00e8s-Gallego","year":"2023","journal-title":"Genome Biol"},{"key":"2025012310381420000_btaf006-B17","doi-asserted-by":"crossref","first-page":"18489","DOI":"10.1073\/pnas.2004821117","article-title":"HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints","volume":"117","author":"Press","year":"2020","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025012310381420000_btaf006-B18","first-page":"253","author":"Scheidl"},{"key":"2025012310381420000_btaf006-B19","doi-asserted-by":"crossref","first-page":"1241","DOI":"10.1021\/acssynbio.9b00100","article-title":"Driving the scalability of DNA-based information storage systems","volume":"8","author":"Tomek","year":"2019","journal-title":"ACS Synth Biol"},{"key":"2025012310381420000_btaf006-B20","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1038\/s41587-021-01108-x","article-title":"Nanopore sequencing technology, bioinformatics and applications","volume":"39","author":"Wang","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2025012310381420000_btaf006-B21","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1186\/s13059-019-1727-y","article-title":"Performance of neural network basecalling tools for Oxford nanopore sequencing","volume":"20","author":"Wick","year":"2019","journal-title":"Genome Biol"},{"key":"2025012310381420000_btaf006-B22","doi-asserted-by":"crossref","first-page":"5011","DOI":"10.1038\/s41598-017-05188-1","article-title":"Portable and error-free DNA-based data storage","volume":"7","author":"Yazdi","year":"2017","journal-title":"Sci Rep"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf006\/61381583\/btaf006.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/1\/btaf006\/61381583\/btaf006.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/1\/btaf006\/61381583\/btaf006.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,23]],"date-time":"2025-01-23T05:38:36Z","timestamp":1737610716000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf006\/7945662"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,12,26]]},"references-count":22,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,12,26]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf006","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.06.18.599582","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,1]]},"published":{"date-parts":[[2024,12,26]]},"article-number":"btaf006"}}