{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:55Z","timestamp":1772138035744,"version":"3.50.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"22-23","license":[{"start":{"date-parts":[[2020,12,16]],"date-time":"2020-12-16T00:00:00Z","timestamp":1608076800000},"content-version":"vor","delay-in-days":15,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Center for Science of Information, Siemens, Philips and National Institutes of Health"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. 
Since the raw data is inherently noisy, lossy compression has potential to significantly reduce space requirements without adversely impacting performance of downstream applications.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We explore the use of lossy compression for nanopore raw data using two state-of-the-art lossy time-series compressors, and evaluate the tradeoff between compressed size and basecalling\/consensus accuracy. We test several basecallers and consensus tools on a variety of datasets at varying depths of coverage, and conclude that lossy compression can provide 35\u201350% further reduction in compressed size of raw data over the state-of-the-art lossless compressor with negligible impact on basecalling accuracy (\u22720.2% reduction) and consensus accuracy (\u22720.002% reduction). In addition, we evaluate the impact of lossy compression on methylation calling accuracy and observe that this impact is minimal for similar reductions in compressed size, although further evaluation with improved benchmark datasets is required for reaching a definite conclusion. 
The results suggest the possibility of using lossy compression, potentially on the nanopore sequencing device itself, to achieve significant reductions in storage and transmission costs while preserving the accuracy of downstream applications.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The code is available at https:\/\/github.com\/shubhamchandak94\/lossy_compression_evaluation.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa1017","type":"journal-article","created":{"date-parts":[[2020,11,24]],"date-time":"2020-11-24T21:43:35Z","timestamp":1606254215000},"page":"5313-5321","source":"Crossref","is-referenced-by-count":8,"title":["Impact of lossy compression of nanopore raw signal data on basecalling and consensus accuracy"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1130-9762","authenticated-orcid":false,"given":"Shubham","family":"Chandak","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA"}]},{"given":"Kedar","family":"Tatwawadi","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA"}]},{"given":"Srivatsan","family":"Sridhar","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA"}]},{"given":"Tsachy","family":"Weissman","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA 94305, 
USA"}]}],"member":"286","published-online":{"date-parts":[[2020,12,16]]},"reference":[{"key":"2023062803480575000_btaa1017-B1","first-page":"342","author":"Chandak","year":"2020"},{"key":"2023062803480575000_btaa1017-B2","doi-asserted-by":"crossref","first-page":"4506","DOI":"10.1093\/bioinformatics\/btaa551","article-title":"ENANO: Encoder for NANOpore FASTQ files","volume":"36","author":"Dufort y \u00c1lvarez","year":"2020","journal-title":"Bioinformatics"},{"key":"2023062803480575000_btaa1017-B3","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","year":"2012","journal-title":"Nature"},{"key":"2023062803480575000_btaa1017-B4","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4615-3626-0","volume-title":"Vector Quantization and Signal Compression","author":"Gersho","year":"1992"},{"key":"2023062803480575000_btaa1017-B5","doi-asserted-by":"crossref","first-page":"227","DOI":"10.12688\/f1000research.11022.1","article-title":"Picopore: a tool for reducing the storage size of oxford nanopore technologies datasets without loss of functionality","volume":"6","author":"Gigante","year":"2017","journal-title":"F1000 Research"},{"key":"2023062803480575000_btaa1017-B6","first-page":"369","author":"Graves","year":"2006"},{"key":"2023062803480575000_btaa1017-B7","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1186\/s13059-016-1103-0","article-title":"The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community","volume":"17","author":"Jain","year":"2016","journal-title":"Genome Biol"},{"key":"2023062803480575000_btaa1017-B8","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1038\/nbt.4060","article-title":"Nanopore sequencing and assembly of a human genome with ultra-long reads","volume":"36","author":"Jain","year":"2018","journal-title":"Nat. 
Biotechnol"},{"key":"2023062803480575000_btaa1017-B9","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1038\/s41587-019-0072-8","article-title":"Assembly of long, error-prone reads using repeat graphs","volume":"37","author":"Kolmogorov","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023062803480575000_btaa1017-B10","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1101\/gr.215087.116","article-title":"Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation","volume":"27","author":"Koren","year":"2017","journal-title":"Genome Res"},{"key":"2023062803480575000_btaa1017-B11","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062803480575000_btaa1017-B12","first-page":"438","author":"Liang","year":"2018"},{"key":"2023062803480575000_btaa1017-B13","doi-asserted-by":"crossref","first-page":"E8396","DOI":"10.1073\/pnas.1604560113","article-title":"Assembly of long error-prone reads using de Bruijn graphs","volume":"113","author":"Lin","year":"2016","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062803480575000_btaa1017-B14","first-page":"1","article-title":"Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data","volume":"10","author":"Liu","year":"2019","journal-title":"Nat. Commun"},{"key":"2023062803480575000_btaa1017-B15","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nmeth.3444","article-title":"A complete bacterial genome assembled de novo using only nanopore sequencing data","volume":"12","author":"Loman","year":"2015","journal-title":"Nat. 
Methods"},{"key":"2023062803480575000_btaa1017-B16","doi-asserted-by":"crossref","first-page":"4586","DOI":"10.1093\/bioinformatics\/btz276","article-title":"DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning","volume":"35","author":"Ni","year":"2019","journal-title":"Bioinformatics"},{"key":"2023062803480575000_btaa1017-B17","doi-asserted-by":"crossref","first-page":"giz043","DOI":"10.1093\/gigascience\/giz043","article-title":"Ultra-deep, long-read nanopore sequencing of mock microbial community standards","volume":"8","author":"Nicholls","year":"2019","journal-title":"Gigascience"},{"key":"2023062803480575000_btaa1017-B18","first-page":"183","article-title":"Effect of lossy compression of quality scores on variant calling","volume":"18","author":"Ochoa","year":"2017","journal-title":"Brief. Bioinf"},{"key":"2023062803480575000_btaa1017-B19","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/5.18626","article-title":"A tutorial on hidden Markov models and selected applications in speech recognition","volume":"77","author":"Rabiner","year":"1989","journal-title":"Proc. IEEE"},{"key":"2023062803480575000_btaa1017-B20","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1186\/s13059-018-1462-9","article-title":"From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy","volume":"19","author":"Rang","year":"2018","journal-title":"Genome Biol"},{"key":"2023062803480575000_btaa1017-B21","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1038\/nmeth.4184","article-title":"Detecting DNA cytosine methylation using nanopore sequencing","volume":"14","author":"Simpson","year":"2017","journal-title":"Nat. 
Methods"},{"key":"2023062803480575000_btaa1017-B22","doi-asserted-by":"crossref","first-page":"giy037","DOI":"10.1093\/gigascience\/giy037","article-title":"Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning","volume":"7","author":"Teng","year":"2018","journal-title":"GigaScience"},{"key":"2023062803480575000_btaa1017-B23","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1101\/gr.214270.116","article-title":"Fast and accurate de novo genome assembly from long uncorrected reads","volume":"27","author":"Vaser","year":"2017","journal-title":"Genome Res"},{"key":"2023062803480575000_btaa1017-B24","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1186\/s13059-019-1727-y","article-title":"Performance of neural network basecalling tools for Oxford Nanopore sequencing","volume":"20","author":"Wick","year":"2019","journal-title":"Genome Biol"},{"key":"2023062803480575000_btaa1017-B25","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1038\/nbt.3170","article-title":"Quality score compression improves genotyping accuracy","volume":"33","author":"Yu","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023062803480575000_btaa1017-B26","doi-asserted-by":"crossref","first-page":"1332","DOI":"10.3389\/fgene.2019.01332","article-title":"Causalcall: nanopore basecalling using a temporal convolutional network","volume":"10","author":"Zeng","year":"2020","journal-title":"Front. Genet"},{"key":"2023062803480575000_btaa1017-B27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/sdata.2016.25","article-title":"Extensive sequencing of seven human genomes to characterize benchmark reference materials","volume":"3","author":"Zook","year":"2016","journal-title":"Sci. 
Data"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa1017\/35058556\/btaa1017.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/22-23\/5313\/50716133\/btaa1017.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/22-23\/5313\/50716133\/btaa1017.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T23:49:10Z","timestamp":1687909750000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/22-23\/5313\/6039112"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,12,1]]},"references-count":27,"journal-issue":{"issue":"22-23","published-print":{"date-parts":[[2021,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa1017","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.04.19.049262","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,12,1]]},"published":{"date-parts":[[2020,12,1]]}}}