{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T16:04:31Z","timestamp":1775837071590,"version":"3.50.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T00:00:00Z","timestamp":1619481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Samsung Research Funding"},{"name":"Incubation Center of Samsung Electronics under Project","award":["SRFC-IT1802-09"],"award-info":[{"award-number":["SRFC-IT1802-09"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,10,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it will need more sequence reads for data retrieval. There is potentially a way to improve sequencing and decoding processes in such a way that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In past researches, clustering, alignment and decoding processes were considered as separate stages but we believe that using the information from all these processes together may improve decoding performance. Actual experiments of DNA synthesis and sequencing should be performed because simulations cannot be relied on to cover all error possibilities in practical circumstances.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>For DNA storage systems using fountain code and Reed-Solomon (RS) code, we introduce several techniques to improve the decoding performance. We designed the decoding process focusing on the cooperation of key components: Hamming-distance based clustering, discarding of abnormal sequence reads, RS error correction as well as detection and quality score-based ordering of sequences. We synthesized 513.6 KB data into DNA oligo pools and sequenced this data successfully with Illumina MiSeq instrument. Compared to Erlich\u2019s research, the proposed decoding method additionally incorporates sequence reads with minor errors which had been discarded before, and thus was able to make use of 10.6\u201311.9% more sequence reads from the same sequencing environment, this resulted in 6.5\u20138.9% reduction in the reading cost. Channel characteristics including sequence coverage and read-length distributions are provided as well.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The raw data files and the source codes of our experiments are available at: https:\/\/github.com\/jhjeong0702\/dna-storage.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab246","type":"journal-article","created":{"date-parts":[[2021,4,13]],"date-time":"2021-04-13T18:51:27Z","timestamp":1618339887000},"page":"3136-3143","source":"Crossref","is-referenced-by-count":43,"title":["Cooperative sequence clustering and decoding for DNA storage system with fountain codes"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1055-618X","authenticated-orcid":false,"given":"Jaeho","family":"Jeong","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, Seoul National University, Institute of New Media and Communications (INMC) , Seoul 08826, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4524-0302","authenticated-orcid":false,"given":"Seong-Joon","family":"Park","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Seoul National University, Institute of New Media and Communications (INMC) , Seoul 08826, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1608-5849","authenticated-orcid":false,"given":"Jae-Won","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, Gyeongsang National University, Engineering Research Institute , Jinju 52828, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jong-Seon","family":"No","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Seoul National University, Institute of New Media and Communications (INMC) , Seoul 08826, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ha Hyeon","family":"Jeon","sequence":"additional","affiliation":[{"name":"Department of Chemical Engineering, POSTECH , Pohang 37673, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0705-0177","authenticated-orcid":false,"given":"Jeong Wook","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Chemical Engineering, POSTECH , Pohang 37673, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6346-4182","authenticated-orcid":false,"given":"Albert","family":"No","sequence":"additional","affiliation":[{"name":"Department of Electronic and Electrical Engineering, Hongik University , Seoul 04066, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sunghwan","family":"Kim","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering, University of Ulsan , Ulsan 44610, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7854-7792","authenticated-orcid":false,"given":"Hosung","family":"Park","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Chonnam National University , Gwangju 61186, South Korea"},{"name":"Department of ICT Convergence System Engineering, Chonnam National University , Gwangju, 61186, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2021,4,27]]},"reference":[{"key":"2023051608273304900_btab246-B1","doi-asserted-by":"crossref","first-page":"606","DOI":"10.1093\/gbe\/evs116","article-title":"Distinct mutational behaviors differentiate short tandem repeats from microsatellites in the human genome","volume":"5","author":"Ananda","year":"2013","journal-title":"Genome Biol. Evol"},{"key":"2023051608273304900_btab246-B2","doi-asserted-by":"crossref","first-page":"1229","DOI":"10.1038\/s41587-019-0240-x","article-title":"Data storage in DNA with fewer synthesis cycles using composite DNA letters","volume":"37","author":"Anavy","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023051608273304900_btab246-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-19148-3","article-title":"Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction","volume":"11","author":"Antkowiak","year":"2020","journal-title":"Nat. Commun"},{"key":"2023051608273304900_btab246-B4","first-page":"637","author":"Bornholt","year":"2016"},{"key":"2023051608273304900_btab246-B5","first-page":"147","author":"Chandak","year":"2019"},{"key":"2023051608273304900_btab246-B6","author":"Chandak","year":"2020"},{"key":"2023051608273304900_btab246-B7","first-page":"1","article-title":"High information capacity DNA-based data storage with augmented encoding characters using degenerate bases","volume":"9","author":"Choi","year":"2019","journal-title":"Sci. Rep"},{"key":"2023051608273304900_btab246-B8","doi-asserted-by":"crossref","first-page":"2001249","DOI":"10.1002\/adma.202001249","article-title":"DNA micro-disks for the management of DNA-based data storage with index and write-once-read-many (WORM) memory features","volume":"32","author":"Choi","year":"2020","journal-title":"Adv. Mat"},{"key":"2023051608273304900_btab246-B9","doi-asserted-by":"crossref","first-page":"1628","DOI":"10.1126\/science.1226355","article-title":"Next-generation digital information storage in DNA","volume":"337","author":"Church","year":"2012","journal-title":"Science"},{"key":"2023051608273304900_btab246-B10","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1093\/nsr\/nwaa007","article-title":"DNA storage: research landscape and future prospects","volume":"7","author":"Dong","year":"2020","journal-title":"Nat. Sci. Rev"},{"key":"2023051608273304900_btab246-B11","doi-asserted-by":"crossref","first-page":"950","DOI":"10.1126\/science.aaj2038","article-title":"DNA Fountain enables a robust and efficient storage architecture","volume":"355","author":"Erlich","year":"2017","journal-title":"Science"},{"key":"2023051608273304900_btab246-B12","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1038\/nature11875","article-title":"Towards practical, high-capacity, low-maintenance information storage in synthesized DNA","volume":"494","author":"Goldman","year":"2013","journal-title":"Nature"},{"key":"2023051608273304900_btab246-B13","doi-asserted-by":"crossref","first-page":"2552","DOI":"10.1002\/anie.201411378","article-title":"Robust chemical preservation of digital information on DNA in silica with error-correcting codes","volume":"54","author":"Grass","year":"2015","journal-title":"Angew. Chem. Int. Ed.Engl"},{"key":"2023051608273304900_btab246-B14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-019-45832-6","article-title":"A characterization of the DNA data storage channel","volume":"9","author":"Heckel","year":"2019","journal-title":"Sci. Rep"},{"key":"2023051608273304900_btab246-B15","first-page":"23","year":"2013"},{"key":"2023051608273304900_btab246-B16","author":"Lenz","year":"2020"},{"key":"2023051608273304900_btab246-B17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-019-10978-4","article-title":"DNA assembly for nanopore data storage readout","volume":"10","author":"Lopez","year":"2019","journal-title":"Nat. Commun"},{"key":"2023051608273304900_btab246-B18","first-page":"271","author":"Luby","year":"2002"},{"key":"2023051608273304900_btab246-B19","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1038\/s41596-019-0244-5","article-title":"Reading and writing digital data in DNA","volume":"15","author":"Meiser","year":"2020","journal-title":"Nat. Protocols"},{"key":"2023051608273304900_btab246-B20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-019-09517-y","article-title":"High density DNA data storage library via dehydration with digital microfluidic retrieval","volume":"10","author":"Newman","year":"2019","journal-title":"Nat. Commun"},{"key":"2023051608273304900_btab246-B21","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1038\/nbt.4079","article-title":"Random access in large-scale DNA data storage","volume":"36","author":"Organick","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023051608273304900_btab246-B22","doi-asserted-by":"crossref","first-page":"18489","DOI":"10.1073\/pnas.2004821117","article-title":"HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints","volume":"117","author":"Press","year":"2020","journal-title":"Proc. Natl. Acad. Sci. U S A"},{"key":"2023051608273304900_btab246-B23","doi-asserted-by":"crossref","first-page":"R51","DOI":"10.1186\/gb-2013-14-5-r51","article-title":"Characterizing and measuring bias in sequence data","volume":"14","author":"Ross","year":"2013","journal-title":"Genome Biol"},{"key":"2023051608273304900_btab246-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-019-41228-8","article-title":"Demonstration of end-to-end automation of DNA data storage","volume":"9","author":"Takahashi","year":"2019","journal-title":"Sci. Rep"},{"key":"2023051608273304900_btab246-B25","doi-asserted-by":"crossref","first-page":"2705","DOI":"10.1093\/bioinformatics\/btaa051","article-title":"BioSeqZip: a collapser of NGS redundant reads for the optimization of sequence analysis","volume":"36","author":"Urgese","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051608273304900_btab246-B26","doi-asserted-by":"crossref","first-page":"614","DOI":"10.1093\/bioinformatics\/btt593","article-title":"PEAR: a fast and accurate Illumina Paired-End reAd mergeR","volume":"30","author":"Zhang","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051608273304900_btab246-B27","doi-asserted-by":"crossref","first-page":"1913","DOI":"10.1093\/bioinformatics\/btv053","article-title":"Starcode: sequence clustering based on all-pairs search","volume":"31","author":"Zorita","year":"2015","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab246\/39108322\/btab246.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3136\/50338399\/btab246.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3136\/50338399\/btab246.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T08:42:42Z","timestamp":1684226562000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/19\/3136\/6255306"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2021,4,27]]},"references-count":27,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2021,10,11]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab246","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,10,1]]},"published":{"date-parts":[[2021,4,27]]}}}