{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T15:35:15Z","timestamp":1772292915372,"version":"3.50.1"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2024,3,31]],"date-time":"2024-03-31T00:00:00Z","timestamp":1711843200000},"content-version":"vor","delay-in-days":4,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program","doi-asserted-by":"publisher","award":["2020YFA0712100"],"award-info":[{"award-number":["2020YFA0712100"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32101182"],"award-info":[{"award-number":["32101182"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Shenzhen Science, Technology and Innovation Commission","award":["SGDX20220530110802015"],"award-info":[{"award-number":["SGDX20220530110802015"]}]},{"DOI":"10.13039\/501100013111","name":"Tip-top Scientific and Technical Innovative Youth Talents of Guangdong Special Support Program","doi-asserted-by":"publisher","award":["2019TQ05Y876"],"award-info":[{"award-number":["2019TQ05Y876"]}],"id":[{"id":"10.13039\/501100013111","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,3,27]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>DNA storage is one of the most promising ways for future information storage due to its high data storage density, durable storage time and low maintenance cost. However, errors are inevitable during synthesizing, storing and sequencing. Currently, many error correction algorithms have been developed to ensure accurate information retrieval, but they will decrease storage density or increase computing complexity. Here, we apply the Bloom Filter, a space-efficient probabilistic data structure, to DNA storage to achieve the anti-error, or anti-contamination function. This method only needs the original correct DNA sequences (referred to as target sequences) to produce a corresponding data structure, which will filter out almost all the incorrect sequences (referred to as non-target sequences) during sequencing data analysis. Experimental results demonstrate the universal and efficient filtering capabilities of our method. Furthermore, we employ the Counting Bloom Filter to achieve the file version control function, which significantly reduces synthesis costs when modifying DNA-form files. To achieve cost-efficient file version control function, a modified system based on yin\u2013yang codec is developed.<\/jats:p>","DOI":"10.1093\/bib\/bbae125","type":"journal-article","created":{"date-parts":[[2024,3,31]],"date-time":"2024-03-31T03:29:06Z","timestamp":1711855746000},"source":"Crossref","is-referenced-by-count":3,"title":["DNA Bloom Filter enables anti-contamination and file version control for DNA-based data storage"],"prefix":"10.1093","volume":"25","author":[{"given":"Yiming","family":"Li","sequence":"first","affiliation":[{"name":"BGI Research , Shenzhen, 518083 , China"},{"name":"BGI Research , Changzhou, 213299 , China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4507-0339","authenticated-orcid":false,"given":"Haoling","family":"Zhang","sequence":"additional","affiliation":[{"name":"BGI Research , Shenzhen, 518083 , China"},{"name":"Living Systems Lab, BESE, CEMSE, King Abdullah University of Science and Technology , Thuwal, 23955 , Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9246-1829","authenticated-orcid":false,"given":"Yuxin","family":"Chen","sequence":"additional","affiliation":[{"name":"BGI Research , Shenzhen, 518083 , China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3276-7295","authenticated-orcid":false,"given":"Yue","family":"Shen","sequence":"additional","affiliation":[{"name":"BGI Research , Shenzhen, 518083 , China"},{"name":"BGI Research , Changzhou, 213299 , China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7114-1124","authenticated-orcid":false,"given":"Zhi","family":"Ping","sequence":"additional","affiliation":[{"name":"BGI Research , Shenzhen, 518083 , China"},{"name":"BGI Research , Changzhou, 213299 , China"},{"name":"School of Medicine, The Chinese University of Hong Kong , Shenzhen, 518172 , China"}]}],"member":"286","published-online":{"date-parts":[[2024,3,28]]},"reference":[{"issue":"6102","key":"2024033103280601500_ref1","doi-asserted-by":"crossref","first-page":"1628","DOI":"10.1126\/science.1226355","article-title":"Next-generation digital information storage in dna","volume":"337","author":"Church","year":"2012","journal-title":"Science"},{"issue":"7435","key":"2024033103280601500_ref2","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1038\/nature11875","article-title":"Towards practical, high-capacity, low-maintenance information storage in synthesized dna","volume":"494","author":"Goldman","year":"2013","journal-title":"Nature"},{"issue":"8","key":"2024033103280601500_ref3","doi-asserted-by":"crossref","first-page":"2552","DOI":"10.1002\/anie.201411378","article-title":"Robust chemical preservation of digital information on dna in silica with error-correcting codes","volume":"54","author":"Grass","year":"2015","journal-title":"Angew Chem Int Ed"},{"key":"2024033103280601500_ref4","doi-asserted-by":"crossref","first-page":"1011","DOI":"10.1016\/j.procs.2016.05.398","article-title":"Forward error correction for dna data storage","volume":"80","author":"Blawat","year":"2016","journal-title":"Procedia Comput Sci"},{"issue":"6328","key":"2024033103280601500_ref5","doi-asserted-by":"crossref","first-page":"950","DOI":"10.1126\/science.aaj2038","article-title":"Dna fountain enables a robust and efficient storage architecture","volume":"355","author":"Erlich","year":"2017","journal-title":"Science"},{"issue":"31","key":"2024033103280601500_ref6","doi-asserted-by":"crossref","first-page":"18489","DOI":"10.1073\/pnas.2004821117","article-title":"Hedges error-correcting code for dna storage corrects indels and allows sequence constraints","volume":"117","author":"Press","year":"2020","journal-title":"Proc Natl Acad Sci"},{"issue":"4","key":"2024033103280601500_ref7","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1038\/s43588-022-00231-2","article-title":"Towards practical and robust dna-based data archiving using the yin\u2013yang codec system","volume":"2","author":"Ping","year":"2022","journal-title":"Nat Comput Sci"},{"issue":"5","key":"2024033103280601500_ref8","doi-asserted-by":"crossref","first-page":"e30","DOI":"10.1093\/nar\/gkab1209","article-title":"Fractal construction of constrained code words for dna storage systems","volume":"50","author":"L\u00f6chel","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024033103280601500_ref9","doi-asserted-by":"crossref","first-page":"107404","DOI":"10.1016\/j.compbiomed.2023.107404","article-title":"Bo-dna: biologically optimized encoding model for a highly-reliable dna data storage","volume":"165","author":"Rasool","year":"2023","journal-title":"Comput Biol Med"},{"key":"2024033103280601500_ref10","article-title":"Spider-web generates coding algorithms with superior error tolerance and real-time information retrieval capacity.","author":"Zhang","year":"2022"},{"issue":"5","key":"2024033103280601500_ref11","doi-asserted-by":"crossref","first-page":"bbac336","DOI":"10.1093\/bib\/bbac336","article-title":"Clover: tree structure-based efficient dna clustering for dna-based data storage","volume":"23","author":"Guanjin","year":"2022","journal-title":"Brief Bioinform"},{"issue":"1","key":"2024033103280601500_ref12","doi-asserted-by":"crossref","first-page":"5361","DOI":"10.1038\/s41467-022-33046-w","article-title":"Robust data storage in dna by de bruijn graph-based de novo strand assembly","volume":"13","author":"Song","year":"2022","journal-title":"Nat Commun"},{"issue":"11","key":"2024033103280601500_ref13","doi-asserted-by":"crossref","first-page":"3322","DOI":"10.1093\/bioinformatics\/btaa140","article-title":"Mesa: automated assessment of synthetic dna fragments and simulation of dna synthesis, storage, sequencing and pcr errors","volume":"36","author":"Schwarz","year":"2020","journal-title":"Bioinformatics"},{"issue":"3","key":"2024033103280601500_ref14","first-page":"412","article-title":"Chamaeleo: an integrated evaluation platform for dna storage","volume":"2","author":"Zhi","year":"2021","journal-title":"Synth Biol J"},{"issue":"1","key":"2024033103280601500_ref15","first-page":"1","article-title":"Desp: a systematic dna storage error simulation pipeline","volume":"23","author":"Yuan","year":"2022","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"2024033103280601500_ref16","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1038\/nbt.4079","article-title":"Random access in large-scale dna data storage","volume":"36","author":"Organick","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2024033103280601500_ref17","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-020-16797-2","article-title":"Dynamic and scalable dna-based information storage","volume":"11","author":"Lin","year":"2020","journal-title":"Nat Commun"},{"issue":"9","key":"2024033103280601500_ref18","doi-asserted-by":"crossref","first-page":"1272","DOI":"10.1038\/s41563-021-01021-3","article-title":"Random access dna memory using boolean search in an archival file storage system","volume":"20","author":"Banal","year":"2021","journal-title":"Nat Mater"},{"issue":"1","key":"2024033103280601500_ref19","doi-asserted-by":"crossref","first-page":"3518","DOI":"10.1038\/s41467-021-23669-w","article-title":"Promiscuous molecules for smarter file operations in dna-based data storage. .","volume":"12","author":"Tomek","year":"2021","journal-title":"Nat Commun"},{"issue":"1","key":"2024033103280601500_ref20","doi-asserted-by":"crossref","first-page":"4764","DOI":"10.1038\/s41467-021-24991-z","article-title":"Molecular-level similarity search brings computing to dna data storage","volume":"12","author":"Bee","year":"2021","journal-title":"Nat Commun"},{"issue":"1","key":"2024033103280601500_ref21","doi-asserted-by":"crossref","first-page":"4998","DOI":"10.1038\/s41598-019-41228-8","article-title":"Demonstration of end-to-end automation of dna data storage","volume":"9","author":"Takahashi","year":"2019","journal-title":"Sci Rep"},{"issue":"46","key":"2024033103280601500_ref22","first-page":"eabk0100","article-title":"Electrochemical dna synthesis and sequencing on a single electrode with scalability for integrated data storage. Science","volume":"7","author":"Chengtao","year":"2021","journal-title":"Advances"},{"key":"2024033103280601500_ref23","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-023-38876-w","article-title":"A biological camera that captures and stores images directly into dna","volume":"14","author":"Lim","year":"2023","journal-title":"Nat Commun"},{"issue":"10","key":"2024033103280601500_ref24","doi-asserted-by":"crossref","first-page":"5451","DOI":"10.1093\/nar\/gkab230","article-title":"Uncertainties in synthetic dna-based data storage","volume":"49","author":"Chengtao","year":"2021","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"2024033103280601500_ref25","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1137\/0108018","article-title":"Polynomial codes over certain finite fields","volume":"8","author":"Reed","year":"1960","journal-title":"J Soc Ind Appl Math"},{"issue":"1","key":"2024033103280601500_ref26","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TIT.1962.1057683","article-title":"Low-density parity-check codes","volume":"8","author":"Gallager","year":"1962","journal-title":"IRE Trans Inf Theory"},{"key":"2024033103280601500_ref27","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1109\/SFCS.2002.1181950","article-title":"Lt codes","volume-title":"The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings","author":"Luby","year":"2002"},{"key":"2024033103280601500_ref28","article-title":"Clustering billions of reads for dna data storage","volume":"30","author":"Rashtchian","year":"2017","journal-title":"Adv Neural Inf Process Syst"},{"issue":"1","key":"2024033103280601500_ref29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-023-05237-9","article-title":"Study of the error correction capability of multiple sequence alignment algorithm (mafft) in dna storage","volume":"24","author":"Xie","year":"2023","journal-title":"BMC Bioinformatics"},{"key":"2024033103280601500_ref30","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1145\/1242572.1242759","article-title":"Review spam detection","volume-title":"Proceedings of the 16th International Conference on World Wide Web","author":"Jindal","year":"2007"},{"key":"2024033103280601500_ref31","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-42280-0_2","article-title":"Existing deduplication techniques","author":"Kim","year":"2017","journal-title":"Data Deduplication for Data Optimization for Storage and Network Systems"},{"issue":"7","key":"2024033103280601500_ref32","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/362686.362692","article-title":"Space\/time trade-offs in hash coding with allowable errors","volume":"13","author":"Bloom","year":"1970","journal-title":"Commun ACM"},{"key":"2024033103280601500_ref33","volume-title":"The beauty of mathematics in computer science","author":"Jun","year":"2018"},{"issue":"1","key":"2024033103280601500_ref34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13015-016-0066-8","article-title":"Bloom filter trie: an alignment-free and reference-free data structure for pan-genome storage","volume":"11","author":"Holley","year":"2016","journal-title":"Algorithms Mol Biol"},{"issue":"1","key":"2024033103280601500_ref35","doi-asserted-by":"crossref","first-page":"bbac484","DOI":"10.1093\/bib\/bbac484","article-title":"Multiple errors correction for position-limited dna sequences with gc balance and no homopolymer for dna-based data storage","volume":"24","author":"Li","year":"2023","journal-title":"Brief Bioinform"},{"key":"2024033103280601500_ref36","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btad548","article-title":"Reducing cost in dna-based data storage by sequence analysis-aided soft information decoding of variable-length reads","volume":"39","author":"Park","year":"2023","journal-title":"Bioinformatics"},{"issue":"2","key":"2024033103280601500_ref37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2180905.2180907","article-title":"Analysis of workload behavior in scientific and historical long-term data repositories","volume":"8","author":"Adams","year":"2012","journal-title":"ACM Trans. Storage"},{"issue":"5","key":"2024033103280601500_ref38","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1038\/nmeth.2918","article-title":"Large-scale de novo dna synthesis: technologies and applications","volume":"11","author":"Kosuri","year":"2014","journal-title":"Nat Methods"},{"key":"2024033103280601500_ref39","article-title":"Quantifying molecular bias in dna data storage","volume":"11","author":"Chen","year":"2020","journal-title":"Nat Commun"},{"key":"2024033103280601500_ref40","doi-asserted-by":"crossref","DOI":"10.1126\/sciadv.abi6714","article-title":"Scaling dna data storage with nanoscale electrode wells","volume":"7","author":"Nguyen","year":"2021","journal-title":"Sci Adv"},{"issue":"41","key":"2024033103280601500_ref41","doi-asserted-by":"crossref","first-page":"22293","DOI":"10.1021\/jacs.3c06500","article-title":"A canvas of spatially arranged dna strands that can produce 24-bit color depth","volume":"145","author":"Keki\u2019c","year":"2023","journal-title":"J Am Chem Soc"},{"issue":"3","key":"2024033103280601500_ref42","first-page":"144","article-title":"Dna synthesis technologies to close the gene writing gap. Nature reviews","volume":"7","author":"Hoose","year":"2023","journal-title":"Chemistry"},{"key":"2024033103280601500_ref43","article-title":"Don\u2019t thrash: How to cache your hash on flash","volume-title":"3rd Workshop on Hot Topics in Storage and File Systems (HotStorage 11)","author":"Bender","year":"2011"},{"issue":"9","key":"2024033103280601500_ref44","doi-asserted-by":"crossref","first-page":"828","DOI":"10.1109\/TC.1984.1676499","article-title":"Compact hash tables using bidirectional linear probing","volume":"C-33","author":"Clerry","year":"1984","journal-title":"IEEE Trans Comput"},{"key":"2024033103280601500_ref45","article-title":"How close are we to storing data in dna?","volume":"42","author":"Gervasio","year":"2023","journal-title":"Trends Biotechnol"},{"issue":"1","key":"2024033103280601500_ref46","doi-asserted-by":"crossref","first-page":"18063","DOI":"10.1038\/s41598-021-97570-3","article-title":"A self-contained and self-explanatory dna storage system","volume":"11","author":"Li","year":"2021","journal-title":"Sci Rep"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/3\/bbae125\/57109007\/bbae125.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/3\/bbae125\/57109007\/bbae125.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,31]],"date-time":"2024-03-31T03:29:43Z","timestamp":1711855783000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae125\/7636770"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,27]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,27]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae125","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,5]]},"published":{"date-parts":[[2024,3,27]]},"article-number":"bbae125"}}