{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T14:21:28Z","timestamp":1760710888579,"version":"3.41.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,6,3]],"date-time":"2022-06-03T00:00:00Z","timestamp":1654214400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Science Foundation","award":["CNS-1650148, CNS-1901324, and ECCS 2027655"],"award-info":[{"award-number":["CNS-1650148, CNS-1901324, and ECCS 2027655"]}]},{"name":"North Carolina State University Research and Innovation Seed Funding Award","award":["1402-2018-2509"],"award-info":[{"award-number":["1402-2018-2509"]}]},{"name":"Department of Education Graduate Assistance in Areas of Need Fellowship"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Emerg. Technol. Comput. Syst."],"published-print":{"date-parts":[[2022,7,31]]},"abstract":"<jats:p>As interest in DNA-based information storage grows, the costs of synthesis have been identified as a key bottleneck. A potential direction is to tune synthesis for data. Data strands tend to be composed of a small set of recurring code word sequences, and they contain longer sequences of repeated data. To exploit these properties, we propose a new framework called DINOS. DINOS consists of three key parts: (i) The first is a hierarchical strand assembly algorithm, inspired by gene assembly techniques that can assemble arbitrary data strands from a small set of primitive blocks. (ii) The assembly algorithm relies on our novel formulation for how to construct primitive blocks, spanning a variety of useful configurations from a set of code words and overhangs. Each primitive block is a code word flanked by a pair of overhangs that are created by a cyclic pairing process that keeps the number of primitive blocks small. Using these primitive blocks, any data strand of arbitrary length can be assembled, theoretically. We show a minimal system for a binary code with as few as six primitive blocks, and we generalize our processes to support an arbitrary set of overhangs and code words. (iii) We exploit our hierarchical assembly approach to identify redundant sequences and coalesce the reactions that create them to make assembly more efficient.<\/jats:p>\n          <jats:p>\n            We evaluate DINOS and describe its key characteristics. For example, the number of reactions needed to make a strand can be reduced by increasing the number of overhangs or the number of code words, but increasing the number of overhangs offers a small advantage over increasing code words while requiring substantially fewer primitive blocks. However, density is improved more by increasing the number of code words. We also find that a simple redundancy coalescing technique is able to reduce reactions by 90.6% and 41.2% on average for decompressed and compressed data, respectively, even when the smallest data fragments being assembled are 16 bits. With a simple padding heuristic that finds even more redundancy, we can further decrease reactions for the same operating point up to 91.1% and 59% for decompressed and compressed data, respectively, on average. Our approach offers greater density by up to 80% over a prior general purpose gene assembly technique. Finally, in an analysis of synthesis costs in which we make 1 GB volume using\n            <jats:italic>de novo<\/jats:italic>\n            synthesis versus making only the primitive blocks with\n            <jats:italic>de novo<\/jats:italic>\n            synthesis and otherwise assembling using DINOS, we estimate DINOS as 10\n            <jats:sup>5<\/jats:sup>\n            \u00d7 cheaper than\n            <jats:italic>de novo<\/jats:italic>\n            synthesis.\n          <\/jats:p>","DOI":"10.1145\/3510853","type":"journal-article","created":{"date-parts":[[2022,3,23]],"date-time":"2022-03-23T14:43:44Z","timestamp":1648046624000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["DINOS: Data INspired Oligo Synthesis for DNA Data Storage"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5200-5909","authenticated-orcid":false,"given":"Kevin","family":"Volkel","sequence":"first","affiliation":[{"name":"North Carolina State University, NC"}]},{"given":"Kyle J.","family":"Tomek","sequence":"additional","affiliation":[{"name":"North Carolina State University, NC"}]},{"given":"Albert J.","family":"Keung","sequence":"additional","affiliation":[{"name":"North Carolina State University, Raleigh, NC"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8975-0294","authenticated-orcid":false,"given":"James M.","family":"Tuck","sequence":"additional","affiliation":[{"name":"North Carolina State University, Raleigh, NC"}]}],"member":"320","published-online":{"date-parts":[[2022,6,3]]},"reference":[{"key":"e_1_3_3_2_2","unstructured":"[n.d.]. DNA\/RNA Price List. Eurofin. Retrieved 22 Feb. 2021 from https:\/\/eurofinsgenomics.com\/en\/products\/dnarna-synthesis\/oligo-price-list."},{"key":"e_1_3_3_3_2","unstructured":"[n.d.]. Oligo Pools. Twist Bioscience. Retrieved 29 Aug. 2019 from https:\/\/www.twistbioscience.com\/sites\/default\/files\/resources\/2019-09\/ProductSheet_OligoPools_29Aug19_Rev5.1.pdf."},{"key":"e_1_3_3_4_2","unstructured":"Sebastian Deorowicz. Silesia Compression Corpus. Retrieved from http:\/\/sun.aei.polsl.pl\/sdeor\/index.php?page=silesia."},{"key":"e_1_3_3_5_2","unstructured":"[n.d.]. T4 DNA Ligase. New England BioLabs Inc. Retrieved 22 Feb. 2021 from https:\/\/www.neb.com\/products\/m0202-t4-dna-ligase#Product%20Information_Properties%20&%20Usage."},{"key":"e_1_3_3_6_2"},{"key":"e_1_3_3_7_2"},{"key":"e_1_3_3_8_2"},{"key":"e_1_3_3_9_2"},{"key":"e_1_3_3_10_2","unstructured":"Bryan Bishop Nathan McCorkle and Victor Zhirnov. 2017. Technology working group meeting on future DNA synthesis technologies. (Oct. 2017) 39. Semiconductor Research Corporation."},{"key":"e_1_3_3_11_2"},{"key":"e_1_3_3_12_2"},{"key":"e_1_3_3_13_2"},{"key":"e_1_3_3_14_2"},{"key":"e_1_3_3_15_2"},{"key":"e_1_3_3_16_2"},{"key":"e_1_3_3_17_2"},{"key":"e_1_3_3_18_2"},{"key":"e_1_3_3_19_2"},{"key":"e_1_3_3_20_2"},{"key":"e_1_3_3_21_2"},{"key":"e_1_3_3_22_2","volume-title":"Compilers Principles, Techniques, and Tools (2nd ed.)","author":"Lam Aho","year":"2014","unstructured":"Aho Lam and Sethi Ullman. 2014. Compilers Principles, Techniques, and Tools (2nd ed.). Pearson Education, Edinburgh Gate, Harlow, Essex, England."},{"key":"e_1_3_3_23_2"},{"key":"e_1_3_3_24_2"},{"key":"e_1_3_3_25_2"},{"key":"e_1_3_3_26_2","unstructured":"Matt Mahoney. 2016. ZPAQ. Retrieved from http:\/\/mattmahoney.net\/dc\/zpaq.html."},{"key":"e_1_3_3_27_2","unstructured":"Matt Mahoney. 2019. Silesia Open Source Compression Benchmark. Retrieved from http:\/\/mattmahoney.net\/dc\/silesia.html."},{"key":"e_1_3_3_28_2"},{"key":"e_1_3_3_29_2"},{"key":"e_1_3_3_30_2"},{"key":"e_1_3_3_31_2"},{"key":"e_1_3_3_32_2"},{"key":"e_1_3_3_33_2","unstructured":"David Reinsel John Gantz and John Rydning. 2018. The digitization of the world from edge to core. (2018) 28. IDC White Paper."},{"key":"e_1_3_3_34_2","volume-title":"Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation","author":"Richardson Stephen E.","year":"1992","unstructured":"Stephen E. Richardson. 1992. Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation. Technical Report. Sun Microsystems, Inc."},{"key":"e_1_3_3_35_2","unstructured":"Nathaniel Roquet HyunJun Park and Swapnil P. Bhatia. 2020. Nucleic acid-based data storage. Retrieved from https:\/\/patents.google.com\/patent\/US10650312B2\/en."},{"key":"e_1_3_3_36_2"},{"key":"e_1_3_3_37_2"},{"key":"e_1_3_3_38_2"},{"key":"e_1_3_3_39_2"},{"key":"e_1_3_3_40_2"},{"key":"e_1_3_3_41_2"},{"key":"e_1_3_3_42_2"},{"key":"e_1_3_3_43_2"},{"key":"e_1_3_3_44_2"},{"key":"e_1_3_3_45_2","volume-title":"Algorithmic Self-Assembly of DNA","author":"Winfree E.","year":"1998","unstructured":"E. Winfree. 1998. Algorithmic Self-Assembly of DNA. Ph.D. Dissertation. California Institute of Technology. Retrieved from https:\/\/www.dna.caltech.edu\/winfree\/old_html\/Papers\/thesis.pdf."},{"key":"e_1_3_3_46_2"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.1089389"},{"key":"e_1_3_3_48_2"}],"container-title":["ACM Journal on Emerging Technologies in Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3510853","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3510853","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:12Z","timestamp":1750186932000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3510853"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,3]]},"references-count":47,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,7,31]]}},"alternative-id":["10.1145\/3510853"],"URL":"https:\/\/doi.org\/10.1145\/3510853","relation":{},"ISSN":["1550-4832","1550-4840"],"issn-type":[{"type":"print","value":"1550-4832"},{"type":"electronic","value":"1550-4840"}],"subject":[],"published":{"date-parts":[[2022,6,3]]},"assertion":[{"value":"2021-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}