{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T02:16:13Z","timestamp":1778811373584,"version":"3.51.4"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,3,23]],"date-time":"2023-03-23T00:00:00Z","timestamp":1679529600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,3,23]],"date-time":"2023-03-23T00:00:00Z","timestamp":1679529600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Synchronization (insertions\u2013deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy in the stored information, multiple sequence alignment (MSA) solves this problem by searching the conserved subsequences. In this paper, we conduct a comprehensive simulation study on the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition when there are around 20% errors. Below this critical value, increasing sequencing depth can eventually allow it to approach complete recovery. Otherwise, its performance plateaus at some poor levels. Given a reasonable sequencing depth (\u2264\u00a070), MSA could achieve complete recovery in the low error regime, and effectively correct 90% of the errors in the medium error regime. In addition, MSA is robust to imperfect clustering. It could also be combined with other means such as ECC, repeated markers, or any other code constraints. Furthermore, by selecting an appropriate sequencing depth, this strategy could achieve an optimal trade-off between cost and reading speed. MSA could be a competitive alternative for future DNA storage.<\/jats:p>","DOI":"10.1186\/s12859-023-05237-9","type":"journal-article","created":{"date-parts":[[2023,3,23]],"date-time":"2023-03-23T14:03:12Z","timestamp":1679580192000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":32,"title":["Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage"],"prefix":"10.1186","volume":"24","author":[{"given":"Ranze","family":"Xie","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiangzhen","family":"Zan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ling","family":"Chu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanqing","family":"Su","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peng","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenbin","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,3,23]]},"reference":[{"issue":"1","key":"5237_CR1","doi-asserted-by":"publisher","first-page":"352","DOI":"10.1038\/s41467-021-27846-9","volume":"13","author":"LC Meiser","year":"2022","unstructured":"Meiser LC, Nguyen BH, Chen Y-J, Nivala J, Strauss K, Ceze L, Grass RN. Synthetic DNA applications in information technology. Nat Commun. 2022;13(1):352.","journal-title":"Nat Commun"},{"issue":"5","key":"5237_CR2","doi-asserted-by":"publisher","first-page":"1905","DOI":"10.1021\/acs.nanolett.1c04203","volume":"22","author":"SK Tabatabaei","year":"2022","unstructured":"Tabatabaei SK, Pham B, Pan C, Liu J, Chandak S, Shorkey SA, Hernandez AG, Aksimentiev A, Chen M, Schroeder CM, et al. Expanding the molecular alphabet of DNA-based data storage systems with neural network nanopore readout processing. Nano Lett. 2022;22(5):1905\u201314.","journal-title":"Nano Lett"},{"issue":"6","key":"5237_CR3","doi-asserted-by":"publisher","first-page":"1092","DOI":"10.1093\/nsr\/nwaa007","volume":"7","author":"L Qian","year":"2020","unstructured":"Qian L, Ouyang Q, Ping Z, Sun F, Dong Y. DNA storage: research landscape and future prospects. Natl Sci Rev. 2020;7(6):1092\u2013107.","journal-title":"Natl Sci Rev"},{"issue":"1","key":"5237_CR4","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1186\/s12859-022-04723-w","volume":"23","author":"L Yuan","year":"2022","unstructured":"Yuan L, Xie Z, Wang Y, Wang X. DeSP: a systematic DNA storage error simulation pipeline. BMC Bioinform. 2022;23(1):185.","journal-title":"BMC Bioinform"},{"issue":"6328","key":"5237_CR5","doi-asserted-by":"publisher","first-page":"950","DOI":"10.1126\/science.aaj2038","volume":"355","author":"Y Erlich","year":"2017","unstructured":"Erlich Y, Zielinski D. DNA Fountain enables a robust and efficient storage architecture. Science. 2017;355(6328):950\u20134.","journal-title":"Science"},{"key":"5237_CR6","first-page":"1","volume":"9","author":"R Heckel","year":"2018","unstructured":"Heckel R, Mikutis G, Grass RN. A characterization of the DNA data storage channel. Sci Rep. 2018;9:1\u201312.","journal-title":"Sci Rep"},{"issue":"1","key":"5237_CR7","doi-asserted-by":"publisher","first-page":"3264","DOI":"10.1038\/s41467-020-16958-3","volume":"11","author":"Y-J Chen","year":"2020","unstructured":"Chen Y-J, Takahashi CN, Organick L, Bee C, Ang SD, Weiss P, Peck B, Seelig G, Ceze L, Strauss K. Quantifying molecular bias in DNA data storage. Nat Commun. 2020;11(1):3264.","journal-title":"Nat Commun"},{"issue":"1","key":"5237_CR8","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1038\/s41596-019-0244-5","volume":"15","author":"LC Meiser","year":"2019","unstructured":"Meiser LC, Antkowiak PL, Koch J, Chen WD, Kohll AX, Stark WJ, Heckel R, Grass RN. Reading and writing digital data in DNA. Nat Protoc. 2019;15(1):86\u2013101.","journal-title":"Nat Protoc"},{"issue":"8","key":"5237_CR9","doi-asserted-by":"publisher","first-page":"2552","DOI":"10.1002\/anie.201411378","volume":"54","author":"RN Grass","year":"2015","unstructured":"Grass RN, Heckel R, Puddu M, Paunescu D, Stark WJ. Robust chemical preservation of digital information on DNA in silica with error-correcting codes. Angew Chem Int Ed Engl. 2015;54(8):2552\u20135.","journal-title":"Angew Chem Int Ed Engl"},{"key":"5237_CR10","doi-asserted-by":"publisher","first-page":"nwab028","DOI":"10.1093\/nsr\/nwab028","volume":"8","author":"W Chen","year":"2021","unstructured":"Chen W, Han M, Zhou J, Ge Q, Wang P, Zhang X, Zhu S, Song L, Yuan Y. An artificial chromosome for data storage. Nat Sci Rev. 2021;8:nwab028.","journal-title":"Nat Sci Rev"},{"key":"5237_CR11","doi-asserted-by":"publisher","first-page":"1011","DOI":"10.1016\/j.procs.2016.05.398","volume":"80","author":"M Blawat","year":"2016","unstructured":"Blawat M, Gaedke K, Huetter I, Chen X-M, Turczyk B, Inverso S, Pruitt B, Church G. Forward error correction for DNA data storage. Proc Comput Sci. 2016;80:1011\u201322.","journal-title":"Proc Comput Sci"},{"issue":"10","key":"5237_CR12","doi-asserted-by":"publisher","first-page":"1580","DOI":"10.1007\/s11427-019-1651-3","volume":"63","author":"WG Chen","year":"2020","unstructured":"Chen WG, Wang LX, Han MZ, Han CC, Li BZ. Sequencing barcode construction and identification methods based on block error-correction codes. Sci China Life Sci. 2020;63(10):1580\u201392.","journal-title":"Sci China Life Sci"},{"issue":"1","key":"5237_CR13","doi-asserted-by":"publisher","first-page":"4998","DOI":"10.1038\/s41598-019-41228-8","volume":"9","author":"CN Takahashi","year":"2019","unstructured":"Takahashi CN, Nguyen BH, Strauss K, Ceze L. Demonstration of end-to-end automation of DNA data storage. Sci Rep. 2019;9(1):4998.","journal-title":"Sci Rep"},{"key":"5237_CR14","doi-asserted-by":"publisher","first-page":"84107","DOI":"10.1109\/ACCESS.2019.2924827","volume":"7","author":"L Deng","year":"2019","unstructured":"Deng L, Wang YX, Noor-A-Rahim M, Guan YL, Shi ZP, Gunawan E, Poh CL. Optimized code design for constrained DNA data storage with asymmetric errors. IEEE Access. 2019;7:84107\u201321.","journal-title":"IEEE Access"},{"key":"5237_CR15","doi-asserted-by":"publisher","first-page":"162892","DOI":"10.1109\/ACCESS.2020.3021700","volume":"8","author":"XZ Lu","year":"2020","unstructured":"Lu XZ, Jeong J, Kim JW, No JS, Park H, No A, Kim S. Error rate-based log-likelihood ratio processing for low-density parity-check codes in DNA storage. IEEE Access. 2020;8:162892\u2013902.","journal-title":"IEEE Access"},{"key":"5237_CR16","unstructured":"Lenz A, Maarouf I, Welter L, Wachter-Zeh A, Amat A. Concatenated codes for recovery from multiple reads of DNA sequences. 2020."},{"issue":"31","key":"5237_CR17","doi-asserted-by":"publisher","first-page":"18489","DOI":"10.1073\/pnas.2004821117","volume":"117","author":"WH Press","year":"2020","unstructured":"Press WH, Hawkins JA, Jones SK, Schaub JM, Finkelstein IJ. HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints. Proc Natl Acad Sci USA. 2020;117(31):18489\u201396.","journal-title":"Proc Natl Acad Sci USA"},{"issue":"1","key":"5237_CR18","doi-asserted-by":"publisher","first-page":"5361","DOI":"10.1038\/s41467-022-33046-w","volume":"13","author":"L Song","year":"2022","unstructured":"Song L, Geng F, Gong Z-Y, Chen X, Tang J, Gong C, Zhou L, Xia R, Han M-Z, Xu J-Y, et al. Robust data storage in DNA by de Bruijn graph-based de novo strand assembly. Nat Commun. 2022;13(1):5361.","journal-title":"Nat Commun"},{"key":"5237_CR19","doi-asserted-by":"crossref","unstructured":"Zan X, Xie R, Yao X, Xu P, Liu W. A robust and efficient DNA storage architecture based on modulation encoding and decoding. bioRxiv 2022.","DOI":"10.1101\/2022.05.25.490755"},{"issue":"1","key":"5237_CR20","doi-asserted-by":"publisher","first-page":"5345","DOI":"10.1038\/s41467-020-19148-3","volume":"11","author":"PL Antkowiak","year":"2020","unstructured":"Antkowiak PL, Lietard J, Darestani MZ, Somoza MM, Stark WJ, Heckel R, Grass RN. Low cost DNA data storage using photolithographic synthesis and advanced information reconstruction and error correction. Nat Commun. 2020;11(1):5345.","journal-title":"Nat Commun"},{"key":"5237_CR21","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1038\/s41598-017-05188-1","volume":"7","author":"SMHT Yazdi","year":"2017","unstructured":"Yazdi SMHT, Gabrys R, Milenkovic O. Portable and error-free DNA-based data storage. Sci Rep. 2017;7:6.","journal-title":"Sci Rep"},{"issue":"4","key":"5237_CR22","doi-asserted-by":"publisher","first-page":"772","DOI":"10.1093\/molbev\/mst010","volume":"30","author":"K Katoh","year":"2013","unstructured":"Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772\u201380.","journal-title":"Mol Biol Evol"},{"key":"5237_CR23","unstructured":"Morrison DA. Multiple sequence alignment is not a solved problem. arXiv 2018."},{"key":"5237_CR24","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","volume":"48","author":"S Needleman","year":"1970","unstructured":"Needleman S. Needleman\u2013Wunsch algorithm for sequence similarity searches. J Mol Biol. 1970;48:443\u201353.","journal-title":"J Mol Biol"},{"issue":"5","key":"5237_CR25","doi-asserted-by":"publisher","first-page":"1792","DOI":"10.1093\/nar\/gkh340","volume":"32","author":"RC Edgar","year":"2004","unstructured":"Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792\u20137.","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"5237_CR26","doi-asserted-by":"crossref","first-page":"1928","DOI":"10.1093\/bioinformatics\/btz795","volume":"36","author":"T Lassmann","year":"2020","unstructured":"Lassmann T. Kalign 3: multiple sequence alignment of large datasets. Bioinformatics. 2020;36(6):1928\u20139.","journal-title":"Bioinformatics"},{"issue":"1","key":"5237_CR27","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1093\/sysbio\/syr095","volume":"61","author":"K Liu","year":"2012","unstructured":"Liu K, Warnow TJ, Holder MT, Nelesen SM, Yu J, Stamatakis AP, Linder CR. SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees. Syst Biol. 2012;61(1):90\u2013106.","journal-title":"Syst Biol"},{"issue":"2","key":"5237_CR28","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1101\/gr.2821705","volume":"15","author":"CB Do","year":"2005","unstructured":"Do CB, Mahabhashyam MS, Brudno M, Batzoglou S. ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res. 2005;15(2):330\u201340.","journal-title":"Genome Res"},{"key":"5237_CR29","doi-asserted-by":"publisher","first-page":"19199","DOI":"10.4137\/EBO.S19199","volume":"10","author":"MT Pervez","year":"2014","unstructured":"Pervez MT, Babar ME, Nadeem A, Aslam M, Awan AR, Aslam N, Hussain T, Naveed N, Qadri S, Waheed U, et al. Evaluating the accuracy and efficiency of multiple sequence alignment methods. Evolut Bioinform. 2014;10:19199.","journal-title":"Evolut Bioinform"},{"key":"5237_CR30","doi-asserted-by":"crossref","unstructured":"Srinivasavaradhan SR, Gopi S, Pfister H, Yekhanin S. Trellis BMA: coded trace reconstruction on IDS channels for DNA storage. 2021.","DOI":"10.1109\/ISIT45174.2021.9517821"},{"issue":"1","key":"5237_CR31","doi-asserted-by":"publisher","first-page":"2933","DOI":"10.1038\/s41467-019-10978-4","volume":"10","author":"R Lopez","year":"2019","unstructured":"Lopez R, Chen Y-J, Dumas Ang S, Yekhanin S, Makarychev K, Racz MZ, Seelig G, Strauss K, Ceze L. DNA assembly for nanopore data storage readout. Nat Commun. 2019;10(1):2933.","journal-title":"Nat Commun"},{"key":"5237_CR32","doi-asserted-by":"publisher","first-page":"760","DOI":"10.12688\/f1000research.11354.1","volume":"6","author":"M Jain","year":"2017","unstructured":"Jain M, Tyson JR, Loose M, Ip CLC, Eccles DA, O\u2019Grady J, Malla S, Leggett RM, Wallerman O, Jansen HJ, et al. MinION analysis and reference consortium: phase 2 data release and analysis of R90 chemistry. F1000Res. 2017;6:760\u2013760.","journal-title":"F1000Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05237-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-023-05237-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05237-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,9]],"date-time":"2023-12-09T13:42:04Z","timestamp":1702129324000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-023-05237-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,23]]},"references-count":32,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["5237"],"URL":"https:\/\/doi.org\/10.1186\/s12859-023-05237-9","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,23]]},"assertion":[{"value":"20 December 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 March 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"111"}}