{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T18:29:04Z","timestamp":1773772144150,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2020,3,13]],"date-time":"2020-03-13T00:00:00Z","timestamp":1584057600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"T\u00dcB\u0130TAK","award":["T\u00dcB\u0130TAK-1001-215E172"],"award-info":[{"award-number":["T\u00dcB\u0130TAK-1001-215E172"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject\u2019s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. 
Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward\u2013Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. 
Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Source code is available at https:\/\/github.com\/CMU-SAFARI\/Apollo.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa179","type":"journal-article","created":{"date-parts":[[2020,3,11]],"date-time":"2020-03-11T20:25:49Z","timestamp":1583958349000},"page":"3669-3679","source":"Crossref","is-referenced-by-count":41,"title":["Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6548-7863","authenticated-orcid":false,"given":"Can","family":"Firtina","sequence":"first","affiliation":[{"name":"Department of Computer Science , ETH Zurich, Zurich 8092, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeremie S","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Computer Science , ETH Zurich, Zurich 8092, Switzerland"},{"name":"Department of Electrical and Computer Engineering , Carnegie Mellon University, Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammed","family":"Alser","sequence":"additional","affiliation":[{"name":"Department of Computer Science , ETH Zurich, Zurich 8092, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Damla","family":"Senol 
Cali","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering , Carnegie Mellon University, Pittsburgh, PA 15213, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8613-6619","authenticated-orcid":false,"given":"A Ercument","family":"Cicek","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering , Bilkent University, Ankara 06800, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering , Bilkent University, Ankara 06800, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Onur","family":"Mutlu","sequence":"additional","affiliation":[{"name":"Department of Computer Science , ETH Zurich, Zurich 8092, Switzerland"},{"name":"Department of Electrical and Computer Engineering , Carnegie Mellon University, Pittsburgh, PA 15213, USA"},{"name":"Department of Computer Engineering , Bilkent University, Ankara 06800, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,3,13]]},"reference":[{"key":"2023063010282440600_btaa179-B1","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nmeth.1527","article-title":"Limitations of next-generation genome sequence assembly","volume":"8","author":"Alkan","year":"2011","journal-title":"Nat. 
Methods"},{"key":"2023063010282440600_btaa179-B2","doi-asserted-by":"crossref","first-page":"3355","DOI":"10.1093\/bioinformatics\/btx342","article-title":"GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping","volume":"33","author":"Alser","year":"2017","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B3","doi-asserted-by":"crossref","first-page":"4255","DOI":"10.1093\/bioinformatics\/btz234","article-title":"Shouji: a fast and efficient pre-alignment filter for sequence alignment","volume":"35","author":"Alser","year":"2019","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B4","author":"Alser","year":"2019"},{"key":"2023063010282440600_btaa179-B5","doi-asserted-by":"crossref","first-page":"e46679","DOI":"10.1371\/journal.pone.0046679","article-title":"Improving PacBio long read accuracy by short read alignment","volume":"7","author":"Au","year":"2012","journal-title":"PLoS One"},{"key":"2023063010282440600_btaa179-B6","first-page":"1","article-title":"An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process","volume":"3","author":"Baum","year":"1972","journal-title":"Inequalities"},{"key":"2023063010282440600_btaa179-B7","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1038\/nbt.3238","article-title":"Assembling large genomes with single-molecule sequencing and locality-sensitive hashing","volume":"33","author":"Berlin","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023063010282440600_btaa179-B40","doi-asserted-by":"crossref","first-page":"27","DOI":"10.21105\/joss.00027","article-title":"sourmash: a library for MinHash sketching of DNA","volume":"1","author":"Brown","year":"2016","journal-title":"J. 
Open Source Softw"},{"key":"2023063010282440600_btaa179-B8","doi-asserted-by":"crossref","first-page":"2067","DOI":"10.1093\/bioinformatics\/bth205","article-title":"Fragment assembly with short reads","volume":"20","author":"Chaisson","year":"2004","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B9","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1186\/1471-2105-13-238","article-title":"Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory","volume":"13","author":"Chaisson","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023063010282440600_btaa179-B10","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1038\/nrg3933","article-title":"Genetic variation and the de novo assembly of human genomes","volume":"16","author":"Chaisson","year":"2015","journal-title":"Nat. Rev. Genet"},{"key":"2023063010282440600_btaa179-B11","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1038\/nmeth.2474","article-title":"Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data","volume":"10","author":"Chin","year":"2013","journal-title":"Nat. Methods"},{"key":"2023063010282440600_btaa179-B12","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/1471-2105-9-11","article-title":"SeqAn an efficient, generic C++ library for sequence analysis","volume":"9","author":"D\u00f6ring","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023063010282440600_btaa179-B13","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLoS Comput. 
Biol"},{"key":"2023063010282440600_btaa179-B14","doi-asserted-by":"crossref","first-page":"2243","DOI":"10.1093\/bioinformatics\/btw139","article-title":"On genomic repeats and reproducibility","volume":"32","author":"Firtina","year":"2016","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B15","first-page":"e125","article-title":"Hercules: a profile HMM-based hybrid error correction algorithm for long reads","volume":"46","author":"Firtina","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023063010282440600_btaa179-B16","doi-asserted-by":"crossref","first-page":"759","DOI":"10.1111\/j.1755-0998.2011.03024.x","article-title":"Field guide to next-generation DNA sequencers","volume":"11","author":"Glenn","year":"2011","journal-title":"Mol. Ecol. Resour"},{"key":"2023063010282440600_btaa179-B17","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"Gurevich","year":"2013","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B18","doi-asserted-by":"crossref","first-page":"688","DOI":"10.1101\/gr.168450.113","article-title":"Reconstructing complex regions of genomes using long-read sequencing technology","volume":"24","author":"Huddleston","year":"2014","journal-title":"Genome Res"},{"key":"2023063010282440600_btaa179-B19","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1038\/nbt.4060","article-title":"Nanopore sequencing and assembly of a human genome with ultra-long reads","volume":"36","author":"Jain","year":"2018","journal-title":"Nat. 
Biotechnol"},{"key":"2023063010282440600_btaa179-B20","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1186\/s12864-018-4460-0","article-title":"GRIM-Filter: fast seed location filtering in DNA read mapping using processing-in-memory technologies","volume":"19","author":"Kim","year":"2018","journal-title":"BMC Genomics"},{"key":"2023063010282440600_btaa179-B21","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1038\/nbt.2280","article-title":"Hybrid error correction and de novo assembly of single-molecule sequencing reads","volume":"30","author":"Koren","year":"2012","journal-title":"Nat. Biotechnol"},{"key":"2023063010282440600_btaa179-B22","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1101\/gr.215087.116","article-title":"Canu: scalable and accurate long-read assembly via adaptive k -mer weighting and repeat separation","volume":"27","author":"Koren","year":"2017","journal-title":"Genome Res"},{"key":"2023063010282440600_btaa179-B23","doi-asserted-by":"crossref","first-page":"R12","DOI":"10.1186\/gb-2004-5-2-r12","article-title":"Versatile and open software for comparing large genomes","volume":"5","author":"Kurtz","year":"2004","journal-title":"Genome Biol"},{"key":"2023063010282440600_btaa179-B24","doi-asserted-by":"crossref","first-page":"2103","DOI":"10.1093\/bioinformatics\/btw152","article-title":"Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B25","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B26","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with 
Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B27","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B28","first-page":"1","article-title":"cuHMM: a CUDA implementation of hidden Markov Model training and classification","author":"Liu","year":"2009","journal-title":"Chron. High. Educ"},{"key":"2023063010282440600_btaa179-B29","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nmeth.3444","article-title":"A complete bacterial genome assembled de novo using only nanopore sequencing data","volume":"12","author":"Loman","year":"2015","journal-title":"Nat. Methods"},{"key":"2023063010282440600_btaa179-B30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/JPROC.2016.2645402","article-title":"Building and improving reference genome assemblies","volume":"105","author":"Meltz Steinberg","year":"2017","journal-title":"Proc. IEEE"},{"key":"2023063010282440600_btaa179-B31","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1515\/popets-2017-0042","article-title":"Expectation\u2013maximization tensor factorization for practical location privacy attacks","volume":"2017","author":"Murakami","year":"2017","journal-title":"Proc. Privacy Enhancing Technol"},{"key":"2023063010282440600_btaa179-B32","first-page":". 
380","author":"Niwattanakul","year":"2013"},{"key":"2023063010282440600_btaa179-B33","article-title":"BulkVis: a graphical viewer for Oxford nanopore bulk FAST5 files","author":"Payne","year":"2018","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B34","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc. Natl. Acad. Sci"},{"key":"2023063010282440600_btaa179-B35","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1016\/j.gpb.2015.08.002","article-title":"PacBio sequencing and its applications","volume":"13","author":"Rhoads","year":"2015","journal-title":"Genomics Proteomics Bioinform"},{"key":"2023063010282440600_btaa179-B36","doi-asserted-by":"crossref","first-page":"3506","DOI":"10.1093\/bioinformatics\/btu538","article-title":"LoRDEC: accurate and efficient long read error correction","volume":"30","author":"Salmela","year":"2014","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B37","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1093\/bioinformatics\/btw321","article-title":"Accurate self-correction of errors in long reads using de Bruijn graphs","volume":"33","author":"Salmela","year":"2016","journal-title":"Bioinformatics"},{"key":"2023063010282440600_btaa179-B38","doi-asserted-by":"crossref","first-page":"5463","DOI":"10.1073\/pnas.74.12.5463","article-title":"DNA sequencing with chain-terminating inhibitors","volume":"74","author":"Sanger","year":"1977","journal-title":"Proc. Natl. Acad. 
Sci"},{"key":"2023063010282440600_btaa179-B39","doi-asserted-by":"crossref","first-page":"1542","DOI":"10.1093\/bib\/bby017","article-title":"Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions","volume":"20","author":"Senol Cali","year":"2019","journal-title":"Brief. Bioinform"},{"key":"2023063010282440600_btaa179-B41","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1101\/gr.214270.116","article-title":"Fast and accurate de novo genome assembly from long uncorrected reads","volume":"27","author":"Vaser","year":"2017","journal-title":"Genome Res"},{"key":"2023063010282440600_btaa179-B42","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1109\/TIT.1967.1054010","article-title":"Error bounds for convolutional codes and an asymptotically optimum decoding algorithm","volume":"13","author":"Viterbi","year":"1967","journal-title":"IEEE Trans. Inf. Theory"},{"key":"2023063010282440600_btaa179-B43","doi-asserted-by":"crossref","first-page":"e112963","DOI":"10.1371\/journal.pone.0112963","article-title":"Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement","volume":"9","author":"Walker","year":"2014","journal-title":"PLoS One"},{"key":"2023063010282440600_btaa179-B44","doi-asserted-by":"crossref","first-page":"100","DOI":"10.12688\/f1000research.10571.2","article-title":"Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis","volume":"6","author":"Weirather","year":"2017","journal-title":"F1000Research"},{"key":"2023063010282440600_btaa179-B45","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1038\/s41587-019-0217-9","article-title":"Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome","volume":"37","author":"Wenger","year":"2019","journal-title":"Nat. 
Biotechnol"},{"key":"2023063010282440600_btaa179-B46","doi-asserted-by":"crossref","first-page":"S13","DOI":"10.1186\/1471-2164-14-S1-S13","article-title":"Accelerating read mapping with FastHASH","volume":"14","author":"Xin","year":"2013","journal-title":"BMC Genomics"},{"key":"2023063010282440600_btaa179-B47","first-page":"395","author":"Yu","year":"2014"},{"key":"2023063010282440600_btaa179-B48","first-page":"e890v1","article-title":"Crossing the streams: a framework for streaming analysis of short DNA sequencing reads","volume":"3","author":"Zhang","year":"2015","journal-title":"PeerJ PrePrints"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa179\/33199729\/btaa179.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/12\/3669\/50746247\/bioinformatics_36_12_3669.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/12\/3669\/50746247\/bioinformatics_36_12_3669.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T10:29:15Z","timestamp":1688120955000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/12\/3669\/5804978"}},"subtitle":[],"editor":[{"given":"Lenore","family":"Cowen","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2020,3,13]]},"references-count":48,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2020,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa179","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type"
:[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,6,15]]},"published":{"date-parts":[[2020,3,13]]}}}