{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,12]],"date-time":"2026-04-12T15:08:04Z","timestamp":1776006484388,"version":"3.50.1"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2023,7,5]],"date-time":"2023-07-05T00:00:00Z","timestamp":1688515200000},"content-version":"vor","delay-in-days":4,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Severo Ochoa Fellowship","award":["BES-2016-076276"],"award-info":[{"award-number":["BES-2016-076276"]}]},{"name":"La Caixa Foundation\u2014Health Research Program","award":["HR20-00635"],"award-info":[{"award-number":["HR20-00635"]}]},{"DOI":"10.13039\/501100004837","name":"Spanish Ministry of Science and Innovation","doi-asserted-by":"publisher","award":["PID2019-111109RB-I00"],"award-info":[{"award-number":["PID2019-111109RB-I00"]}],"id":[{"id":"10.13039\/501100004837","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010269","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["104111\/Z\/14\/ZR"],"award-info":[{"award-number":["104111\/Z\/14\/ZR"]}],"id":[{"id":"10.13039\/100010269","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010269","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["098051"],"award-info":[{"award-number":["098051"]}],"id":[{"id":"10.13039\/100010269","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,7,20]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https:\/\/github.com\/ThomasDOtto\/ILRA.<\/jats:p>","DOI":"10.1093\/bib\/bbad248","type":"journal-article","created":{"date-parts":[[2023,6,19]],"date-time":"2023-06-19T19:01:51Z","timestamp":1687201311000},"source":"Crossref","is-referenced-by-count":12,"title":["From contigs towards chromosomes: automatic improvement of long read assemblies (ILRA)"],"prefix":"10.1093","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1901-1888","authenticated-orcid":false,"given":"Jos\u00e9 Luis","family":"Ruiz","sequence":"first","affiliation":[{"name":"Consejo Superior de Investigaciones Cient\u00edficas Instituto de Parasitolog\u00eda y Biomedicina L\u00f3pez-Neyra (IPBLN), , 18016, Granada , Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Susanne","family":"Reimering","sequence":"additional","affiliation":[{"name":"Helmholtz Centre for Infection Research Department for Computational Biology of Infection Research, , Braunschweig , Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Juan David","family":"Escobar-Prieto","sequence":"additional","affiliation":[{"name":"Centro Internacional de Entrenamiento e Investigaciones M\u00e9dicas (CIDEIM) , Cali , Colombia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nicolas M B","family":"Brancucci","sequence":"additional","affiliation":[{"name":"University of Glasgow School of Infection & Immunity, MVLS, , Glasgow , UK"},{"name":"Swiss Tropical and Public Health Institute Department of Medical Parasitology and Infection Biology, , 4123 Allschwil , Switzerland"},{"name":"University of Basel , 4001 Basel , Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0301-4478","authenticated-orcid":false,"given":"Diego F","family":"Echeverry","sequence":"additional","affiliation":[{"name":"Centro Internacional de Entrenamiento e Investigaciones M\u00e9dicas (CIDEIM) , Cali , Colombia"},{"name":"Universidad del Valle Departamento de Microbiolog\u00eda, Facultad de Salud, , Cali , Colombia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7989-2125","authenticated-orcid":false,"given":"Abdirahman I","family":"Abdi","sequence":"additional","affiliation":[{"name":"KEMRI-Wellcome Trust Research Programme, CGMRC , Kilifi , Kenya"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1040-9566","authenticated-orcid":false,"given":"Matthias","family":"Marti","sequence":"additional","affiliation":[{"name":"University of Glasgow School of Infection & Immunity, MVLS, , Glasgow , UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4146-9003","authenticated-orcid":false,"given":"Elena","family":"G\u00f3mez-D\u00edaz","sequence":"additional","affiliation":[{"name":"Consejo Superior de Investigaciones Cient\u00edficas Instituto de Parasitolog\u00eda y Biomedicina L\u00f3pez-Neyra (IPBLN), , 18016, Granada , Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1246-7404","authenticated-orcid":false,"given":"Thomas D","family":"Otto","sequence":"additional","affiliation":[{"name":"University of Glasgow School of Infection & Immunity, MVLS, , Glasgow , UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2023,7,5]]},"reference":[{"key":"2023072020165867400_ref1","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1038\/s41592-021-01057-y","article-title":"Long road to long-read assembly","volume":"18","author":"Marx","year":"2021","journal-title":"Nat Methods"},{"key":"2023072020165867400_ref2","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1126\/science.1162986","article-title":"Real-time DNA sequencing from single polymerase molecules","volume":"323","author":"Eid","year":"2009","journal-title":"Science"},{"key":"2023072020165867400_ref3","doi-asserted-by":"crossref","first-page":"1146","DOI":"10.1038\/nbt.1495","article-title":"The potential and challenges of nanopore sequencing","volume":"26","author":"Branton","year":"2008","journal-title":"Nat Biotechnol"},{"key":"2023072020165867400_ref4","doi-asserted-by":"crossref","first-page":"4325","DOI":"10.1073\/pnas.1720115115","article-title":"Earth BioGenome project: sequencing life for the future of life","volume":"115","author":"Lewin","year":"2018","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"5950","key":"2023072020165867400_ref5","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1126\/science.1180614","article-title":"Genome project standards in a new era of sequencing","volume":"326","author":"Chain","year":"2009","journal-title":"Science"},{"key":"2023072020165867400_ref6","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1146\/annurev-animal-090414-014900","article-title":"The genome 10K project: a way forward","volume":"3","author":"Koepfli","year":"2015","journal-title":"Annu Rev Anim Biosci"},{"key":"2023072020165867400_ref7","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1038\/s41587-018-0004-z","article-title":"Errors in long-read assemblies can critically affect protein prediction","volume":"37","author":"Watson","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2023072020165867400_ref8","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1038\/s41587-018-0005-y","article-title":"Reply to \u2018Errors in long-read assemblies can critically affect protein prediction\u2019","volume":"37","author":"Koren","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2023072020165867400_ref9","doi-asserted-by":"crossref","first-page":"e1007901","DOI":"10.1371\/journal.ppat.1007901","article-title":"Is reliance on an inaccurate genome sequence sabotaging your experiments?","volume":"15","author":"Baptista","year":"2019","journal-title":"PLoS Pathog"},{"key":"2023072020165867400_ref10","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giaa123","article-title":"Comparison of the two up-to-date sequencing technologies for genome assembly: HiFi reads of Pacific biosciences sequel II system and ultralong reads of Oxford Nanopore","volume":"9","author":"Lang","year":"2020","journal-title":"Gigascience"},{"key":"2023072020165867400_ref11","doi-asserted-by":"crossref","article-title":"Pseudoalignment facilitates assignment of error-prone ultima genomics reads","author":"Booeshaghi","DOI":"10.1101\/2022.06.04.494845"},{"key":"2023072020165867400_ref12","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1038\/s41592-022-01539-7","article-title":"Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing","volume":"19","author":"Sereika","year":"2022","journal-title":"Nat Methods"},{"key":"2023072020165867400_ref13","doi-asserted-by":"crossref","first-page":"e1010905","DOI":"10.1371\/journal.pcbi.1010905","article-title":"Assembling the perfect bacterial genome using Oxford Nanopore and Illumina sequencing","volume":"19","author":"Wick","year":"2023","journal-title":"PLoS Comput Biol"},{"key":"2023072020165867400_ref14","article-title":"Comparison of R9.4.1\/Kit10 and R10\/Kit12 Oxford Nanopore flowcells and chemistries in bacterial genome reconstruction","volume":"9","author":"Sanderson","year":"2023","journal-title":"Microb Genom"},{"key":"2023072020165867400_ref15","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1038\/nmeth.2474","article-title":"Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data","volume":"10","author":"Chin","year":"2013","journal-title":"Nat Methods"},{"key":"2023072020165867400_ref16","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1101\/gr.215087.116","article-title":"Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation","volume":"27","author":"Koren","year":"2017","journal-title":"Genome Res"},{"key":"2023072020165867400_ref17","doi-asserted-by":"crossref","first-page":"2669","DOI":"10.1093\/bioinformatics\/btt476","article-title":"The MaSuRCA genome assembler","volume":"29","author":"Zimin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023072020165867400_ref18","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1038\/s41592-019-0669-3","article-title":"Fast and accurate long-read assembly with wtdbg2","volume":"17","author":"Ruan","year":"2020","journal-title":"Nat Methods"},{"key":"2023072020165867400_ref19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/gigascience\/gix137","article-title":"Finding nemo: hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly","volume":"7","author":"Tan","year":"2018","journal-title":"Gigascience"},{"key":"2023072020165867400_ref20","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1186\/s12864-020-07041-8","article-title":"Benchmarking hybrid assembly approaches for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing","volume":"21","author":"Chen","year":"2020","journal-title":"BMC Genomics"},{"key":"2023072020165867400_ref21","doi-asserted-by":"crossref","first-page":"787","DOI":"10.1101\/gr.213405.116","article-title":"Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm","volume":"27","author":"Zimin","year":"2017","journal-title":"Genome Res"},{"key":"2023072020165867400_ref22","doi-asserted-by":"crossref","DOI":"10.1038\/s41587-023-01662-6","article-title":"Telomere-to-telomere assembly of diploid chromosomes with Verkko","author":"Rautiainen","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2023072020165867400_ref23","doi-asserted-by":"crossref","first-page":"1260","DOI":"10.1038\/nprot.2012.068","article-title":"A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs","volume":"7","author":"Swain","year":"2012","journal-title":"Nat Protoc"},{"key":"2023072020165867400_ref24","doi-asserted-by":"crossref","first-page":"1704","DOI":"10.1093\/bioinformatics\/btq269","article-title":"Iterative correction of reference nucleotides (iCORN) using second generation sequencing technology","volume":"26","author":"Otto","year":"2010","journal-title":"Bioinformatics"},{"key":"2023072020165867400_ref25","doi-asserted-by":"crossref","first-page":"e112963","DOI":"10.1371\/journal.pone.0112963","article-title":"Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement","volume":"9","author":"Walker","year":"2014","journal-title":"PloS One"},{"key":"2023072020165867400_ref26","doi-asserted-by":"crossref","first-page":"889","DOI":"10.1186\/s12864-020-07227-0","article-title":"A comprehensive evaluation of long read error correction methods","volume":"21","author":"Zhang","year":"2020","journal-title":"BMC Genomics"},{"key":"2023072020165867400_ref27","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbac305","article-title":"Benchmarking of long-read sequencing, assemblers and polishers for yeast genome","volume":"23","author":"Zhang","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023072020165867400_ref28","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giz014","article-title":"Common workflow language (CWL)-based software pipeline for de novo genome assembly from long- and short-read data","volume":"8","author":"Korhonen","year":"2019","journal-title":"Gigascience"},{"key":"2023072020165867400_ref29","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbab170","article-title":"ARAMIS: from systematic errors of NGS long reads to accurate assemblies","volume":"22","author":"Sacristan-Horcajada","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023072020165867400_ref30","volume-title":"fmalmeida\/MpGAP: a generic multi-platform genome assembly pipeline","author":"de Almeida"},{"key":"2023072020165867400_ref31","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1038\/s41586-018-0619-8","article-title":"Genome organization and DNA accessibility control antigenic variation in trypanosomes","volume":"563","author":"Muller","year":"2018","journal-title":"Nature"},{"key":"2023072020165867400_ref32","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkw292","article-title":"Companion: a web server for annotation and analysis of parasite genomes","volume":"44","author":"Steinbiss","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023072020165867400_ref33","doi-asserted-by":"crossref","first-page":"58","DOI":"10.12688\/wellcomeopenres.15194.1","article-title":"Progression of the canonical reference malaria parasite genome from 2002-2019","volume":"4","author":"Bohme","year":"2019","journal-title":"Wellcome Open Res"},{"key":"2023072020165867400_ref34","doi-asserted-by":"crossref","first-page":"1757","DOI":"10.1093\/bioinformatics\/btn322","article-title":"Database indexing for production MegaBLAST searches","volume":"24","author":"Morgulis","year":"2008","journal-title":"Bioinformatics"},{"key":"2023072020165867400_ref35","doi-asserted-by":"crossref","first-page":"1935","DOI":"10.1038\/s41467-020-20536-y","article-title":"Extended haplotype-phasing of long-read de novo genome assemblies using hi-C","volume":"12","author":"Kronenberg","year":"2021","journal-title":"Nat Commun"},{"key":"2023072020165867400_ref36","doi-asserted-by":"crossref","first-page":"180235","DOI":"10.1038\/sdata.2018.235","article-title":"De novo assembly and annotation of three Leptosphaeria genomes using Oxford Nanopore MinION sequencing","volume":"5","author":"Dutreux","year":"2018","journal-title":"Sci Data"},{"key":"2023072020165867400_ref37","doi-asserted-by":"crossref","first-page":"52","DOI":"10.12688\/wellcomeopenres.14571.1","article-title":"Long read assemblies of geographically dispersed Plasmodium falciparum isolates reveal highly structured subtelomeres","volume":"3","author":"Otto","year":"2018","journal-title":"Wellcome Open Res"},{"key":"2023072020165867400_ref38","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1038\/s41564-018-0162-2","article-title":"Genomes of all known members of a Plasmodium subgenus reveal paths to virulent human malaria","volume":"3","author":"Otto","year":"2018","journal-title":"Nat Microbiol"},{"key":"2023072020165867400_ref39","doi-asserted-by":"crossref","first-page":"3210","DOI":"10.1093\/bioinformatics\/btv351","article-title":"BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs","volume":"31","author":"Simao","year":"2015","journal-title":"Bioinformatics"},{"key":"2023072020165867400_ref40","doi-asserted-by":"crossref","first-page":"i142","DOI":"10.1093\/bioinformatics\/bty266","article-title":"Versatile genome assembly evaluation with QUAST-LG","volume":"34","author":"Mikheenko","year":"2018","journal-title":"Bioinformatics"},{"key":"2023072020165867400_ref41","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1023\/A:1022913015916","article-title":"Mycoplasma contamination of cell cultures: incidence, sources, effects, detection, elimination, prevention","volume":"39","author":"Drexler","year":"2002","journal-title":"Cytotechnology"},{"key":"2023072020165867400_ref42","doi-asserted-by":"crossref","first-page":"2535","DOI":"10.1093\/nar\/gkv136","article-title":"Assessing the prevalence of mycoplasma contamination in cell culture via a survey of NCBI's RNA-seq archive","volume":"43","author":"Olarerin-George","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023072020165867400_ref43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41592-022-01759-x","article-title":"Method of the year 2022: long-read sequencing","volume":"20","author":"Editorial","year":"2023","journal-title":"Nat Methods"},{"key":"2023072020165867400_ref44","doi-asserted-by":"crossref","first-page":"e0144305","DOI":"10.1371\/journal.pone.0144305","article-title":"Evaluation and validation of assembling corrected PacBio long reads for microbial genome completion via hybrid approaches","volume":"10","author":"Lin","year":"2015","journal-title":"PloS One"},{"key":"2023072020165867400_ref45","doi-asserted-by":"crossref","DOI":"10.3390\/genes10010062","article-title":"A high-quality de novo genome assembly from a single mosquito using PacBio sequencing","volume":"10","author":"Kingan","year":"2019","journal-title":"Genes (Basel)"},{"key":"2023072020165867400_ref46","doi-asserted-by":"crossref","DOI":"10.1128\/genomeA.00219-18","article-title":"Complete sequence of the intronless mitochondrial genome of the Saccharomyces cerevisiae strain CW252","volume":"6","author":"Naquin","year":"2018","journal-title":"Genome Announc"},{"key":"2023072020165867400_ref47","doi-asserted-by":"crossref","first-page":"i105","DOI":"10.1093\/bioinformatics\/bty279","article-title":"A graph-based approach to diploid genome assembly","volume":"34","author":"Garg","year":"2018","journal-title":"Bioinformatics"},{"key":"2023072020165867400_ref48","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1186\/s12859-021-04118-3","article-title":"Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms","volume":"22","author":"Guiglielmoni","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2023072020165867400_ref49","doi-asserted-by":"crossref","first-page":"e1007843","DOI":"10.1371\/journal.pcbi.1007843","article-title":"Ranbow: a fast and accurate method for polyploid haplotype reconstruction","volume":"16","author":"Moeinzadeh","year":"2020","journal-title":"PLoS Comput Biol"},{"key":"2023072020165867400_ref50","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1038\/s41592-020-01056-5","article-title":"Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm","volume":"18","author":"Cheng","year":"2021","journal-title":"Nat Methods"},{"key":"2023072020165867400_ref51","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1186\/s12859-022-04591-4","article-title":"gcaPDA: a haplotype-resolved diploid assembler","volume":"23","author":"Xie","year":"2022","journal-title":"BMC Bioinformatics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/4\/bbad248\/50917382\/bbad248.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/4\/bbad248\/50917382\/bbad248.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,20]],"date-time":"2023-07-20T16:22:46Z","timestamp":1689870166000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad248\/7219769"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7]]},"references-count":51,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,7,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad248","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.07.30.454413","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,7]]},"published":{"date-parts":[[2023,7]]},"article-number":"bbad248"}}