{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:20:57Z","timestamp":1772173257607,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1011001","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2023,5,11]],"date-time":"2023-05-11T00:00:00Z","timestamp":1683763200000}}],"reference-count":57,"publisher":"Public Library of Science (PLoS)","issue":"5","license":[{"start":{"date-parts":[[2023,5,1]],"date-time":"2023-05-01T00:00:00Z","timestamp":1682899200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003006","name":"Eidgen\u00f6ssische Technische Hochschule Z\u00fcrich","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003006","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100018385","name":"Max-Planck-F\u00f6rderstiftung","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100018385","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Eidgen\u00f6ssische Technische Hochschule Strategic Focus Area - Personalized Health and Related Technologies","award":["project #106"],"award-info":[{"award-number":["project #106"]}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. Assembly errors can affect all downstream analyses of the assemblies. Accuracy for the state of the art in reference-free misassembly prediction does not exceed an AUPRC of 0.57, and it is not clear how well these models generalize to real-world data. Here, we present the Residual neural network for Misassembled Contig identification (ResMiCo), a deep learning approach for reference-free identification of misassembled contigs. To develop ResMiCo, we first generated a training dataset of unprecedented size and complexity that can be used for further benchmarking and developments in the field. Through rigorous validation, we show that ResMiCo is substantially more accurate than the state of the art, and the model is robust to novel taxonomic diversity and varying assembly methods. ResMiCo estimated 7% misassembled contigs per metagenome across multiple real-world datasets. We demonstrate how ResMiCo can be used to optimize metagenome assembly hyperparameters to improve accuracy, instead of optimizing solely for contiguity. The accuracy, robustness, and ease-of-use of ResMiCo make the tool suitable for general quality control of metagenome assemblies and assembly methodology optimization.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1011001","type":"journal-article","created":{"date-parts":[[2023,5,1]],"date-time":"2023-05-01T13:29:17Z","timestamp":1682947757000},"page":"e1011001","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":8,"title":["ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning"],"prefix":"10.1371","volume":"19","author":[{"given":"Olga","family":"Mineeva","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Danciu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bernhard","family":"Sch\u00f6lkopf","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruth E.","family":"Ley","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gunnar","family":"R\u00e4tsch","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7424-5276","authenticated-orcid":true,"given":"Nicholas D.","family":"Youngblut","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2023,5,1]]},"reference":[{"key":"pcbi.1011001.ref001","article-title":"Hackflex: low cost Illumina Nextera Flex sequencing library construction","author":"D Gaio","year":"2021","journal-title":"bioRxiv"},{"issue":"1","key":"pcbi.1011001.ref002","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1534\/g3.117.300257","article-title":"Large-Scale Low-Cost NGS Library Preparation Using a Robust Tn5 Purification and Tagmentation Protocol","volume":"8","author":"BP Hennig","year":"2018","journal-title":"G3 Genes\u2014Genomes\u2014Genetics"},{"issue":"5","key":"pcbi.1011001.ref003","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1101\/gr.213959.116","article-title":"metaSPAdes: a new versatile metagenomic assembler","volume":"27","author":"S Nurk","year":"2017","journal-title":"Genome research"},{"issue":"10","key":"pcbi.1011001.ref004","doi-asserted-by":"crossref","first-page":"1674","DOI":"10.1093\/bioinformatics\/btv033","article-title":"MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph","volume":"31","author":"D Li","year":"2015","journal-title":"Bioinformatics"},{"issue":"7753","key":"pcbi.1011001.ref005","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1038\/s41586-019-1058-x","article-title":"New insights from uncultivated genomes of the global human gut microbiome","volume":"568","author":"S Nayfach","year":"2019","journal-title":"Nature"},{"issue":"7753","key":"pcbi.1011001.ref006","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1038\/s41586-019-0965-1","article-title":"A new genomic blueprint of the human gut microbiota","volume":"568","author":"A Almeida","year":"2019","journal-title":"Nature"},{"issue":"3","key":"pcbi.1011001.ref007","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1016\/j.cell.2019.01.001","article-title":"Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle","volume":"176","author":"E Pasolli","year":"2019","journal-title":"Cell"},{"issue":"7285","key":"pcbi.1011001.ref008","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature08821","article-title":"A human gut microbial gene catalogue established by metagenomic sequencing","volume":"464","author":"J Qin","year":"2010","journal-title":"nature"},{"issue":"10","key":"pcbi.1011001.ref009","doi-asserted-by":"crossref","first-page":"1103","DOI":"10.1038\/nbt.3353","article-title":"A catalog of the mouse gut metagenome","volume":"33","author":"L Xiao","year":"2015","journal-title":"Nature biotechnology"},{"issue":"4","key":"pcbi.1011001.ref010","doi-asserted-by":"crossref","first-page":"1125","DOI":"10.1093\/bib\/bbx120","article-title":"A review of methods and databases for metagenomic classification and assembly","volume":"20","author":"FP Breitwieser","year":"2019","journal-title":"Briefings in bioinformatics"},{"issue":"1","key":"pcbi.1011001.ref011","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1038\/s41587-020-0603-3","article-title":"A unified catalog of 204,938 reference genomes from the human gut microbiome","volume":"39","author":"A Almeida","year":"2021","journal-title":"Nature biotechnology"},{"issue":"10","key":"pcbi.1011001.ref012","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1038\/nbt.4229","article-title":"A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life","volume":"36","author":"DH Parks","year":"2018","journal-title":"Nature biotechnology"},{"key":"pcbi.1011001.ref013","doi-asserted-by":"crossref","first-page":"e12198","DOI":"10.7717\/peerj.12198","article-title":"Struo2: efficient metagenome profiling database construction for ever-expanding microbial genome datasets","volume":"9","author":"ND Youngblut","year":"2021","journal-title":"PeerJ"},{"key":"pcbi.1011001.ref014","doi-asserted-by":"crossref","first-page":"653","DOI":"10.3389\/fmicb.2021.613791","article-title":"Metagenomic data assembly\u2013the way of decoding unknown microorganisms","volume":"12","author":"AL Lapidus","year":"2021","journal-title":"Frontiers in Microbiology"},{"issue":"7","key":"pcbi.1011001.ref015","doi-asserted-by":"crossref","first-page":"1088","DOI":"10.1093\/bioinformatics\/btv697","article-title":"MetaQUAST: evaluation of metagenome assemblies","volume":"32","author":"A Mikheenko","year":"2016","journal-title":"Bioinformatics"},{"key":"pcbi.1011001.ref016","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.mib.2014.11.014","article-title":"One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly","volume":"23","author":"S Koren","year":"2015","journal-title":"Current opinion in microbiology"},{"issue":"2","key":"pcbi.1011001.ref017","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1093\/bib\/bbz020","article-title":"New approaches for metagenome assembly with short reads","volume":"21","author":"M Ayling","year":"2020","journal-title":"Briefings in bioinformatics"},{"issue":"10","key":"pcbi.1011001.ref018","doi-asserted-by":"crossref","first-page":"3011","DOI":"10.1093\/bioinformatics\/btaa124","article-title":"DeepMAsED: evaluating the quality of metagenomic assemblies","volume":"36","author":"O Mineeva","year":"2020","journal-title":"Bioinformatics"},{"key":"pcbi.1011001.ref019","article-title":"metaMIC: reference-free Misassembly Identification and Correction of de novo metagenomic assemblies","author":"S Lai","year":"2021","journal-title":"bioRxiv"},{"key":"pcbi.1011001.ref020","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"issue":"3","key":"pcbi.1011001.ref021","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1016\/j.cell.2018.12.015","article-title":"Predicting Splicing from Primary Sequence with Deep Learning","volume":"176","author":"K Jaganathan","year":"2019","journal-title":"Cell"},{"issue":"10","key":"pcbi.1011001.ref022","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1038\/nbt.4229","article-title":"A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life","volume":"36","author":"DH Parks","year":"2018","journal-title":"Nat Biotechnol"},{"issue":"4","key":"pcbi.1011001.ref023","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","article-title":"ART: a next-generation sequencing read simulator","volume":"28","author":"W Huang","year":"2012","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1011001.ref024","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40168-019-0633-6","article-title":"CAMISIM: simulating metagenomes and microbial communities","volume":"7","author":"A Fritz","year":"2019","journal-title":"Microbiome"},{"issue":"4","key":"pcbi.1011001.ref025","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"B Langmead","year":"2012","journal-title":"Nat Methods"},{"issue":"11","key":"pcbi.1011001.ref026","doi-asserted-by":"crossref","first-page":"1063","DOI":"10.1038\/nmeth.4458","article-title":"Critical assessment of metagenome interpretation\u2014a benchmark of metagenomics software","volume":"14","author":"A Sczyrba","year":"2017","journal-title":"Nature methods"},{"issue":"1","key":"pcbi.1011001.ref027","doi-asserted-by":"crossref","first-page":"e00939","DOI":"10.1128\/mSystems.00939-20","article-title":"Genomic Insights into Adaptations of Trimethylamine-Utilizing Methanogens to Diverse Habitats, Including the Human Gut","volume":"6","author":"J de la Cuesta-Zuluaga","year":"2021","journal-title":"mSystems"},{"issue":"6","key":"pcbi.1011001.ref028","doi-asserted-by":"crossref","first-page":"e01045","DOI":"10.1128\/mSystems.01045-20","article-title":"Large-scale metagenome assembly reveals novel animal-associated microbial genomes, biosynthetic gene clusters, and other genetic diversity","volume":"5","author":"ND Youngblut","year":"2020","journal-title":"Msystems"},{"key":"pcbi.1011001.ref029","doi-asserted-by":"crossref","DOI":"10.3389\/fmicb.2019.01252","article-title":"Shotgun Metagenomics Reveals the Benthic Microbial Community Response to Plastic and Bioplastic in a Coastal Marine Environment","volume":"10","author":"LJ Pinnell","year":"2019","journal-title":"Frontiers in Microbiology"},{"issue":"5","key":"pcbi.1011001.ref030","doi-asserted-by":"crossref","first-page":"e01018","DOI":"10.1128\/mSystems.01018-21","article-title":"Metagenomic Sequencing of Multiple Soil Horizons and Sites in Close Vicinity Revealed Novel Secondary Metabolite Diversity","volume":"6","author":"SS Mantri","year":"2021","journal-title":"mSystems"},{"issue":"1","key":"pcbi.1011001.ref031","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1186\/s40793-022-00449-7","article-title":"MarineMetagenomeDB: a public repository for curated and standardized metadata for marine metagenomes","volume":"17","author":"MK Nata\u2019ala","year":"2022","journal-title":"Environ Microbiome"},{"issue":"D1","key":"pcbi.1011001.ref032","first-page":"D626","article-title":"TerrestrialMetagenomeDB: a public repository of curated and standardized metadata for terrestrial metagenomes","volume":"48","author":"FB Corr\u00eaa","year":"2019","journal-title":"Nucleic Acids Research"},{"issue":"1","key":"pcbi.1011001.ref033","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1038\/s41597-019-0287-z","article-title":"Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies","volume":"6","author":"V Sevim","year":"2019","journal-title":"Sci Data"},{"key":"pcbi.1011001.ref034","doi-asserted-by":"crossref","first-page":"160081","DOI":"10.1038\/sdata.2016.81","article-title":"Next generation sequencing data of a defined microbial mock community","volume":"3","author":"E Singer","year":"2016","journal-title":"Sci Data"},{"issue":"3","key":"pcbi.1011001.ref035","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1007\/s12275-020-9525-5","article-title":"Microbial community analysis using high-throughput sequencing technology: a beginner\u2019s guide for microbiologists","volume":"58","author":"J Jo","year":"2020","journal-title":"J Microbiol"},{"issue":"6","key":"pcbi.1011001.ref036","doi-asserted-by":"crossref","DOI":"10.1128\/mSystems.00069-18","article-title":"Evaluating the Information Content of Shallow Shotgun Metagenomics","volume":"3","author":"B Hillmann","year":"2018","journal-title":"mSystems"},{"issue":"12","key":"pcbi.1011001.ref037","doi-asserted-by":"crossref","first-page":"1883","DOI":"10.1093\/bioinformatics\/btw088","article-title":"fqtools: an efficient software suite for modern FASTQ file manipulation","volume":"32","author":"AP Droop","year":"2016","journal-title":"Bioinformatics"},{"key":"pcbi.1011001.ref038","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1186\/1471-2105-15-182","article-title":"Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads","volume":"15","author":"H Jiang","year":"2014","journal-title":"BMC Bioinformatics"},{"issue":"19","key":"pcbi.1011001.ref039","doi-asserted-by":"crossref","first-page":"3047","DOI":"10.1093\/bioinformatics\/btw354","article-title":"MultiQC: summarize analysis results for multiple tools and samples in a single report","volume":"32","author":"P Ewels","year":"2016","journal-title":"Bioinformatics"},{"key":"pcbi.1011001.ref040","author":"S Ioffe","year":"2015","journal-title":"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"},{"key":"pcbi.1011001.ref041","author":"DP Kingma","year":"2014","journal-title":"Adam: A Method for Stochastic Optimization"},{"key":"pcbi.1011001.ref042","volume-title":"Advances in Neural Information Processing Systems","author":"SM Lundberg","year":"2017"},{"key":"pcbi.1011001.ref043","author":"A Shrikumar","year":"2017","journal-title":"Learning Important Features Through Propagating Activation Differences"},{"key":"pcbi.1011001.ref044","author":"S Lai","year":"2021","journal-title":"metaMIC: reference-free Misassembly Identification and Correction of de novo metagenomic assemblies"},{"issue":"4","key":"pcbi.1011001.ref045","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1093\/bioinformatics\/bts723","article-title":"ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies","volume":"29","author":"SC Clark","year":"2013","journal-title":"Bioinformatics"},{"issue":"2","key":"pcbi.1011001.ref046","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2907070","article-title":"A survey of predictive modeling on imbalanced domains","volume":"49","author":"P Branco","year":"2016","journal-title":"ACM Computing Surveys (CSUR)"},{"issue":"19","key":"pcbi.1011001.ref047","doi-asserted-by":"crossref","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","article-title":"Snakemake\u2014a scalable bioinformatics workflow engine","volume":"28","author":"J K\u00f6ster","year":"2012","journal-title":"Bioinformatics"},{"key":"pcbi.1011001.ref048","author":"M Abadi","year":"2015","journal-title":"TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems"},{"key":"pcbi.1011001.ref049","author":"L McInnes","year":"2018","journal-title":"UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction"},{"key":"pcbi.1011001.ref050","article-title":"Why Do Deep Convolutional Networks Generalize so Poorly to Small Image Transformations?","author":"A Azulay","year":"2019","journal-title":"JMLR"},{"key":"pcbi.1011001.ref051","article-title":"Visual Representation Learning Does Not Generalize Strongly within the Same Domain","author":"L Schott","year":"2022","journal-title":"ICLR"},{"key":"pcbi.1011001.ref052","article-title":"The Many Faces of Robustness: A Critical Analysis of Out-of-distribution Generalization","author":"D Hendrycks","year":"2021","journal-title":"ICCV"},{"key":"pcbi.1011001.ref053","first-page":"137","article-title":"Analysis of representations for domain adaptation","author":"S Ben-David","year":"2007","journal-title":"Advances in neural information processing systems"},{"issue":"7540","key":"pcbi.1011001.ref054","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1038\/518486a","article-title":"Artificial intelligence: Learning to see and act","volume":"518","author":"B Schoelkopf","year":"2015","journal-title":"Nature"},{"key":"pcbi.1011001.ref055","article-title":"Recognition in Terra Incognita","author":"S Beery","year":"2018","journal-title":"ECCV"},{"key":"pcbi.1011001.ref056","article-title":"In Search of Lost Domain Generalization","author":"I Gulrajani","year":"2021","journal-title":"ICLR"},{"key":"pcbi.1011001.ref057","article-title":"Accuracy on the Line: on the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization","author":"JP Miller","year":"2021","journal-title":"ICML"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1011001","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2023,5,11]],"date-time":"2023-05-11T00:00:00Z","timestamp":1683763200000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011001","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,11]],"date-time":"2023-12-11T15:55:04Z","timestamp":1702310104000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011001"}},"subtitle":[],"editor":[{"given":"Luis Pedro","family":"Coelho","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2023,5,1]]},"references-count":57,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,5,1]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1011001","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.06.23.497335","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,1]]}}}