{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:00Z","timestamp":1772138040259,"version":"3.50.1"},"reference-count":21,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2020,9,1]],"date-time":"2020-09-01T00:00:00Z","timestamp":1598918400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Netherlands Organization for Scientific Research","award":["639.072.309"],"award-info":[{"award-number":["639.072.309"]}]},{"name":"NWO Vidi","award":["864.14.004"],"award-info":[{"award-number":["864.14.004"]}]},{"DOI":"10.13039\/501100004543","name":"Chinese Scholarship Council","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004543","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,17]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Clustering reads is valuable for facilitating downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>OGRE is the only method that can successfully cluster reads into species-specific clusters for large metagenomic datasets without running into computation-time or memory issues.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Code is made available on GitHub (https:\/\/github.com\/Marleen1\/OGRE).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa760","type":"journal-article","created":{"date-parts":[[2020,8,25]],"date-time":"2020-08-25T15:13:23Z","timestamp":1598368403000},"page":"905-912","source":"Crossref","is-referenced-by-count":15,"title":["OGRE: Overlap Graph-based metagenomic Read clustEring"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2376-9301","authenticated-orcid":false,"given":"Marleen","family":"Balvert","sequence":"first","affiliation":[{"name":"Life Sciences & Health, Centrum Wiskunde & Informatica , Amsterdam 1098 XG, The Netherlands"},{"name":"Theoretical Biology & Bioinformatics, Utrecht University , Utrecht 3512 JE, The Netherlands"},{"name":"Department of Econometrics & Operations Research, Tilburg University , Tilburg 5000 LE, The Netherlands"}]},{"given":"Xiao","family":"Luo","sequence":"additional","affiliation":[{"name":"Life Sciences & Health, Centrum Wiskunde & Informatica , Amsterdam 1098 XG, The Netherlands"}]},{"given":"Ernestina","family":"Hauptfeld","sequence":"additional","affiliation":[{"name":"Theoretical Biology & Bioinformatics, Utrecht University , Utrecht 3512 JE, The Netherlands"},{"name":"Laboratorium of Microbiology, Wageningen University & Research , Wageningen 6700 HB, The Netherlands"}]},{"given":"Alexander","family":"Sch\u00f6nhuth","sequence":"additional","affiliation":[{"name":"Life Sciences & Health, Centrum Wiskunde & Informatica , Amsterdam 1098 XG, The Netherlands"},{"name":"Theoretical Biology & Bioinformatics, Utrecht University , Utrecht 3512 JE, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2329-7890","authenticated-orcid":false,"given":"Bas E","family":"Dutilh","sequence":"additional","affiliation":[{"name":"Theoretical Biology & Bioinformatics, Utrecht University , Utrecht 3512 JE, The Netherlands"}]}],"member":"286","published-online":{"date-parts":[[2020,9,1]]},"reference":[{"key":"2023051612180045900_btaa760-B1","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1101\/gr.215038.116","article-title":"De novo assembly of viral quasispecies using overlap graphs","volume":"27","author":"Baaijens","year":"2017","journal-title":"Genome Res"},{"key":"2023051612180045900_btaa760-B2","doi-asserted-by":"crossref","first-page":"4281","DOI":"10.1093\/bioinformatics\/btz255","article-title":"Overlap graph-based generation of haplotigs for diploids and polyploids","volume":"35","author":"Baaijens","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051612180045900_btaa760-B3","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"Spades: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol"},{"key":"2023051612180045900_btaa760-B4","doi-asserted-by":"crossref","first-page":"i649","DOI":"10.1093\/bioinformatics\/btw426","article-title":"Snowball: strain aware gene assembly of metagenomes","volume":"32","author":"Gregor","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051612180045900_btaa760-B5","volume-title":"MapReduce Tutorial","year":"2019"},{"key":"2023051612180045900_btaa760-B6","doi-asserted-by":"crossref","first-page":"4904","DOI":"10.1073\/pnas.1402564111","article-title":"Tackling soil diversity with the assembly of large, complex metagenomes","volume":"111","author":"Howe","year":"2014","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051612180045900_btaa760-B7","doi-asserted-by":"crossref","first-page":"2964","DOI":"10.1093\/bioinformatics\/btr520","article-title":"Bambus 2: scaffolding metagenomes","volume":"27","author":"Koren","year":"2011","journal-title":"Bioinformatics"},{"key":"2023051612180045900_btaa760-B8","doi-asserted-by":"crossref","first-page":"2103","DOI":"10.1093\/bioinformatics\/btw152","article-title":"Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051612180045900_btaa760-B9","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051612180045900_btaa760-B10","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051612180045900_btaa760-B11","doi-asserted-by":"crossref","first-page":"10","DOI":"10.14806\/ej.17.1.200","article-title":"Cutadapt removes adapter sequences from high-throughput sequencing reads","volume":"17","author":"Martin","year":"2011","journal-title":"EMBnet. J"},{"key":"2023051612180045900_btaa760-B12","doi-asserted-by":"crossref","first-page":"1088","DOI":"10.1093\/bioinformatics\/btv697","article-title":"Metaquast: evaluation of metagenome assemblies","volume":"32","author":"Mikheenko","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051612180045900_btaa760-B13","doi-asserted-by":"crossref","first-page":"e155","DOI":"10.1093\/nar\/gks678","article-title":"Metavelvet: an extension of velvet assembler to de novo metagenome assembly from short sequence reads","volume":"40","author":"Namiki","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023051612180045900_btaa760-B14","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using MinHash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2023051612180045900_btaa760-B15","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1038\/nbt.1754","article-title":"Integrative genomics viewer","volume":"29","author":"Robinson","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023051612180045900_btaa760-B16","doi-asserted-by":"crossref","first-page":"1063","DOI":"10.1038\/nmeth.4458","article-title":"Critical assessment of metagenome interpretation-a benchmark of metagenomics software","volume":"14","author":"Sczyrba","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051612180045900_btaa760-B17","doi-asserted-by":"crossref","first-page":"i367","DOI":"10.1093\/bioinformatics\/btq217","article-title":"Efficient construction of an assembly string graph using the FM-index","volume":"26","author":"Simpson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051612180045900_btaa760-B18","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1007\/978-3-642-33122-0_32","volume-title":"International Workshop on Algorithms in Bioinformatics","author":"Tanaseichuk","year":"2012"},{"key":"2023051612180045900_btaa760-B19","doi-asserted-by":"crossref","first-page":"i356","DOI":"10.1093\/bioinformatics\/bts397","article-title":"Metacluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample","volume":"28","author":"Wang","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051612180045900_btaa760-B20","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1186\/s12859-015-0473-8","article-title":"Mbbc: an efficient approach for metagenomic binning based on clustering","volume":"16","author":"Wang","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023051612180045900_btaa760-B21","first-page":"535","author":"Wu","year":"2011"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa760\/34223034\/btaa760.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/7\/905\/50341332\/btaa760.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/7\/905\/50341332\/btaa760.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T08:19:31Z","timestamp":1684225171000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/7\/905\/5900259"}},"subtitle":[],"editor":[{"given":"Pier","family":"Luigi Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,9,1]]},"references-count":21,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2021,5,17]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa760","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/511014","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,4,1]]},"published":{"date-parts":[[2020,9,1]]}}}