{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T13:31:10Z","timestamp":1772890270768,"version":"3.50.1"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"W1","license":[{"start":{"date-parts":[[2021,5,14]],"date-time":"2021-05-14T00:00:00Z","timestamp":1620950400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"German Federal Ministry of Education and Research"},{"DOI":"10.13039\/501100018929","name":"de.NBI","doi-asserted-by":"publisher","award":["FKZ 031A533B"],"award-info":[{"award-number":["FKZ 031A533B"]}],"id":[{"id":"10.13039\/501100018929","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,7,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The EDGAR platform, a web server providing databases of precomputed orthology data for thousands of microbial genomes, is one of the most established tools in the field of comparative genomics and phylogenomics. Based on precomputed gene alignments, EDGAR allows quick identification of the differential gene content, i.e. the pan genome, the core genome, or singleton genes. Furthermore, EDGAR features a wide range of analyses and visualizations like Venn diagrams, synteny plots, phylogenetic trees, as well as Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI) matrices. During the last few years, the average number of genomes analyzed in an EDGAR project increased by two orders of magnitude. To handle this massive increase, a completely new technical backend infrastructure for the EDGAR platform was designed and launched as EDGAR3.0. For the calculation of new EDGAR3.0 projects, we are now using a scalable Kubernetes cluster running in a cloud environment. 
A new storage infrastructure was developed using a file-based high-performance storage backend which ensures timely data handling and efficient access. The new data backend guarantees a memory efficient calculation of orthologs, and parallelization has led to drastically reduced processing times. Based on the advanced technical infrastructure new analysis features could be implemented including POCP and FastANI genomes similarity indices, UpSet intersecting set visualization, and circular genome plots. Also the public database section of EDGAR was largely updated and now offers access to 24,317 genomes in 749 free-to-use projects. In summary, EDGAR 3.0 provides a new, scalable infrastructure for comprehensive microbial comparative gene content analysis. The web server is accessible at http:\/\/edgar3.computational.bio.<\/jats:p>","DOI":"10.1093\/nar\/gkab341","type":"journal-article","created":{"date-parts":[[2021,4,22]],"date-time":"2021-04-22T19:53:13Z","timestamp":1619121193000},"page":"W185-W192","source":"Crossref","is-referenced-by-count":128,"title":["EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure"],"prefix":"10.1093","volume":"49","author":[{"given":"Marius\u00a0Alfred","family":"Dieckmann","sequence":"first","affiliation":[{"name":"Bioinformatics & Systems Biology, Justus Liebig University Gie\u00dfen, Heinrich-Buff-Ring 58, 35390 Gie\u00dfen, Hesse, Germany"}]},{"given":"Sebastian","family":"Beyvers","sequence":"additional","affiliation":[{"name":"Bioinformatics & Systems Biology, Justus Liebig University Gie\u00dfen, Heinrich-Buff-Ring 58, 35390 Gie\u00dfen, Hesse, Germany"}]},{"given":"Rudel\u00a0Christian","family":"Nkouamedjo-Fankep","sequence":"additional","affiliation":[{"name":"Bioinformatics & Systems Biology, Justus Liebig University Gie\u00dfen, Heinrich-Buff-Ring 58, 35390 Gie\u00dfen, Hesse, 
Germany"}]},{"given":"Patrick\u00a0Harald\u00a0Georg","family":"Hanel","sequence":"additional","affiliation":[{"name":"Institute of Computational Biology, Helmholtz Center Munich, German Research Center for Environmental Health, Ingolst\u00e4dter Landstra\u00dfe 1, D-85764 Neuherberg, Germany"}]},{"given":"Lukas","family":"Jelonek","sequence":"additional","affiliation":[{"name":"Bioinformatics & Systems Biology, Justus Liebig University Gie\u00dfen, Heinrich-Buff-Ring 58, 35390 Gie\u00dfen, Hesse, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6455-3622","authenticated-orcid":false,"given":"Jochen","family":"Blom","sequence":"additional","affiliation":[{"name":"Bioinformatics & Systems Biology, Justus Liebig University Gie\u00dfen, Heinrich-Buff-Ring 58, 35390 Gie\u00dfen, Hesse, Germany"}]},{"given":"Alexander","family":"Goesmann","sequence":"additional","affiliation":[{"name":"Bioinformatics & Systems Biology, Justus Liebig University Gie\u00dfen, Heinrich-Buff-Ring 58, 35390 Gie\u00dfen, Hesse, Germany"}]}],"member":"286","published-online":{"date-parts":[[2021,5,14]]},"reference":[{"key":"2021070812073098600_B1","doi-asserted-by":"crossref","first-page":"13950","DOI":"10.1073\/pnas.0506758102","article-title":"Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial \u2018pan-genome\u2019","volume":"102","author":"Tettelin","year":"2005","journal-title":"Proc. Natl. Acad. Sci. U.S.A."},{"key":"2021070812073098600_B2","doi-asserted-by":"crossref","first-page":"2567","DOI":"10.1073\/pnas.0409727102","article-title":"Genomic insights that advance the species definition for prokaryotes","volume":"102","author":"Konstantinidis","year":"2005","journal-title":"Proc. Natl. Acad. Sci. 
U.S.A."},{"key":"2021070812073098600_B3","doi-asserted-by":"crossref","first-page":"1929","DOI":"10.1098\/rstb.2006.1920","article-title":"The bacterial species definition in the genomic era","volume":"361","author":"Konstantinidis","year":"2006","journal-title":"Philos. T. R. Soc. B"},{"key":"2021070812073098600_B4","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1016\/j.mib.2007.08.006","article-title":"Prokaryotic taxonomy and phylogeny in the genomic era: advancements and challenges ahead","volume":"10","author":"Konstantinidis","year":"2007","journal-title":"Curr. Opin. Microbiol."},{"key":"2021070812073098600_B5","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: architecture and applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2021070812073098600_B6","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pbio.1002195","article-title":"Big data: astronomical or genomical?","volume":"13","author":"Stephens","year":"2015","journal-title":"PLoS Biol."},{"key":"2021070812073098600_B7","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1186\/1471-2105-10-154","article-title":"EDGAR: a software framework for the comparative analysis of prokaryotic genomes","volume":"10","author":"Blom","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2021070812073098600_B8","doi-asserted-by":"crossref","first-page":"W22","DOI":"10.1093\/nar\/gkw255","article-title":"EDGAR 2.0: an enhanced software platform for comparative gene content analyses","volume":"44","author":"Blom","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"2021070812073098600_B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/9781118960608.bm00038","article-title":"EDGAR: A Versatile Tool for Phylogenomics","volume-title":"Bergey's Manual of Systematics of Archaea and 
Bacteria","author":"Blom","year":"2019"},{"key":"2021070812073098600_B10","doi-asserted-by":"crossref","first-page":"1983","DOI":"10.1109\/TVCG.2014.2346248","article-title":"UpSet: visualization of intersecting sets","volume":"20","author":"Lex","year":"2014","journal-title":"IEEE T. Vis. Comput. Gr."},{"key":"2021070812073098600_B11","doi-asserted-by":"crossref","first-page":"2938","DOI":"10.1093\/bioinformatics\/btx364","article-title":"UpSetR: an R package for the visualization of intersecting sets and their properties","volume":"33","author":"Conway","year":"2017","journal-title":"Bioinformatics"},{"key":"2021070812073098600_B12","doi-asserted-by":"crossref","first-page":"5114","DOI":"10.1038\/s41467-018-07641-9","article-title":"High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries","volume":"9","author":"Jain","year":"2018","journal-title":"Nat. commun."},{"key":"2021070812073098600_B13","doi-asserted-by":"crossref","first-page":"2210","DOI":"10.1128\/JB.01688-14","article-title":"A proposed genus boundary for the prokaryotes based on genomic insights","volume":"196","author":"Qin","year":"2014","journal-title":"J. Bacteriol."},{"key":"2021070812073098600_B14","doi-asserted-by":"crossref","first-page":"1740","DOI":"10.1093\/bioinformatics\/btw041","article-title":"BioCircos. 
js: an interactive Circos JavaScript library for biological data visualization on web applications","volume":"32","author":"Cui","year":"2016","journal-title":"Bioinformatics"},{"key":"2021070812073098600_B15","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/978-1-4939-3167-5_2","article-title":"UniProtKB\/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view","volume-title":"Plant Bioinformatics","author":"Boutet","year":"2016"},{"key":"2021070812073098600_B16","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"Mistry","year":"2021","journal-title":"Nucleic Acids Res."},{"key":"2021070812073098600_B17","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1038\/s41587-019-0036-z","article-title":"SignalP 5.0 improves signal peptide predictions using deep neural networks","volume":"37","author":"Armenteros","year":"2019","journal-title":"Nat. Biotechnol."},{"key":"2021070812073098600_B18","doi-asserted-by":"crossref","first-page":"567","DOI":"10.1006\/jmbi.2000.4315","article-title":"Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes","volume":"305","author":"Krogh","year":"2001","journal-title":"J. Mol. Biol."},{"key":"2021070812073098600_B19","doi-asserted-by":"crossref","first-page":"1005","DOI":"10.1006\/jmbi.2000.3903","article-title":"Predicting subcellular localization of proteins based on their N-terminal amino acid sequence","volume":"300","author":"Emanuelsson","year":"2000","journal-title":"J. Mol. 
Biol."},{"key":"2021070812073098600_B20","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1186\/1471-2105-5-198","article-title":"GenomeViz: visualizing microbial genomes","volume":"5","author":"Ghai","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2021070812073098600_B21","doi-asserted-by":"crossref","first-page":"2811","DOI":"10.1093\/bioinformatics\/btu393","article-title":"circlize implements and enhances circular visualization in R","volume":"30","author":"Gu","year":"2014","journal-title":"Bioinformatics"},{"key":"2021070812073098600_B22","doi-asserted-by":"crossref","first-page":"1639","DOI":"10.1101\/gr.092759.109","article-title":"Circos: an information aesthetic for comparative genomics","volume":"19","author":"Krzywinski","year":"2009","journal-title":"Genome Res."},{"key":"2021070812073098600_B23","doi-asserted-by":"crossref","first-page":"D94","DOI":"10.1093\/nar\/gky989","article-title":"GenBank","volume":"47","author":"Sayers","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"2021070812073098600_B24","doi-asserted-by":"crossref","first-page":"D851","DOI":"10.1093\/nar\/gkx1068","article-title":"RefSeq: an update on prokaryotic genome annotation and curation","volume":"46","author":"Haft","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"2021070812073098600_B25","doi-asserted-by":"crossref","first-page":"7696","DOI":"10.1128\/AEM.02411-13","article-title":"GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis","volume":"79","author":"Contreras-Moreira","year":"2013","journal-title":"Appl. Environ. 
Microb."},{"key":"2021070812073098600_B26","doi-asserted-by":"crossref","first-page":"3691","DOI":"10.1093\/bioinformatics\/btv421","article-title":"Roary: rapid large-scale prokaryote pan genome analysis","volume":"31","author":"Page","year":"2015","journal-title":"Bioinformatics"},{"key":"2021070812073098600_B27","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1002\/0471250953.bi0612s35","article-title":"Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups","volume":"35","author":"Fischer","year":"2011","journal-title":"Curr. Protoc. Bioinformatics"},{"key":"2021070812073098600_B28","doi-asserted-by":"crossref","first-page":"D477","DOI":"10.1093\/nar\/gkx1019","article-title":"The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces","volume":"46","author":"Altenhoff","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"2021070812073098600_B29","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1186\/s13059-019-1832-y","article-title":"OrthoFinder: phylogenetic orthology inference for comparative genomics","volume":"20","author":"Emms","year":"2019","journal-title":"Genome Biol."},{"key":"2021070812073098600_B30","doi-asserted-by":"crossref","first-page":"e5","DOI":"10.1093\/nar\/gkx977","article-title":"panX: pan-genome analysis and exploration","volume":"46","author":"Ding","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"2021070812073098600_B31","doi-asserted-by":"crossref","first-page":"D1020","DOI":"10.1093\/nar\/gkaa1105","article-title":"RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation","volume":"49","author":"Li","year":"2021","journal-title":"Nucleic Acids Res."},{"key":"2021070812073098600_B32","doi-asserted-by":"crossref","first-page":"e1007134","DOI":"10.1371\/journal.pcbi.1007134","article-title":"ASA3P: an automatic and 
scalable pipeline for the assembly, annotation and higher level analysis of closely related bacterial isolates","volume":"16","author":"Schwengers","year":"2020","journal-title":"PLoS Comput. Biol."},{"key":"2021070812073098600_B33","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1038\/nbt.4229","article-title":"A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life","volume":"36","author":"Parks","year":"2018","journal-title":"Nat. Biotechnol"}],"container-title":["Nucleic Acids Research"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/nar\/article-pdf\/49\/W1\/W185\/38842080\/gkab341.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/nar\/article-pdf\/49\/W1\/W185\/38842080\/gkab341.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,25]],"date-time":"2022-12-25T06:11:49Z","timestamp":1671948709000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/nar\/article\/49\/W1\/W185\/6275665"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,14]]},"references-count":33,"journal-issue":{"issue":"W1","published-online":{"date-parts":[[2021,5,14]]},"published-print":{"date-parts":[[2021,7,2]]}},"URL":"https:\/\/doi.org\/10.1093\/nar\/gkab341","relation":{},"ISSN":["0305-1048","1362-4962"],"issn-type":[{"value":"0305-1048","type":"print"},{"value":"1362-4962","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,7,2]]},"published":{"date-parts":[[2021,5,14]]}}}