{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T17:28:39Z","timestamp":1781803719367,"version":"3.54.5"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2022,9,11]],"date-time":"2022-09-11T00:00:00Z","timestamp":1662854400000},"content-version":"vor","delay-in-days":10,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001711","name":"Swiss National Science Foundation","doi-asserted-by":"publisher","award":["310030_192569"],"award-info":[{"award-number":["310030_192569"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006396","name":"Alexion Pharmaceuticals Inc","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100006396","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,20]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>A knowledge-based grouping of genes into pathways or functional units is essential for describing and understanding cellular complexity. However, it is not always clear a priori how and at what level of specificity functionally interconnected genes should be partitioned into pathways, for a given application. Here, we assess and compare nine existing and two conceptually novel functional classification systems, with respect to their discovery power and generality in gene set enrichment testing. We base our assessment on a collection of nearly 2000 functional genomics datasets provided by users of the STRING database. With these real-life and diverse queries, we assess which systems typically provide the most specific and complete enrichment results. We find many structural and performance differences between classification systems. Overall, the well-established, hierarchically organized pathway annotation systems yield the best enrichment performance, despite covering substantial parts of the human genome in general terms only. On the other hand, the more recent unsupervised annotation systems perform strongest in understudied areas and organisms, and in detecting more specific pathways, albeit with less informative labels.<\/jats:p>","DOI":"10.1093\/bib\/bbac355","type":"journal-article","created":{"date-parts":[[2022,9,11]],"date-time":"2022-09-11T07:54:06Z","timestamp":1662882846000},"source":"Crossref","is-referenced-by-count":21,"title":["Systematic assessment of pathway databases, based on a diverse collection of user-submitted experiments"],"prefix":"10.1093","volume":"23","author":[{"given":"Annika L","family":"Gable","sequence":"first","affiliation":[{"name":"Department of Molecular Life Sciences, University of Zurich , 8057 Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Damian","family":"Szklarczyk","sequence":"additional","affiliation":[{"name":"Department of Molecular Life Sciences, University of Zurich , 8057 Zurich, Switzerland"},{"name":"Swiss Institute of Bioinformatics , 1015 Lausanne, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David","family":"Lyon","sequence":"additional","affiliation":[{"name":"Department of Molecular Life Sciences, University of Zurich , 8057 Zurich, Switzerland"},{"name":"Swiss Institute of Bioinformatics , 1015 Lausanne, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jo\u00e3o F","family":"Matias Rodrigues","sequence":"additional","affiliation":[{"name":"Department of Molecular Life Sciences, University of Zurich , 8057 Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7734-9102","authenticated-orcid":false,"given":"Christian","family":"von Mering","sequence":"additional","affiliation":[{"name":"Department of Molecular Life Sciences, University of Zurich , 8057 Zurich, Switzerland"},{"name":"Swiss Institute of Bioinformatics , 1015 Lausanne, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2022,9,10]]},"reference":[{"key":"2022092013230889600_ref1","doi-asserted-by":"crossref","first-page":"D330","DOI":"10.1093\/nar\/gky1055","article-title":"The gene ontology resource: 20 years and still GOing strong","volume":"47","author":"Carbon","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022092013230889600_ref2","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc Natl Acad Sci USA"},{"key":"2022092013230889600_ref3","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1038\/nprot.2008.211","article-title":"Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources","volume":"4","author":"Huang","year":"2009","journal-title":"Nat Protoc"},{"key":"2022092013230889600_ref4","doi-asserted-by":"crossref","first-page":"W90","DOI":"10.1093\/nar\/gkw377","article-title":"Enrichr: a comprehensive gene set enrichment analysis web server 2016 update","volume":"44","author":"Kuleshov","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2022092013230889600_ref5","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gkaa1106","article-title":"PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API","volume":"49","author":"Mi","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2022092013230889600_ref6","first-page":"269","article-title":"Gene set databases: a fountain of knowledge or a siren call? ACM-BCB 2019- proc. 10th ACM Int. Conf. Bioinforma","volume":"17","author":"Maleki","year":"2019","journal-title":"J Bioinform Comput Biol"},{"key":"2022092013230889600_ref7","doi-asserted-by":"crossref","first-page":"5115","DOI":"10.1038\/s41598-018-23395-2","article-title":"Interpretation of biological experiments changes with evolution of the Gene Ontology and its annotations","volume":"8","author":"Tomczak","year":"2018","journal-title":"Sci Rep"},{"key":"2022092013230889600_ref8","doi-asserted-by":"crossref","first-page":"4092","DOI":"10.1038\/srep04092","article-title":"Importance of collection in gene set enrichment analysis of drug response in cancer cell lines","volume":"4","author":"Bateman","year":"2014","journal-title":"Sci Rep"},{"key":"2022092013230889600_ref9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/srep04191","article-title":"Annotation enrichment analysis: an alternative method for evaluating the functional properties of gene sets","volume":"4","author":"Glass","year":"2014","journal-title":"Sci Rep"},{"key":"2022092013230889600_ref10","doi-asserted-by":"crossref","first-page":"1362","DOI":"10.1038\/s41598-018-19333-x","article-title":"Gene annotation bias impedes biomedical research","volume":"8","author":"Haynes","year":"2018","journal-title":"Sci Rep"},{"key":"2022092013230889600_ref11","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pbio.2006643","article-title":"Large-scale investigation of the reasons why potentially important genes are ignored","volume":"16","author":"Stoeger","year":"2018","journal-title":"PLoS Biol"},{"key":"2022092013230889600_ref12","doi-asserted-by":"crossref","first-page":"4106","DOI":"10.1038\/s41598-020-60456-x","article-title":"Functionally enigmatic genes in cancer: using TCGA data to map the limitations of annotations","volume":"10","author":"Maertens","year":"2020","journal-title":"Sci Rep"},{"key":"2022092013230889600_ref13","first-page":"175","article-title":"Organizing and computing metabolic pathway data in terms of binary relations","author":"Goto","year":"1997","journal-title":"Pac Symp Biocomput Pac Symp Biocomput"},{"key":"2022092013230889600_ref14","article-title":"Reactome: a knowledgebase of biological pathways","volume":"33","author":"Joshi-Tope","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2022092013230889600_ref15","doi-asserted-by":"crossref","first-page":"115D","DOI":"10.1093\/nar\/gkh131","article-title":"UniProt: the universal protein knowledgebase","volume":"32","author":"Apweiler","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2022092013230889600_ref16","doi-asserted-by":"crossref","first-page":"5857","DOI":"10.1073\/pnas.95.11.5857","article-title":"SMART, a simple modular architecture research tool: identification of signaling domains","volume":"95","author":"Schultz","year":"1998","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2022092013230889600_ref17","doi-asserted-by":"crossref","first-page":"1145","DOI":"10.1093\/bioinformatics\/16.12.1145","article-title":"InterPro--an integrated documentation resource for protein families, domains and functional sites","volume":"16","author":"Apweiler","year":"2000","journal-title":"Bioinformatics"},{"key":"2022092013230889600_ref18","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1002\/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L","article-title":"Pfam: a comprehensive database of protein domain families based on seed alignments","volume":"28","author":"Sonnhammer","year":"1997","journal-title":"Proteins Struct Funct Genet"},{"key":"2022092013230889600_ref19","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat Genet"},{"key":"2022092013230889600_ref20","doi-asserted-by":"crossref","first-page":"2401","DOI":"10.1039\/c3mb70242a","article-title":"Signalling pathway database usability: lessons learned","volume":"9","author":"Tieri","year":"2013","journal-title":"Mol Biosyst"},{"key":"2022092013230889600_ref21","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1093\/database\/bau126","article-title":"Comparison of human cell signaling pathway databases\u2014evolution, drawbacks and challenges","volume":"2015","author":"Chowdhury","year":"2015","journal-title":"Database"},{"key":"2022092013230889600_ref22","doi-asserted-by":"crossref","first-page":"D605","DOI":"10.1093\/nar\/gkaa1074","article-title":"The STRING database in 2021: customizable protein\u2013protein networks, and functional characterization of user-uploaded gene\/measurement sets","volume":"49","author":"Szklarczyk","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2022092013230889600_ref23","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1093\/bioinformatics\/btt657","article-title":"HPC-CLUST: distributed hierarchical clustering for large sets of nucleotide sequences","volume":"30","author":"Matias Rodrigues","year":"2014","journal-title":"Bioinformatics"},{"key":"2022092013230889600_ref24","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gky1131","article-title":"STRING v11: protein\u2013protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets","volume":"47","author":"Szklarczyk","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022092013230889600_ref25","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J Mach Learn Res"},{"key":"2022092013230889600_ref26","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1007\/978-1-0716-0775-6_28","article-title":"G-protein-coupled receptor expression and purification. Protein Downstr","volume":"2178","author":"Corin","year":"2021","journal-title":"Methods Mol Biol"},{"key":"2022092013230889600_ref27","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1186\/s12864-021-07502-8","article-title":"Pathway size matters: the influence of pathway granularity on over-representation (enrichment analysis) statistics","volume":"22","author":"Karp","year":"2021","journal-title":"BMC Genomics"},{"key":"2022092013230889600_ref28","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1093\/database\/bay146","article-title":"Involving community in genes and pathway curation","volume":"2019","author":"Naithani","year":"2019","journal-title":"Database"},{"key":"2022092013230889600_ref29","first-page":"D498","article-title":"The reactome pathway knowledgebase","volume":"48","author":"Jassal","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022092013230889600_ref30","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1186\/s13059-020-02181-2","article-title":"Pathway information extracted from 25 years of pathway figures","volume":"21","author":"Hanspers","year":"2020","journal-title":"Genome Biol"},{"key":"2022092013230889600_ref31","doi-asserted-by":"crossref","first-page":"D613","DOI":"10.1093\/nar\/gkaa1024","article-title":"WikiPathways: connecting communities","volume":"49","author":"Martens","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2022092013230889600_ref32","doi-asserted-by":"crossref","first-page":"e46128","DOI":"10.1371\/journal.pone.0046128","article-title":"Length bias correction in gene ontology enrichment analysis using logistic regression","volume":"7","author":"Mi","year":"2012","journal-title":"PLoS ONE"},{"key":"2022092013230889600_ref33","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1007\/978-1-4939-3743-1_14","article-title":"Gene ontology: pitfalls, biases, and remedies","volume":"1446","author":"Gaudet","year":"2017","journal-title":"Gene Ontol Handb"},{"key":"2022092013230889600_ref34","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1016\/j.jmb.2005.01.071","article-title":"The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins","volume":"347","author":"Doszt\u00e1nyi","year":"2005","journal-title":"J Mol Biol"},{"key":"2022092013230889600_ref35","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bay003","article-title":"TISSUES 2.0: an integrative web resource on mammalian tissue expression","volume":"2018","author":"Palasca","year":"2018","journal-title":"Database"},{"key":"2022092013230889600_ref36","doi-asserted-by":"crossref","first-page":"3163","DOI":"10.1002\/pmic.201400441","article-title":"Version 4.0 of PaxDb: protein abundance data, integrated across model organisms, tissues, and cell-lines","volume":"15","author":"Wang","year":"2015","journal-title":"Proteomics"},{"key":"2022092013230889600_ref37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/nar\/gks461","article-title":"Camera: a competitive gene set test accounting for inter-gene correlation","volume":"40","author":"Wu","year":"2012","journal-title":"Nucleic Acids Res"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac355\/45936581\/bbac355.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac355\/45936581\/bbac355.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,15]],"date-time":"2023-02-15T20:15:25Z","timestamp":1676492125000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac355\/6695266"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9]]},"references-count":37,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac355","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9]]},"published":{"date-parts":[[2022,9]]},"article-number":"bbac355"}}