{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:17Z","timestamp":1772138057356,"version":"3.50.1"},"reference-count":54,"publisher":"Oxford University Press (OUP)","issue":"24","license":[{"start":{"date-parts":[[2022,10,22]],"date-time":"2022-10-22T00:00:00Z","timestamp":1666396800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01MH113005"],"award-info":[{"award-number":["R01MH113005"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01LM012736"],"award-info":[{"award-number":["R01LM012736"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["U19MH114821"],"award-info":[{"award-number":["U19MH114821"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,12,13]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Interactions between proteins help us understand how genes are functionally related and how they contribute to phenotypes. Experiments provide imperfect \u2018ground truth\u2019 information about a small subset of potential interactions in a specific biological context, which can then be extended to the whole genome across different contexts, such as conditions, tissues or species, through machine learning methods. However, evaluating the performance of these methods remains a critical challenge. Here, we propose to evaluate the generalizability of gene characterizations through the shape of performance curves.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We identify Functional Equivalence Classes (FECs), subsets of annotated and unannotated genes that jointly drive performance, by assessing the presence of straight lines in ROC curves built from gene-centric prediction tasks, such as function or interaction predictions. FECs are widespread across data types and methods, they can be used to evaluate the extent and context-specificity of functional annotations in a data-driven manner. For example, FECs suggest that B cell markers can be decomposed into shared primary markers (10\u201350 genes), and tissue-specific secondary markers (100\u2013500\u202fgenes). In addition, FECs suggest the existence of functional modules that span a wide range of the genome, with marker sets spanning at most 5% of the genome and data-driven extensions of Gene Ontology sets spanning up to 40% of the genome. Simple to assess visually and statistically, the identification of FECs in performance curves paves the way for novel functional characterization and increased robustness in the definition of functional gene sets.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Code for analyses and figures is available at https:\/\/github.com\/yexilein\/pyroc.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac692","type":"journal-article","created":{"date-parts":[[2022,10,20]],"date-time":"2022-10-20T16:51:16Z","timestamp":1666284676000},"page":"5390-5397","source":"Crossref","is-referenced-by-count":5,"title":["Defining the extent of gene function using ROC curvature"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7034-4103","authenticated-orcid":false,"given":"Stephan","family":"Fischer","sequence":"first","affiliation":[{"name":"Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics , Cold Spring Harbor, NY 11724, USA"},{"name":"Institut Pasteur, Universit\u00e9 Paris Cit\u00e9, Bioinformatics and Biostatistics Hub , Paris F-75015, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0936-9774","authenticated-orcid":false,"given":"Jesse","family":"Gillis","sequence":"additional","affiliation":[{"name":"Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics , Cold Spring Harbor, NY 11724, USA"},{"name":"Department of Physiology, University of Toronto , Toronto, ON, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,10,22]]},"reference":[{"key":"2022121418404697800_btac692-B1","doi-asserted-by":"crossref","first-page":"D240","DOI":"10.1093\/nar\/gku1158","article-title":"The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements","volume":"43","author":"Altenhoff","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2022121418404697800_btac692-B2","doi-asserted-by":"crossref","first-page":"840","DOI":"10.1038\/s41592-021-01232-1","article-title":"Graphical assessment of tests and classifiers","volume":"18","author":"Altman","year":"2021","journal-title":"Nat. Methods"},{"key":"2022121418404697800_btac692-B3","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet"},{"key":"2022121418404697800_btac692-B4","first-page":"111","article-title":"Comparative cellular analysis of motor cortex in human, marmoset and mouse","volume":"598","author":"Bakken","year":"2021"},{"key":"2022121418404697800_btac692-B5","doi-asserted-by":"crossref","first-page":"612","DOI":"10.1093\/bioinformatics\/btw695","article-title":"EGAD: ultra-fast functional analysis of gene networks","volume":"33","author":"Ballouz","year":"2017","journal-title":"Bioinformatics"},{"key":"2022121418404697800_btac692-B6","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1038\/nrg1272","article-title":"Network biology: understanding the cell\u2019s functional organization","volume":"5","author":"Barab\u00e1si","year":"2004","journal-title":"Nat. Rev. Genet"},{"key":"2022121418404697800_btac692-B7","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cels.2016.08.011","article-title":"A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure","volume":"3","author":"Baron","year":"2016","journal-title":"Cell Syst"},{"key":"2022121418404697800_btac692-B8","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nrg3433","article-title":"Computational solutions for omics data","volume":"14","author":"Berger","year":"2013","journal-title":"Nat. Rev. Genet"},{"key":"2022121418404697800_btac692-B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s43586-021-00093-4","article-title":"High-content CRISPR screening","volume":"2","author":"Bock","year":"2022","journal-title":"Nat. Rev. Methods Primer"},{"key":"2022121418404697800_btac692-B10","doi-asserted-by":"crossref","first-page":"1177","DOI":"10.1016\/j.cell.2017.05.038","article-title":"An expanded view of complex traits: from polygenic to omnigenic","volume":"169","author":"Boyle","year":"2017","journal-title":"Cell"},{"key":"2022121418404697800_btac692-B11","doi-asserted-by":"crossref","first-page":"1875","DOI":"10.1093\/bioinformatics\/btm270","article-title":"Predicting functionally important residues from sequence conservation","volume":"23","author":"Capra","year":"2007","journal-title":"Bioinformatics"},{"key":"2022121418404697800_btac692-B12","doi-asserted-by":"crossref","first-page":"6491","DOI":"10.1073\/pnas.1802973116","article-title":"Predictability of human differential gene expression","volume":"116","author":"Crow","year":"2019","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2022121418404697800_btac692-B13","first-page":"233","author":"Davis","year":"2006"},{"key":"2022121418404697800_btac692-B14","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1016\/j.tig.2013.09.005","article-title":"CAFA and the open world of protein function predictions","volume":"29","author":"Dessimoz","year":"2013","journal-title":"Trends Genet"},{"key":"2022121418404697800_btac692-B15","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1038\/s41576-019-0122-6","article-title":"Deep learning: new computational modelling techniques for genomics","volume":"20","author":"Eraslan","year":"2019","journal-title":"Nat. Rev. Genet"},{"key":"2022121418404697800_btac692-B16","doi-asserted-by":"crossref","first-page":"103292","DOI":"10.1016\/j.isci.2021.103292","article-title":"How many markers are needed to robustly determine a cell\u2019s type?","volume":"24","author":"Fischer","year":"2021","journal-title":"iScience"},{"key":"2022121418404697800_btac692-B17","doi-asserted-by":"crossref","first-page":"326","DOI":"10.1016\/j.tig.2010.05.001","article-title":"From \u2018differential expression\u2019 to \u2018differential networking\u2019 \u2013 identification of dysfunctional regulatory networks in diseases","volume":"26","author":"de la Fuente","year":"2010","journal-title":"Trends Genet"},{"key":"2022121418404697800_btac692-B18","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1038\/nrg3118","article-title":"Rare and common variants: twenty arguments","volume":"13","author":"Gibson","year":"2012","journal-title":"Nat. Rev. Genet"},{"key":"2022121418404697800_btac692-B19","doi-asserted-by":"crossref","first-page":"e17258","DOI":"10.1371\/journal.pone.0017258","article-title":"The impact of multifunctional genes on guilt \u201cby association\u201d analysis","volume":"6","author":"Gillis","year":"2011","journal-title":"PLoS One"},{"key":"2022121418404697800_btac692-B20","doi-asserted-by":"crossref","first-page":"1860","DOI":"10.1093\/bioinformatics\/btr288","article-title":"The role of indirect connections in gene networks in predicting function","volume":"27","author":"Gillis","year":"2011","journal-title":"Bioinformatics"},{"key":"2022121418404697800_btac692-B21","doi-asserted-by":"crossref","first-page":"3168","DOI":"10.1038\/s41467-021-23303-9","article-title":"Structure-based protein function prediction using graph convolutional networks","volume":"12","author":"Gligorijevi\u0107","year":"2021","journal-title":"Nat. Commun"},{"key":"2022121418404697800_btac692-B22","doi-asserted-by":"crossref","first-page":"E5272","DOI":"10.1073\/pnas.1419064111","article-title":"Measuring missing heritability: inferring the contribution of common variants","volume":"111","author":"Golan","year":"2014","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2022121418404697800_btac692-B23","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1177\/0962280209351908","article-title":"Gene set enrichment analysis made simple","volume":"18","author":"Irizarry","year":"2009","journal-title":"Stat. Methods Med. Res"},{"key":"2022121418404697800_btac692-B24","doi-asserted-by":"crossref","first-page":"1397","DOI":"10.1093\/ije\/dyz274","article-title":"Reflection on modern methods: revisiting the area under the ROC curve","volume":"49","author":"Janssens","year":"2020","journal-title":"Int. J. Epidemiol"},{"key":"2022121418404697800_btac692-B25","doi-asserted-by":"crossref","first-page":"1219","DOI":"10.1038\/s41588-018-0183-z","article-title":"Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations","volume":"50","author":"Khera","year":"2018","journal-title":"Nat. Genet"},{"key":"2022121418404697800_btac692-B26","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1093\/bioinformatics\/btz595","article-title":"DeepGOPlus: improved protein function prediction from sequence","volume":"36","author":"Kulmanov","year":"2020","journal-title":"Bioinformatics"},{"key":"2022121418404697800_btac692-B27","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1093\/bfgp\/elaa013","article-title":"Machine learning-based approaches for disease gene prediction","volume":"19","author":"Le","year":"2020","journal-title":"Brief. Funct. Genomics"},{"key":"2022121418404697800_btac692-B28","doi-asserted-by":"crossref","first-page":"995","DOI":"10.1038\/nrm2281","article-title":"Predicting protein function from sequence and structure","volume":"8","author":"Lee","year":"2007","journal-title":"Nat. Rev. Mol. Cell Biol"},{"key":"2022121418404697800_btac692-B29","doi-asserted-by":"crossref","first-page":"W566","DOI":"10.1093\/nar\/gkaa348","article-title":"CoCoCoNet: conserved and comparative co-expression across a diverse set of species","volume":"48","author":"Lee","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022121418404697800_btac692-B30","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1186\/s13073-020-00742-5","article-title":"Polygenic risk scores: from research tools to clinical instruments","volume":"12","author":"Lewis","year":"2020","journal-title":"Genome Med"},{"key":"2022121418404697800_btac692-B31","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1038\/nrg3920","article-title":"Machine learning applications in genetics and genomics","volume":"16","author":"Libbrecht","year":"2015","journal-title":"Nat. Rev. Genet"},{"key":"2022121418404697800_btac692-B32","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1007\/978-3-662-44851-9_21","volume-title":"Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science","author":"Lopes","year":"2014"},{"key":"2022121418404697800_btac692-B33","doi-asserted-by":"crossref","first-page":"e11376","DOI":"10.1002\/aps3.11376","article-title":"Machine learning: a powerful tool for gene function prediction in plants","volume":"8","author":"Mahood","year":"2020","journal-title":"Appl. Plant Sci"},{"key":"2022121418404697800_btac692-B34","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1177\/0272989X8900900307","article-title":"Analyzing a portion of the ROC curve","volume":"9","author":"McClish","year":"1989","journal-title":"Med. Decis. Making"},{"key":"2022121418404697800_btac692-B35","doi-asserted-by":"crossref","first-page":"1270","DOI":"10.1038\/s41592-021-01302-4","article-title":"The class imbalance problem","volume":"18","author":"Megahed","year":"2021","journal-title":"Nat. Methods"},{"key":"2022121418404697800_btac692-B36","doi-asserted-by":"crossref","first-page":"e1002187","DOI":"10.1371\/journal.pcbi.1002187","article-title":"Heat shock partially dissociates the overlapping modules of the yeast protein\u2013protein interaction network: a systems level model of adaptation","volume":"7","author":"Mihalik","year":"2011","journal-title":"PLoS Comput. Biol"},{"key":"2022121418404697800_btac692-B38","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/gb-2008-9-s1-s4","article-title":"GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function","volume":"9","author":"Mostafavi","year":"2008","journal-title":"Genome Biol"},{"key":"2022121418404697800_btac692-B39","doi-asserted-by":"crossref","first-page":"D529","DOI":"10.1093\/nar\/gky1079","article-title":"The BioGRID interaction database: 2019 update","volume":"47","author":"Oughtred","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022121418404697800_btac692-B40","doi-asserted-by":"crossref","first-page":"18026","DOI":"10.1073\/pnas.1114759108","article-title":"Distribution of allele frequencies and effect sizes and their interrelationships for common genetic susceptibility variants","volume":"108","author":"Park","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2022121418404697800_btac692-B41","doi-asserted-by":"crossref","first-page":"e1000054","DOI":"10.1371\/journal.pcbi.1000054","article-title":"Predicting co-complexed protein pairs from heterogeneous data","volume":"4","author":"Qiu","year":"2008","journal-title":"PLoS Comput. Biol"},{"key":"2022121418404697800_btac692-B42","doi-asserted-by":"crossref","first-page":"2559","DOI":"10.1016\/j.cell.2022.05.013","article-title":"Mapping information-rich genotype\u2013phenotype landscapes with genome-scale Perturb-seq","volume":"185","author":"Replogle","year":"2022","journal-title":"Cell"},{"key":"2022121418404697800_btac692-B43","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1002\/prot.25416","article-title":"Large-scale automated function prediction of protein sequences and an experimental case study validation on PTEN transcript variants","volume":"86","author":"Rifaioglu","year":"2018","journal-title":"Proteins Struct. Funct. Bioinform"},{"key":"2022121418404697800_btac692-B44","doi-asserted-by":"crossref","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision\u2013recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PLoS One"},{"key":"2022121418404697800_btac692-B45","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1038\/s41586-018-0590-4","article-title":"Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris","volume":"562","author":"Schaum","year":"2018","journal-title":"Nature"},{"key":"2022121418404697800_btac692-B48","doi-asserted-by":"crossref","first-page":"D535","DOI":"10.1093\/nar\/gkj109","article-title":"BioGRID: a general repository for interaction datasets","volume":"34","author":"Stark","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2022121418404697800_btac692-B49","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2022121418404697800_btac692-B50","doi-asserted-by":"crossref","first-page":"D380","DOI":"10.1093\/nar\/gkv1277","article-title":"STITCH 5: augmenting protein\u2013chemical interaction networks with tissue and affinity data","volume":"44","author":"Szklarczyk","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2022121418404697800_btac692-B51","doi-asserted-by":"crossref","first-page":"D325","DOI":"10.1093\/nar\/gkaa1113","article-title":"The Gene Ontology resource: enriching a GOld mine","volume":"49","author":"The Gene Ontology Consortium","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2022121418404697800_btac692-B52","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"The UniProt Consortium","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2022121418404697800_btac692-B53","doi-asserted-by":"crossref","first-page":"e1002386","DOI":"10.1371\/journal.pcbi.1002386","article-title":"On the use of gene ontology annotations to assess functional similarity among orthologs and paralogs: a short report","volume":"8","author":"Thomas","year":"2012","journal-title":"PLoS Comput. Biol"},{"key":"2022121418404697800_btac692-B54","doi-asserted-by":"crossref","first-page":"2025","DOI":"10.1002\/sim.2103","article-title":"The partial area under the summary ROC curve","volume":"24","author":"Walter","year":"2005","journal-title":"Stat. Med"},{"key":"2022121418404697800_btac692-B55","first-page":"774","article-title":"Decoding disease: From genomes to networks to phenotypes","volume":"22","author":"Wong","year":"2021"},{"key":"2022121418404697800_btac692-B56","doi-asserted-by":"crossref","first-page":"i262","DOI":"10.1093\/bioinformatics\/btab270","article-title":"DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction","volume":"37","author":"You","year":"2021","journal-title":"Bioinformatics"},{"key":"2022121418404697800_btac692-B57","doi-asserted-by":"crossref","first-page":"e1003644","DOI":"10.1371\/journal.pcbi.1003644","article-title":"Negative example selection for protein function prediction: the NoGO database","volume":"10","author":"Youngs","year":"2014","journal-title":"PLoS Comput. Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac692\/46908390\/btac692.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/24\/5390\/47886961\/btac692.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/24\/5390\/47886961\/btac692.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,9]],"date-time":"2023-03-09T05:02:57Z","timestamp":1678338177000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/24\/5390\/6769888"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2022,10,22]]},"references-count":54,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2022,10,22]]},"published-print":{"date-parts":[[2022,12,13]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac692","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.09.03.458825","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,12,15]]},"published":{"date-parts":[[2022,10,22]]}}}