{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:20Z","timestamp":1772138060711,"version":"3.50.1"},"reference-count":64,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:00:00Z","timestamp":1688083200000},"content-version":"vor","delay-in-days":29,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Sony Research Award"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The exponential growth of genomic sequencing data has created ever-expanding repositories of gene networks. Unsupervised network integration methods are critical to learn informative representations for each gene, which are later used as features for downstream applications. However, these network integration methods must be scalable to account for the increasing number of networks and robust to an uneven distribution of network types within hundreds of gene networks.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>To address these needs, we present Gemini, a novel network integration method that uses memory-efficient high-order pooling to represent and weight each network according to its uniqueness. Gemini then mitigates the uneven network distribution through mixing up existing networks to create many new networks. We find that Gemini leads to more than a 10% improvement in F1 score, 15% improvement in micro-AUPRC, and 63% improvement in macro-AUPRC for human protein function prediction by integrating hundreds of networks from BioGRID, and that Gemini\u2019s performance significantly improves when more networks are added to the input network collection, while Mashup and BIONIC embeddings\u2019 performance deteriorates. Gemini thereby enables memory-efficient and informative network integration for large gene networks and can be used to massively integrate and analyze networks in other domains.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Gemini can be accessed at: https:\/\/github.com\/MinxZ\/Gemini.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad247","type":"journal-article","created":{"date-parts":[[2023,5,24]],"date-time":"2023-05-24T20:28:39Z","timestamp":1684960119000},"page":"i504-i512","source":"Crossref","is-referenced-by-count":1,"title":["Gemini: memory-efficient integration of hundreds of gene networks with high-order pooling"],"prefix":"10.1093","volume":"39","author":[{"given":"Addie","family":"Woicik","sequence":"first","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington , Seattle, WA 98195, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mingxin","family":"Zhang","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington , Seattle, WA 98195, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hanwen","family":"Xu","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington , Seattle, WA 98195, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sara","family":"Mostafavi","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington , Seattle, WA 98195, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington , Seattle, WA 98195, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2023,6,30]]},"reference":[{"key":"2023063008144653000_btad247-B1","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nrg2918","article-title":"Network medicine: a network-based approach to human disease","volume":"12","author":"Barab\u00e1si","year":"2011","journal-title":"Nat Rev Genet"},{"key":"2023063008144653000_btad247-B2","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1038\/ncb1086","article-title":"A physical and functional map of the human TNF-alpha\/NF-kappa B signal transduction pathway","volume":"6","author":"Bouwmeester","year":"2004","journal-title":"Nat Cell Biol"},{"key":"2023063008144653000_btad247-B3","doi-asserted-by":"crossref","first-page":"e1002446","DOI":"10.1371\/journal.pgen.1002446","article-title":"A gene regulatory network for root epidermis cell differentiation in arabidopsis","volume":"8","author":"Bruex","year":"2012","journal-title":"PLoS Genet"},{"key":"2023063008144653000_btad247-B4","doi-asserted-by":"crossref","first-page":"D262","DOI":"10.1093\/nar\/gkh021","article-title":"The gene ontology annotation (GOA) database: sharing knowledge in uniprot with gene ontology","volume":"32","author":"Camon","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023063008144653000_btad247-B5","doi-asserted-by":"crossref","first-page":"i219","DOI":"10.1093\/bioinformatics\/btu263","article-title":"New directions for diffusion-based network prediction of protein function: incorporating pathways with confidence","volume":"30","author":"Cao","year":"2014","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B6","doi-asserted-by":"crossref","first-page":"e76339","DOI":"10.1371\/journal.pone.0076339","article-title":"Going the distance for protein function prediction: a new distance metric for protein interaction networks","volume":"8","author":"Cao","year":"2013","journal-title":"PLoS One"},{"key":"2023063008144653000_btad247-B7","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1186\/1471-2105-10-73","article-title":"Disease candidate gene identification and prioritization using protein interaction networks","volume":"10","author":"Chen","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023063008144653000_btad247-B8","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1755-8794-8-S3-S2","article-title":"A fast and high performance multiple data integration algorithm for identifying human disease genes","volume":"8(Suppl 3","author":"Chen","year":"2015","journal-title":"BMC Med Genomics"},{"key":"2023063008144653000_btad247-B9","doi-asserted-by":"crossref","first-page":"1054","DOI":"10.1007\/s11427-014-4745-8","article-title":"Disease gene identification by using graph kernels and markov random fields","volume":"57","author":"Chen","year":"2014","journal-title":"Sci China Life Sci"},{"key":"2023063008144653000_btad247-B10","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1755-8794-7-S2-S2","article-title":"Identifying disease genes by integrating multiple data sources","volume":"7 (Suppl 2)","author":"Chen","year":"2014","journal-title":"BMC Med Genomics"},{"key":"2023063008144653000_btad247-B11","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1016\/j.cels.2016.10.017","article-title":"Compact integration of multi-network topology for functional analysis of genes","volume":"3","author":"Cho","year":"2016","journal-title":"Cell Syst"},{"key":"2023063008144653000_btad247-B12","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1007\/978-3-319-16706-0_9","article-title":"Diffusion component analysis: unraveling functional topology in biological networks","volume":"9029","author":"Cho","year":"2015","journal-title":"Res Comput Mol Biol"},{"key":"2023063008144653000_btad247-B13","doi-asserted-by":"crossref","first-page":"R171","DOI":"10.1093\/hmg\/ddi335","article-title":"Interactome: gateway into systems biology","volume":"14 (Spec No. 2","author":"Cusick","year":"2005","journal-title":"Hum Mol Genet"},{"key":"2023063008144653000_btad247-B14","doi-asserted-by":"crossref","first-page":"1250","DOI":"10.1038\/s41592-022-01616-x","article-title":"BIONIC: biological network integration using convolutions","volume":"19","author":"Forster","year":"2022","journal-title":"Nat Methods"},{"key":"2023063008144653000_btad247-B15","doi-asserted-by":"crossref","first-page":"972","DOI":"10.1126\/science.1136800","article-title":"Clustering by passing messages between data points","volume":"315","author":"Frey","year":"2007","journal-title":"Science"},{"key":"2023063008144653000_btad247-B16","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1038\/415141a","article-title":"Functional organization of the yeast proteome by systematic analysis of protein complexes","volume":"415","author":"Gavin","year":"2002","journal-title":"Nature"},{"key":"2023063008144653000_btad247-B17","doi-asserted-by":"crossref","first-page":"3873","DOI":"10.1093\/bioinformatics\/bty440","article-title":"deepNF: deep network fusion for protein function prediction","volume":"34","author":"Gligorijevic","year":"2018","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B18","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1038\/415180a","article-title":"Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry","volume":"415","author":"Ho","year":"2002","journal-title":"Nature"},{"key":"2023063008144653000_btad247-B19","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J Classification"},{"key":"2023063008144653000_btad247-B20","doi-asserted-by":"crossref","first-page":"D1057","DOI":"10.1093\/nar\/gku1113","article-title":"The GOA database: gene ontology annotation updates for 2015","volume":"43","author":"Huntley","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023063008144653000_btad247-B21","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/978-1-4757-1904-8_7","volume-title":"Principal Component Analysis","author":"Jolliffe","year":"1986"},{"key":"2023063008144653000_btad247-B22","doi-asserted-by":"crossref","first-page":"2626","DOI":"10.1093\/bioinformatics\/bth294","article-title":"A statistical framework for genomic data fusion","volume":"20","author":"Lanckriet","year":"2004","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B23","doi-asserted-by":"crossref","first-page":"1109","DOI":"10.1101\/gr.118992.110","article-title":"Prioritizing candidate disease genes by network-based boosting of genome-wide association data","volume":"21","author":"Lee","year":"2011","journal-title":"Genome Res"},{"key":"2023063008144653000_btad247-B24","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1089\/omi.2006.10.40","article-title":"Diffusion kernel-based logistic regression models for protein function prediction","volume":"10","author":"Lee","year":"2006","journal-title":"OMICS"},{"key":"2023063008144653000_btad247-B25","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1038\/ng.3168","article-title":"Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes","volume":"47","author":"Leiserson","year":"2015","journal-title":"Nat Genet"},{"key":"2023063008144653000_btad247-B26","doi-asserted-by":"crossref","first-page":"204","DOI":"10.1038\/s41597-020-0516-5","article-title":"Large-scale metabolic interaction network of the mouse and human gut microbiota","volume":"7","author":"Lim","year":"2020","journal-title":"Sci Data"},{"key":"2023063008144653000_btad247-B27","doi-asserted-by":"crossref","first-page":"1219","DOI":"10.1093\/bioinformatics\/btq108","article-title":"Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B28","doi-asserted-by":"crossref","first-page":"905","DOI":"10.1109\/TCBB.2016.2550432","article-title":"Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources","volume":"14","author":"Liu","year":"2017","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2023063008144653000_btad247-B29","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nmeth.4083","article-title":"A scored human protein-protein interaction network to catalyze genomic interpretation","volume":"14","author":"Li","year":"2017","journal-title":"Nat Methods"},{"key":"2023063008144653000_btad247-B30","doi-asserted-by":"crossref","first-page":"3586","DOI":"10.1038\/s41467-019-11581-3","article-title":"A consensus S. cerevisiae metabolic model yeast8 and its ecosystem for comprehensively probing cellular metabolism","volume":"10","author":"Lu","year":"2019","journal-title":"Nat Commun"},{"key":"2023063008144653000_btad247-B31","doi-asserted-by":"crossref","first-page":"1257601","DOI":"10.1126\/science.1257601","article-title":"Disease networks. uncovering disease-disease relationships through the incomplete interactome","volume":"347","author":"Menche","year":"2015","journal-title":"Science"},{"key":"2023063008144653000_btad247-B32","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1038\/nrg3552","article-title":"Integrative approaches for finding modular structure in biological networks","volume":"14","author":"Mitra","year":"2013","journal-title":"Nat Rev Genet"},{"key":"2023063008144653000_btad247-B33","doi-asserted-by":"crossref","first-page":"1759","DOI":"10.1093\/bioinformatics\/btq262","article-title":"Fast integration of heterogeneous data sources for predicting gene function with limited annotation","volume":"26","author":"Mostafavi","year":"2010","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B34","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/gb-2008-9-s1-s4","article-title":"GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function","volume":"9(Suppl 1","author":"Mostafavi","year":"2008","journal-title":"Genome Biol"},{"key":"2023063008144653000_btad247-B35","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1002\/pro.3978","article-title":"The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions","volume":"30","author":"Oughtred","year":"2021","journal-title":"Protein Sci"},{"key":"2023063008144653000_btad247-B36","author":"Patro","year":"2015"},{"key":"2023063008144653000_btad247-B37","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1109\/TCBB.2015.2394314","article-title":"Predicting protein functions by using unbalanced random walk algorithm on three biological networks","volume":"14","author":"Peng","year":"2017","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2023063008144653000_btad247-B38","doi-asserted-by":"crossref","first-page":"2096","DOI":"10.1093\/bib\/bbaa036","article-title":"Integrating multi-network topology for gene function prediction using deep neural networks","volume":"22","author":"Peng","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023063008144653000_btad247-B39","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1038\/s41586-021-04115-9","article-title":"A multi-scale map of cell structure fusing protein images and interactions","volume":"600","author":"Qin","year":"2021","journal-title":"Nature"},{"key":"2023063008144653000_btad247-B40","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1080\/01621459.1987.10478441","article-title":"Model-based direct adjustment","volume":"82","author":"Rosenbaum","year":"1987","journal-title":"J Am Stat Assoc"},{"key":"2023063008144653000_btad247-B41","doi-asserted-by":"crossref","first-page":"1257","DOI":"10.1038\/82360","article-title":"A network of protein-protein interactions in yeast","volume":"18","author":"Schwikowski","year":"2000","journal-title":"Nat Biotechnol"},{"key":"2023063008144653000_btad247-B42","doi-asserted-by":"crossref","first-page":"1974","DOI":"10.1073\/pnas.0409522102","article-title":"Conserved patterns of protein interaction in multiple species","volume":"102","author":"Sharan","year":"2005","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023063008144653000_btad247-B43","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1038\/msb4100129","article-title":"Network-based prediction of protein function","volume":"3","author":"Sharan","year":"2007","journal-title":"Mol Syst Biol"},{"key":"2023063008144653000_btad247-B44","doi-asserted-by":"crossref","first-page":"i264","DOI":"10.1093\/bioinformatics\/btac258","article-title":"Topsy-Turvy: integrating a global view into sequence-based PPI prediction","volume":"38","author":"Singh","year":"2022","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B45","doi-asserted-by":"crossref","first-page":"3215","DOI":"10.1093\/bioinformatics\/btu508","article-title":"Walking the interactome for candidate prioritization in exome sequencing studies of mendelian diseases","volume":"30","author":"Smedley","year":"2014","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B46","doi-asserted-by":"crossref","first-page":"e1005708","DOI":"10.1371\/journal.pcbi.1005708","article-title":"Analysis of the relationship between coexpression domains and chromatin 3D organization","volume":"13","author":"Soler-Oliva","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2023063008144653000_btad247-B47","doi-asserted-by":"crossref","first-page":"D607","DOI":"10.1093\/nar\/gky1131","article-title":"STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets","volume":"47","author":"Szklarczyk","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023063008144653000_btad247-B48","doi-asserted-by":"crossref","first-page":"648","DOI":"10.1126\/science.1262110","article-title":"The Genotype-Tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans","volume":"348","author":"The GTEx Consortium","year":"2015","journal-title":"Science"},{"key":"2023063008144653000_btad247-B49","doi-asserted-by":"crossref","first-page":"808","DOI":"10.1126\/science.1091317","article-title":"Global mapping of the yeast genetic interaction network","volume":"303","author":"Tong","year":"2004","journal-title":"Science"},{"key":"2023063008144653000_btad247-B50","doi-asserted-by":"crossref","first-page":"i326","DOI":"10.1093\/bioinformatics\/bth906","article-title":"Learning kernels from biological networks by maximizing entropy","volume":"20(Suppl 1)","author":"Tsuda","year":"2004","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B51","doi-asserted-by":"crossref","first-page":"ii59","DOI":"10.1093\/bioinformatics\/bti1110","article-title":"Fast protein classification with multiple networks","volume":"21(Suppl 2","author":"Tsuda","year":"2005","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B52","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1038\/35001009","article-title":"A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae","volume":"403","author":"Uetz","year":"2000","journal-title":"Nature"},{"key":"2023063008144653000_btad247-B53","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten","year":"2008","journal-title":"J Mach Learn Res"},{"key":"2023063008144653000_btad247-B54","doi-asserted-by":"crossref","first-page":"986","DOI":"10.1016\/j.cell.2011.02.016","article-title":"Interactome networks and human disease","volume":"144","author":"Vidal","year":"2011","journal-title":"Cell"},{"key":"2023063008144653000_btad247-B55","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1186\/1471-2105-10-297","article-title":"Finding local communities in protein networks","volume":"10","author":"Voevodski","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023063008144653000_btad247-B56","doi-asserted-by":"crossref","first-page":"i357","DOI":"10.1093\/bioinformatics\/btv260","article-title":"Exploiting ontology graph for predicting sparsely annotated gene function","volume":"31","author":"Wang","year":"2015","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B57","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1007\/978-3-030-45257-5_36","volume-title":"Research in Computational Molecular Biology","author":"Wang","year":"2020"},{"key":"2023063008144653000_btad247-B58","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nmeth.2810","article-title":"Similarity network fusion for aggregating data types on a genomic scale","volume":"11","author":"Wang","year":"2014","journal-title":"Nat Methods"},{"key":"2023063008144653000_btad247-B59","doi-asserted-by":"crossref","first-page":"W214","DOI":"10.1093\/nar\/gkq537","article-title":"The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function","volume":"38","author":"Warde-Farley","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023063008144653000_btad247-B60","doi-asserted-by":"crossref","first-page":"1805","DOI":"10.1093\/bioinformatics\/btv039","article-title":"Prediction of potential disease-associated microRNAs based on random walk","volume":"31","author":"Xuan","year":"2015","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B61","author":"Zhang","year":"2017"},{"key":"2023063008144653000_btad247-B62","first-page":"3082","author":"Zhang","year":"2018"},{"key":"2023063008144653000_btad247-B63","doi-asserted-by":"crossref","first-page":"i230","DOI":"10.1093\/bioinformatics\/btv258","article-title":"Gene network inference by fusing data from diverse distributions","volume":"31","author":"\u017ditnik","year":"2015","journal-title":"Bioinformatics"},{"key":"2023063008144653000_btad247-B64","doi-asserted-by":"crossref","first-page":"e1000140","DOI":"10.1371\/journal.pcbi.1000140","article-title":"Why do hubs in the yeast protein interaction network tend to be essential: reexamining the connection between the network topology and essentiality","volume":"4","author":"Zotenko","year":"2008","journal-title":"PLoS Comput Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/Supplement_1\/i504\/50741462\/btad247.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/Supplement_1\/i504\/50741462\/btad247.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T18:20:28Z","timestamp":1702491628000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/39\/Supplement_1\/i504\/7210443"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,1]]},"references-count":64,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2023,6,30]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad247","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.01.21.525026","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,6,1]]},"published":{"date-parts":[[2023,6,1]]}}}