{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,22]],"date-time":"2026-02-22T02:59:55Z","timestamp":1771729195501,"version":"3.50.1"},"reference-count":61,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2023,12,28]],"date-time":"2023-12-28T00:00:00Z","timestamp":1703721600000},"content-version":"vor","delay-in-days":36,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004543","name":"China Scholarship Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004543","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000923","name":"Australia Research Council","doi-asserted-by":"crossref","award":["LP180101085"],"award-info":[{"award-number":["LP180101085"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100000923","name":"Australia Research Council","doi-asserted-by":"crossref","award":["LP220200614"],"award-info":[{"award-number":["LP220200614"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001711","name":"Swiss National Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The identification and characterization of essential genes are central to our understanding of the core biological functions in eukaryotic organisms, and has important implications for the treatment of diseases caused by, for example, cancers and pathogens. 
Given the major constraints in testing the functions of genes of many organisms in the laboratory, due to the absence of in vitro cultures and\/or gene perturbation assays for most metazoan species, there has been a need to develop in silico tools for the accurate prediction or inference of essential genes to underpin systems biological investigations. Major advances in machine learning approaches provide unprecedented opportunities to overcome these limitations and accelerate the discovery of essential genes on a genome-wide scale. Here, we developed and evaluated a large language model- and graph neural network (LLM\u2013GNN)-based approach, called \u2018Bingo\u2019, to predict essential protein-coding genes in the metazoan model organisms Caenorhabditis elegans and Drosophila melanogaster as well as in Mus musculus and Homo sapiens (a HepG2 cell line) by integrating LLM and GNNs with adversarial training. Bingo predicts essential genes under two \u2018zero-shot\u2019 scenarios with transfer learning, showing promise to compensate for a lack of high-quality genomic and proteomic data for non-model organisms. In addition, the attention mechanisms and GNNExplainer were employed to manifest the functional sites and structural domain with most contribution to essentiality. 
In conclusion, Bingo provides the prospect of being able to accurately infer the essential genes of little- or under-studied organisms of interest, and provides a biological explanation for gene essentiality.<\/jats:p>","DOI":"10.1093\/bib\/bbad472","type":"journal-article","created":{"date-parts":[[2023,12,28]],"date-time":"2023-12-28T10:56:18Z","timestamp":1703760978000},"source":"Crossref","is-referenced-by-count":22,"title":["\u2018Bingo\u2019\u2014a large language model- and graph neural network-based workflow for the prediction of essential genes from protein data"],"prefix":"10.1093","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8126-0431","authenticated-orcid":false,"given":"Jiani","family":"Ma","sequence":"first","affiliation":[{"name":"Department of Veterinary Biosciences , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"},{"name":"The University of Melbourne , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"},{"name":"School of Information and Control Engineering, China University of Mining and Technology , Xuzhou 221116 , China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8031-9086","authenticated-orcid":false,"given":"Jiangning","family":"Song","sequence":"additional","affiliation":[{"name":"Department of Veterinary Biosciences , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"},{"name":"The University of Melbourne , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"},{"name":"Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University , Melbourne, Victoria 3800 , Australia"}]},{"given":"Neil D","family":"Young","sequence":"additional","affiliation":[{"name":"Department of Veterinary Biosciences , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"},{"name":"The University of Melbourne , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"}]},{"given":"Bill C 
H","family":"Chang","sequence":"additional","affiliation":[{"name":"Department of Veterinary Biosciences , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"},{"name":"The University of Melbourne , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"}]},{"given":"Pasi K","family":"Korhonen","sequence":"additional","affiliation":[{"name":"Department of Veterinary Biosciences , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"},{"name":"The University of Melbourne , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"}]},{"given":"Tulio L","family":"Campos","sequence":"additional","affiliation":[{"name":"Department of Veterinary Biosciences , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"},{"name":"The University of Melbourne , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"},{"name":"Bioinformatics Core Facility, Instituto Aggeu Magalhaes, Funda\u00e7ao Oswaldo Cruz (IAM-Fiocruz), Recife , Pernambuco , Brazil"}]},{"given":"Hui","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Information and Control Engineering, China University of Mining and Technology , Xuzhou 221116 , China"}]},{"given":"Robin B","family":"Gasser","sequence":"additional","affiliation":[{"name":"Department of Veterinary Biosciences , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"},{"name":"The University of Melbourne , Melbourne Veterinary School, , Parkville, Victoria 3010 , Australia"}]}],"member":"286","published-online":{"date-parts":[[2023,12,27]]},"reference":[{"key":"2023122808592446100_ref1","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1126\/science.142.3597.1269","article-title":"Lethal genes and analysis of 
differentiation","volume":"142","author":"Gluecksohn-Waelsch","year":"1963","journal-title":"Science"},{"key":"2023122808592446100_ref2","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1007\/s10142-002-0059-1","article-title":"Yeast and drug discovery","volume":"2","author":"Hughes","year":"2002","journal-title":"Funct Integr Genomics"},{"key":"2023122808592446100_ref3","doi-asserted-by":"crossref","first-page":"962","DOI":"10.1101\/gr.87702","article-title":"Essential genes are more evolutionarily conserved than are nonessential genes in bacteria","volume":"12","author":"Jordan","year":"2002","journal-title":"Genome Res"},{"key":"2023122808592446100_ref4","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pgen.1003484","article-title":"From mouse to human: evolutionary genomics analysis of human orthologs of essential genes","volume":"9","author":"Georgi","year":"2013","journal-title":"PLoS Genet"},{"key":"2023122808592446100_ref5","doi-asserted-by":"crossref","first-page":"6396","DOI":"10.1002\/anie.201609229","article-title":"Synthetic biology\u2014the synthesis of biology","volume":"56","author":"Ausl\u00e4nder","year":"2017","journal-title":"Angew Chem Int Ed Engl"},{"key":"2023122808592446100_ref6","doi-asserted-by":"crossref","first-page":"eaap7847","DOI":"10.1126\/science.aap7847","article-title":"Uncovering the essential genes of the human malaria parasite Plasmodium falciparum by saturation mutagenesis","volume":"360","author":"Zhang","year":"2018","journal-title":"Science"},{"key":"2023122808592446100_ref7","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1186\/1471-2164-11-222","article-title":"Drug target prediction and prioritization: using orthology to predict essentiality in parasite genomes","volume":"11","author":"Doyle","year":"2010","journal-title":"BMC Genomics"},{"key":"2023122808592446100_ref8","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1038\/nbt.3235","article-title":"Discovery of cancer drug targets by 
CRISPR-Cas9 screening of protein domains","volume":"33","author":"Shi","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2023122808592446100_ref9","doi-asserted-by":"crossref","first-page":"e1500248","DOI":"10.1126\/sciadv.1500248","article-title":"A Candida albicans CRISPR system permits genetic engineering of essential genes and gene families","volume":"1","author":"Vyas","year":"2015","journal-title":"Sci Adv"},{"issue":"54","key":"2023122808592446100_ref10","doi-asserted-by":"crossref","first-page":"107822","DOI":"10.1016\/j.biotechadv.2021.107822","article-title":"Harnessing model organism genomics to underpin the machine learning-based prediction of essential genes in eukaryotes-biotechnological implications","volume":"2022","author":"Campos","year":"2022","journal-title":"Biotechnol Adv"},{"key":"2023122808592446100_ref11","doi-asserted-by":"crossref","first-page":"dmm034546","DOI":"10.1242\/dmm.034546","article-title":"Identifying mouse developmental essential genes using machine learning","volume":"11","author":"Tian","year":"2018","journal-title":"Dis Model Mech"},{"key":"2023122808592446100_ref12","doi-asserted-by":"crossref","first-page":"1397","DOI":"10.1093\/bib\/bbz072","article-title":"New insights on human essential genes based on integrated analysis and the construction of the HEGIAP web-based platform","volume":"21","author":"Chen","year":"2019","journal-title":"Brief Bioinform"},{"key":"2023122808592446100_ref13","doi-asserted-by":"crossref","first-page":"612","DOI":"10.1016\/j.csbj.2020.02.022","article-title":"Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features","volume":"18","author":"Aromolaran","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"key":"2023122808592446100_ref14","doi-asserted-by":"crossref","first-page":"bbab128","DOI":"10.1093\/bib\/bbab128","article-title":"Machine learning approach to gene essentiality prediction: a 
review","volume":"22","author":"Aromolaran","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023122808592446100_ref15","first-page":"296","article-title":"A deep learning framework for identifying essential proteins by integrating multiple types of biological information","volume":"18","author":"Zeng","year":"2021","journal-title":"TCBB"},{"key":"2023122808592446100_ref16","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1038\/nrg.2017.74","article-title":"Emerging and evolving concepts in gene essentiality","volume":"19","author":"Rancati","year":"2018","journal-title":"Nat Rev Genet"},{"key":"2023122808592446100_ref17","first-page":"171","article-title":"Comprehensive review of the identification of essential genes using computational methods: focusing on feature implementation and assessment","volume":"21","author":"Dong","year":"2020","journal-title":"Brief Bioinform"},{"key":"2023122808592446100_ref18","first-page":"75","article-title":"Predicting essential genes and proteins based on machine learning and network topological features: a comprehensive review","volume":"7","author":"Zhang","year":"2016","journal-title":"Front Physiol"},{"key":"2023122808592446100_ref19","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1093\/bib\/bbz017","article-title":"Network-based methods for predicting essential genes or proteins: a survey","volume":"21","author":"Li","year":"2020","journal-title":"Brief Bioinform"},{"key":"2023122808592446100_ref20","doi-asserted-by":"crossref","first-page":"lqab110","DOI":"10.1093\/nargab\/lqab110","article-title":"Identifying essential genes across eukaryotes by machine learning","volume":"3","author":"Beder","year":"2021","journal-title":"Nar Genom Bioinform"},{"key":"2023122808592446100_ref21","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1016\/j.csbj.2019.05.008","article-title":"An evaluation of machine learning approaches for the prediction of essential genes in eukaryotes using protein 
sequence-derived features","volume":"17","author":"Campos","year":"2019","journal-title":"Comput Struct Biotechnol J"},{"key":"2023122808592446100_ref22","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1016\/j.csbj.2020.05.008","article-title":"Predicting gene essentiality in Caenorhabditis elegans by feature engineering and machine learning","volume":"18","author":"Campos","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"key":"2023122808592446100_ref23","doi-asserted-by":"crossref","first-page":"lqaa051","DOI":"10.1093\/nargab\/lqaa051","article-title":"Combined use of feature engineering and machine learning to predict essential genes in Drosophila melanogaster","volume":"2","author":"Campos","year":"2020","journal-title":"NAR Genomics Bioinform"},{"key":"2023122808592446100_ref24","doi-asserted-by":"crossref","first-page":"5056","DOI":"10.3390\/ijms22105056","article-title":"Cross-predicting essential genes between two model eukaryotic species using machine learning","volume":"22","author":"Campos","year":"2021","journal-title":"Int J Mol Sci"},{"key":"2023122808592446100_ref25","doi-asserted-by":"crossref","first-page":"506","DOI":"10.1186\/s12859-019-3076-y","article-title":"DeepEP: a deep learning framework for identifying essential proteins","volume":"20","author":"Zeng","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023122808592446100_ref26","first-page":"1615","article-title":"EPGAT: gene essentiality prediction with graph attention networks","volume":"19","author":"Schapke","year":"2022","journal-title":"TCBB"},{"key":"2023122808592446100_ref27","doi-asserted-by":"crossref","first-page":"759","DOI":"10.1093\/bioinformatics\/15.9.759","article-title":"Finding families for genomic ORFans","volume":"15","author":"Fischer","year":"1999","journal-title":"Bioinformatics"},{"key":"2023122808592446100_ref28","doi-asserted-by":"crossref","first-page":"692","DOI":"10.1038\/nrg3053","article-title":"The evolutionary origin of 
orphan genes","volume":"12","author":"Tautz","year":"2011","journal-title":"Nat Rev Genet"},{"key":"2023122808592446100_ref29","doi-asserted-by":"crossref","first-page":"22071","DOI":"10.1073\/pnas.1900654116","article-title":"Definitions, methods, and applications in interpretable machine learning","volume":"116","author":"Murdoch","year":"2019","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2023122808592446100_ref30","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2023122808592446100_ref31","volume-title":"33rd Conference on Neural Information Processing Systems (NeurIPS)","author":"Ying","year":"2019"},{"key":"2023122808592446100_ref32","first-page":"170","article-title":"Mouse models of human disease an evolutionary perspective","volume":"1","author":"Perlman","year":"2016","journal-title":"Evol Med Public Health"},{"issue":"10","key":"2023122808592446100_ref33","doi-asserted-by":"crossref","first-page":"1512","DOI":"10.1016\/j.humpath.2009.07.003","article-title":"Hep G2 is a hepatoblastoma-derived cell line","volume":"40","author":"L\u00f3pez-Terrada","year":"2009","journal-title":"Hum Pathol"},{"key":"2023122808592446100_ref34","doi-asserted-by":"crossref","first-page":"D998","DOI":"10.1093\/nar\/gkaa884","article-title":"OGEE v3: online GEne essentiality database with increased coverage of organisms and human cell lines","volume":"49","author":"Gurumayum","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023122808592446100_ref35","doi-asserted-by":"crossref","first-page":"D561","DOI":"10.1093\/nar\/gks1114","article-title":"BioGPS and MyGene.info: organizing online, gene-centric information","volume":"41","author":"Wu","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023122808592446100_ref36","volume-title":"5th 
International Conference on Learning Representations (ICLR)","author":"Kipf","year":"2017"},{"key":"2023122808592446100_ref37","volume-title":"6th International Conference on Learning Representations(ICLR)","author":"Veli\u010dkovi\u0107","year":"2018"},{"key":"2023122808592446100_ref38","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems(NeurIPS)","author":"Hamilton","year":"2018"},{"key":"2023122808592446100_ref39","article-title":"How powerful are graph neural networks?","volume-title":"7th International Conference on Learning Representations (ICLR)","author":"Xu","year":"2019"},{"key":"2023122808592446100_ref40","first-page":"11","article-title":"The development of crystallographic enzymology","volume":"30","author":"Phillips","year":"1970","journal-title":"Biochem Soc Symp"},{"key":"2023122808592446100_ref41","volume-title":"3rd International Conference on Learning Representations (ICLR)","author":"Goodfellow","year":"2015"},{"key":"2023122808592446100_ref42","volume-title":"5th International Conference on Learning Representations (ICLR)","author":"Miyato","year":"2017"},{"key":"2023122808592446100_ref43","first-page":"119","volume-title":"Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Morris","year":"2020"},{"key":"2023122808592446100_ref44","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v36i10.21289","article-title":"Adversarial training for improving model robustness: look at both prediction and interpretation","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Chen"},{"key":"2023122808592446100_ref45","volume-title":"31st Annual Conference on Neural Information Processing Systems (NeurIPS)","author":"Vaswani","year":"2017"},{"key":"2023122808592446100_ref46","first-page":"73","volume-title":"Proceedings of the 29th Pacific Asia Conference on Language, Information and 
Computation","author":"Shu","year":"2015"},{"key":"2023122808592446100_ref47","doi-asserted-by":"crossref","first-page":"1746","DOI":"10.3115\/v1\/D14-1181","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Kim","year":"2014"},{"key":"2023122808592446100_ref48","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J Mach Learn Res"},{"key":"2023122808592446100_ref49","doi-asserted-by":"crossref","first-page":"btac779","DOI":"10.1093\/bioinformatics\/btac779","article-title":"DeepCellEss: cell line-specific essential protein prediction with attention-based interpretable deep learning","volume":"39","author":"Li","year":"2023","journal-title":"Bioinformatics"},{"key":"2023122808592446100_ref50","doi-asserted-by":"crossref","first-page":"3263","DOI":"10.1109\/TCBB.2021.3122294","article-title":"Accurate prediction of human essential proteins using ensemble deep learning","volume":"19","author":"Li","year":"2022","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2023122808592446100_ref51","first-page":"98","volume-title":"IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","author":"Zeng","year":"2021"},{"key":"2023122808592446100_ref52","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1093\/nar\/28.1.231","article-title":"SMART: a web-based tool for the study of genetically mobile domains","volume":"28","author":"Schultz","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023122808592446100_ref53","doi-asserted-by":"crossref","first-page":"1605","DOI":"10.1002\/jcc.20084","article-title":"UCSF chimera - a visualization system for exploratory research and analysis","volume":"25","author":"Pettersen","year":"2004","journal-title":"J Comput 
Chem"},{"key":"2023122808592446100_ref54","doi-asserted-by":"crossref","first-page":"6411","DOI":"10.1021\/ja01653a051","article-title":"Enzymatic oxidation of uridine diphosphate glucose to uridine diphosphate glucuronic acid","volume":"76","author":"Strominger","year":"1954","journal-title":"J Am Chem Soc"},{"key":"2023122808592446100_ref55","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1093\/genetics\/157.1.237","article-title":"DSP1, an HMG-like protein, is involved in the regulation of homeotic genes","volume":"157","author":"Decoville","year":"2001","journal-title":"Genetics"},{"key":"2023122808592446100_ref56","doi-asserted-by":"crossref","first-page":"1559","DOI":"10.1126\/science.1112014","article-title":"The transcriptional landscape of the mammalian genome","volume":"309","author":"Carninci","year":"2005","journal-title":"Science"},{"key":"2023122808592446100_ref57","first-page":"3733","article-title":"95-Kilodalton B-RAF serine threonine kinase-identification of the protein and its major autophosphorylation site","volume":"12","author":"Stephens","year":"1992","journal-title":"Mol Cell Biol"},{"key":"2023122808592446100_ref58","doi-asserted-by":"crossref","first-page":"362","DOI":"10.1016\/S0168-9525(03)00140-9","article-title":"Human housekeeping genes are compact","volume":"19","author":"Eisenberg","year":"2003","journal-title":"Trends Genet"},{"key":"2023122808592446100_ref59","doi-asserted-by":"crossref","DOI":"10.1016\/j.patter.2021.100390","article-title":"Exploring complex and heterogeneous correlations on hypergraph for the prediction of drug-target interactions","volume":"2","author":"Ruan","year":"2021","journal-title":"Patterns"},{"key":"2023122808592446100_ref60","doi-asserted-by":"crossref","first-page":"5498","DOI":"10.1038\/s41467-022-32980-z","article-title":"Deciphering multi-way interactions in the human genome","volume":"13","author":"Dotson","year":"2022","journal-title":"Nat 
Commun"},{"key":"2023122808592446100_ref61","doi-asserted-by":"crossref","first-page":"e2206151","DOI":"10.1002\/advs.202206151","article-title":"Explainable deep hypergraph learning modeling the peptide secondary structure prediction","volume":"10","author":"Jiang","year":"2023","journal-title":"Adv Sci"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/1\/bbad472\/54910122\/bbad472.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/1\/bbad472\/54910122\/bbad472.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,4]],"date-time":"2024-11-04T14:13:12Z","timestamp":1730729592000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad472\/7502683"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,22]]},"references-count":61,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,11,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad472","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,1,1]]},"published":{"date-parts":[[2023,11,22]]},"article-number":"bbad472"}}