{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T20:15:03Z","timestamp":1780344903633,"version":"3.54.1"},"reference-count":58,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2024,1,27]],"date-time":"2024-01-27T00:00:00Z","timestamp":1706313600000},"content-version":"vor","delay-in-days":5,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National High Level Hospital Clinical Research Funding","award":["2023SF50"],"award-info":[{"award-number":["2023SF50"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32270703"],"award-info":[{"award-number":["32270703"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,1,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The identification of human-herpesvirus protein\u2013protein interactions (PPIs) is an essential and important entry point to understand the mechanisms of viral infection, especially in malignant tumor patients with common herpesvirus infection. While natural language processing (NLP)-based embedding techniques have emerged as powerful approaches, the application of multi-modal embedding feature fusion to predict human-herpesvirus PPIs is still limited. Here, we established a multi-modal embedding feature fusion-based LightGBM method to predict human-herpesvirus PPIs. In particular, we applied document and graph embedding approaches to represent sequence, network and function modal features of human and herpesviral proteins. Training our LightGBM models through our compiled non-rigorous and rigorous benchmarking datasets, we obtained significantly better performance compared to individual-modal features. Furthermore, our model outperformed traditional feature encodings-based machine learning methods and state-of-the-art deep learning-based methods using various benchmarking datasets. In a transfer learning step, we show that our model that was trained on human-herpesvirus PPI dataset without cytomegalovirus data can reliably predict human-cytomegalovirus PPIs, indicating that our method can comprehensively capture multi-modal fusion features of protein interactions across various herpesvirus subtypes. The implementation of our method is available at https:\/\/github.com\/XiaodiYangpku\/MultimodalPPI\/.<\/jats:p>","DOI":"10.1093\/bib\/bbae005","type":"journal-article","created":{"date-parts":[[2024,1,27]],"date-time":"2024-01-27T09:10:20Z","timestamp":1706346620000},"source":"Crossref","is-referenced-by-count":15,"title":["Multi-modal features-based human-herpesvirus protein\u2013protein interaction prediction by using LightGBM"],"prefix":"10.1093","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3229-5865","authenticated-orcid":false,"given":"Xiaodi","family":"Yang","sequence":"first","affiliation":[{"name":"Department of Hematology, Peking University First Hospital , Beijing , China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stefan","family":"Wuchty","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Miami , Miami FL, 33146 , USA"},{"name":"Department of Biology, University of Miami , Miami FL, 33146 , USA"},{"name":"Institute of Data Science and Computation, University of Miami , Miami, FL 33146 , USA"},{"name":"Sylvester Comprehensive Cancer Center, University of Miami , Miami, FL 33136 , USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zeyin","family":"Liang","sequence":"additional","affiliation":[{"name":"Department of Hematology, Peking University First Hospital , Beijing , China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Li","family":"Ji","sequence":"additional","affiliation":[{"name":"Department of Hematology, Peking University First Hospital , Beijing , China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bingjie","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Hematology, Peking University First Hospital , Beijing , China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jialin","family":"Zhu","sequence":"additional","affiliation":[{"name":"Department of Hematology, Peking University First Hospital , Beijing , China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9296-571X","authenticated-orcid":false,"given":"Ziding","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Animal Biotech Breeding , College of Biological Sciences, , Beijing 100193 , China"},{"name":"China Agricultural University , College of Biological Sciences, , Beijing 100193 , China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yujun","family":"Dong","sequence":"additional","affiliation":[{"name":"Department of Hematology, Peking University First Hospital , Beijing , China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2024,1,26]]},"reference":[{"key":"2024012709100944500_ref1","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511545313","volume-title":"Human Herpesviruses: Biology, Therapy, and Immunoprophylaxis","author":"Arvin","year":"2007"},{"key":"2024012709100944500_ref2","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1002\/ajh.26579","article-title":"EBV-positive diffuse large B-cell lymphoma, not otherwise specified: 2022 update on diagnosis, risk-stratification, and management","volume":"97","author":"Malpica","year":"2022","journal-title":"Am J Hematol"},{"key":"2024012709100944500_ref3","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1038\/s41564-018-0334-0","article-title":"Defective Epstein\u2013Barr virus in chronic active infection and haematological malignancy","volume":"4","author":"Okuno","year":"2019","journal-title":"Nat Microbiol"},{"key":"2024012709100944500_ref4","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1093\/infdis\/jiaa060","article-title":"Kaposi sarcoma-associated herpesvirus infection and endemic Burkitt lymphoma","volume":"222","author":"Oluoch","year":"2020","journal-title":"J Infect Dis"},{"key":"2024012709100944500_ref5","doi-asserted-by":"crossref","first-page":"1514","DOI":"10.1093\/infdis\/jix048","article-title":"Cytomegalovirus (CMV) DNA quantitation in bronchoalveolar lavage fluid from hematopoietic stem cell transplant recipients with CMV pneumonia","volume":"215","author":"Boeckh","year":"2017","journal-title":"J Infect Dis"},{"key":"2024012709100944500_ref6","doi-asserted-by":"crossref","DOI":"10.1002\/rmv.1972","article-title":"Human herpesvirus portal proteins: structure, function, and antiviral prospects","volume":"28","author":"Kornfeind","year":"2018","journal-title":"Rev Med Virol"},{"key":"2024012709100944500_ref7","doi-asserted-by":"crossref","DOI":"10.1002\/rmv.2081","article-title":"Immunomodulatory roles of human herpesvirus-encoded microRNA in host-virus interaction","volume":"30","author":"Naqvi","year":"2020","journal-title":"Rev Med Virol"},{"key":"2024012709100944500_ref8","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1038\/s41579-020-00448-w","article-title":"The structural basis of herpesvirus entry","volume":"19","author":"Connolly","year":"2021","journal-title":"Nat Rev Microbiol"},{"key":"2024012709100944500_ref9","doi-asserted-by":"crossref","first-page":"759","DOI":"10.1038\/s41579-021-00582-z","article-title":"Pathogenesis of human cytomegalovirus in the immunocompromised host","volume":"19","author":"Griffiths","year":"2021","journal-title":"Nat Rev Microbiol"},{"key":"2024012709100944500_ref10","doi-asserted-by":"crossref","first-page":"7606","DOI":"10.1073\/pnas.0702332104","article-title":"Epstein-Barr virus and virus human protein interaction maps","volume":"104","author":"Calderwood","year":"2007","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2024012709100944500_ref11","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/nature11288","article-title":"Interpreting cancer genomes using systematic host network perturbations by tumour virus proteins","volume":"487","author":"Rozenblatt-Rosen","year":"2012","journal-title":"Nature"},{"key":"2024012709100944500_ref12","doi-asserted-by":"crossref","first-page":"e49894","DOI":"10.7554\/eLife.49894","article-title":"Human cytomegalovirus interactome analysis identifies degradation hubs, domain associations and viral protein functions","volume":"8","author":"Nobre","year":"2019","journal-title":"Elife"},{"key":"2024012709100944500_ref13","doi-asserted-by":"crossref","DOI":"10.1016\/j.celrep.2022.110788","article-title":"KSHV episome tethering sites on host chromosomes and regulation of latency-lytic switch by CHD4","volume":"39","author":"Kumar","year":"2022","journal-title":"Cell Rep"},{"key":"2024012709100944500_ref14","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1016\/j.molcel.2014.11.026","article-title":"Global mapping of herpesvirus-host protein complexes reveals a novel transcription strategy for late genes","volume":"57","author":"Davis","year":"2015","journal-title":"Mol Cell"},{"key":"2024012709100944500_ref15","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1038\/nature11289","article-title":"Viral immune modulators perturb the human molecular network by common and unique strategies","volume":"487","author":"Pichlmair","year":"2012","journal-title":"Nature"},{"key":"2024012709100944500_ref16","doi-asserted-by":"crossref","first-page":"e1003514","DOI":"10.1371\/journal.ppat.1003514","article-title":"A systematic analysis of host factors reveals a Med23-interferon-\u03bb regulatory axis against herpes simplex virus type 1 replication","volume":"9","author":"Griffiths","year":"2013","journal-title":"PLoS Pathog"},{"key":"2024012709100944500_ref17","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1101\/gr.1774904","article-title":"Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs","volume":"14","author":"Yu","year":"2004","journal-title":"Genome Res"},{"key":"2024012709100944500_ref18","doi-asserted-by":"crossref","first-page":"e1005368","DOI":"10.1371\/journal.pcbi.1005368","article-title":"Identification of entry factors involved in hepatitis C virus infection based on host-mimicking short linear motifs","volume":"13","author":"Chiang","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2024012709100944500_ref19","doi-asserted-by":"crossref","first-page":"1526","DOI":"10.1016\/j.cell.2019.08.005","article-title":"A structure-informed atlas of human-virus interactions","volume":"178","author":"Lasso","year":"2019","journal-title":"Cell"},{"key":"2024012709100944500_ref20","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/j.csbj.2019.12.005","article-title":"Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method","volume":"18","author":"Yang","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"key":"2024012709100944500_ref21","doi-asserted-by":"crossref","first-page":"4771","DOI":"10.1093\/bioinformatics\/btab533","article-title":"Transfer learning via multi-scale convolutional neural layers for human-virus protein-protein interaction prediction","volume":"37","author":"Yang","year":"2021","journal-title":"Bioinformatics"},{"key":"2024012709100944500_ref22","doi-asserted-by":"crossref","first-page":"bbab228","DOI":"10.1093\/bib\/bbab228","article-title":"LSTM-PHV: prediction of human-virus protein\u2013protein interactions by LSTM with word2vec","volume":"22","author":"Tsukiyama","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024012709100944500_ref23","doi-asserted-by":"crossref","first-page":"2722","DOI":"10.1093\/bioinformatics\/btab147","article-title":"DeepViral: prediction of novel virus\u2013host interactions from protein sequences and infectious disease phenotypes","volume":"37","author":"Liu-Wei","year":"2021","journal-title":"Bioinformatics"},{"key":"2024012709100944500_ref24","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1007\/s40484-020-0222-5","article-title":"Prediction and analysis of human-herpes simplex virus type 1 protein-protein interactions by integrating multiple methods","volume":"8","author":"Lian","year":"2020","journal-title":"Quant Biol"},{"key":"2024012709100944500_ref25","doi-asserted-by":"crossref","first-page":"2322","DOI":"10.1016\/j.csbj.2022.05.017","article-title":"Proteome-wide prediction and analysis of the Cryptosporidium parvum protein\u2013protein interaction network through integrative methods","volume":"20","author":"Ren","year":"2022","journal-title":"Comput Struct Biotechnol J"},{"key":"2024012709100944500_ref26","doi-asserted-by":"crossref","first-page":"bbac125","DOI":"10.1093\/bib\/bbac125","article-title":"deepHPI: a comprehensive deep learning platform for accurate prediction and visualization of host\u2013pathogen protein\u2013protein interactions","volume":"23","author":"Kaundal","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024012709100944500_ref27","doi-asserted-by":"crossref","first-page":"bbad020","DOI":"10.1093\/bib\/bbad020","article-title":"SGPPI: structure-aware prediction of protein\u2013protein interactions in rigorous conditions with graph convolutional network","volume":"24","author":"Huang","year":"2023","journal-title":"Brief Bioinform"},{"key":"2024012709100944500_ref28","article-title":"Deep learning-powered prediction of human-virus protein-protein interactions","volume":"13","author":"Yang","year":"2022","journal-title":"Front Microbiol"},{"key":"2024012709100944500_ref29","first-page":"1188","article-title":"Distributed representations of sentences and documents","volume":"14","author":"Le","year":"2014","journal-title":"Proc Int Conf Mach Learn"},{"key":"2024012709100944500_ref30","doi-asserted-by":"crossref","first-page":"2017","DOI":"10.1093\/bioinformatics\/bty914","article-title":"Bastion3: a two-layer ensemble predictor of type III secreted effectors","volume":"35","author":"Wang","year":"2019","journal-title":"Bioinformatics"},{"key":"2024012709100944500_ref31","doi-asserted-by":"crossref","first-page":"bbac244","DOI":"10.1093\/bib\/bbac244","article-title":"TSNAPred: predicting type-specific nucleic acid binding residues via an ensemble approach","volume":"23","author":"Nie","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024012709100944500_ref32","doi-asserted-by":"crossref","first-page":"bbab046","DOI":"10.1093\/bib\/bbab046","article-title":"PreDTIs:prediction of drug-target interactions based on multiple feature information using gradient boosting framework with data balancing and feature selection techniques","volume":"22","author":"Mahmud","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024012709100944500_ref33","doi-asserted-by":"crossref","first-page":"D523","DOI":"10.1093\/nar\/gkac1052","article-title":"UniProt: the Universal Protein Knowledgebase in 2023","volume":"51","author":"Consortium TU","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2024012709100944500_ref34","doi-asserted-by":"crossref","first-page":"D648","DOI":"10.1093\/nar\/gkab1006","article-title":"The IntAct database: efficient access to fine-grained molecular interaction data","volume":"50","author":"Toro","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024012709100944500_ref35","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1002\/pro.3978","article-title":"The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions","volume":"30","author":"Oughtred","year":"2021","journal-title":"Protein Sci"},{"key":"2024012709100944500_ref36","doi-asserted-by":"crossref","first-page":"D583","DOI":"10.1093\/nar\/gku1121","article-title":"VirHostNet 2.0: surfing on the web of virus\/host molecular interactions data","volume":"43","author":"Guirimand","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2024012709100944500_ref37","doi-asserted-by":"crossref","first-page":"D588","DOI":"10.1093\/nar\/gku830","article-title":"VirusMentha: a new resource for virus-host protein interactions","volume":"43","author":"Calderone","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2024012709100944500_ref38","doi-asserted-by":"crossref","first-page":"D330","DOI":"10.1093\/nar\/gky1055","article-title":"The gene ontology resource: 20 years and still GOing strong","volume":"47","author":"The Gene Ontology Consortium","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024012709100944500_ref39","doi-asserted-by":"crossref","first-page":"baw103","DOI":"10.1093\/database\/baw103","article-title":"HPIDB 2.0: a curated database for host-pathogen interactions","volume":"2016","author":"Ammari","year":"2016","journal-title":"Database"},{"key":"2024012709100944500_ref40","doi-asserted-by":"crossref","first-page":"1144","DOI":"10.1093\/bioinformatics\/btv737","article-title":"DeNovo: virus-host sequence-based protein-protein interaction prediction","volume":"32","author":"Eid","year":"2016","journal-title":"Bioinformatics"},{"key":"2024012709100944500_ref41","doi-asserted-by":"crossref","first-page":"1945","DOI":"10.1093\/bioinformatics\/btv077","article-title":"Evolutionary profiles improve protein-protein interaction prediction from sequence","volume":"31","author":"Hamp","year":"2015","journal-title":"Bioinformatics"},{"key":"2024012709100944500_ref42","doi-asserted-by":"crossref","first-page":"1134","DOI":"10.1038\/nmeth.2259","article-title":"A flaw in the typical evaluation scheme for pair-input computational predictions","volume":"9","author":"Park","year":"2012","journal-title":"Nat Methods"},{"key":"2024012709100944500_ref43","doi-asserted-by":"crossref","first-page":"2642","DOI":"10.1093\/bioinformatics\/bty178","article-title":"Learned protein embeddings for machine learning","volume":"34","author":"Yang","year":"2018","journal-title":"Bioinformatics"},{"key":"2024012709100944500_ref44","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"2024012709100944500_ref45","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PloS One"},{"key":"2024012709100944500_ref46","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1093\/bib\/bbaa425","article-title":"HVIDB: a comprehensive database for human-virus protein-protein interactions","volume":"22","author":"Yang","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024012709100944500_ref47","first-page":"45","article-title":"Software framework for topic modelling with large corpora","volume-title":"Conference: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks","author":"Rehurek","year":"2010"},{"key":"2024012709100944500_ref48","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann Stat"},{"key":"2024012709100944500_ref49","first-page":"3146","article-title":"LightGBM: a highly efficient gradient boosting decision tree","volume-title":"31st Conference on Neural Information Processing Systems","author":"Ke","year":"2017"},{"key":"2024012709100944500_ref50","doi-asserted-by":"crossref","first-page":"D587","DOI":"10.1093\/nar\/gkac963","article-title":"KEGG for taxonomy-based analysis of pathways and genomes","volume":"51","author":"Kanehisa","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2024012709100944500_ref51","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0011796","article-title":"Viral organization of human proteins","volume":"5","author":"Wuchty","year":"2010","journal-title":"PloS One"},{"key":"2024012709100944500_ref52","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1002\/rmv.299","article-title":"Human herpesvirus-6 and -7 in transplantation","volume":"11","author":"Dockrell","year":"2001","journal-title":"Rev Med Virol"},{"key":"2024012709100944500_ref53","doi-asserted-by":"crossref","DOI":"10.1371\/journal.ppat.1010478","article-title":"Regulation of EBNA1 protein stability and DNA replication activity by PLOD1 lysine hydroxylase","volume":"19","author":"Dheekollu","year":"2023","journal-title":"PLoS Pathog"},{"key":"2024012709100944500_ref54","doi-asserted-by":"crossref","first-page":"1732","DOI":"10.1038\/s41564-023-01433-8","article-title":"Spatially resolved protein map of intact human cytomegalovirus virions","volume":"8","author":"Bogdanow","year":"2023","journal-title":"Nat Microbiol"},{"key":"2024012709100944500_ref55","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2024012709100944500_ref56","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2024012709100944500_ref57","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1038\/s42256-023-00741-2","article-title":"Protein\u2013protein contact prediction by geometric triangle-aware protein language models","volume":"5","author":"Lin","year":"2023","journal-title":"Nat Mach Intell"},{"key":"2024012709100944500_ref58","doi-asserted-by":"crossref","first-page":"1099","DOI":"10.1038\/s41587-022-01618-2","article-title":"Large language models generate functional protein sequences across diverse families","volume":"41","author":"Madani","year":"2023","journal-title":"Nat Biotechnol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/2\/bbae005\/56428122\/bbae005.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/2\/bbae005\/56428122\/bbae005.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,27]],"date-time":"2024-01-27T09:10:51Z","timestamp":1706346651000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae005\/7590318"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,22]]},"references-count":58,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,1,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae005","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,3,1]]},"published":{"date-parts":[[2024,1,22]]},"article-number":"bbae005"}}