{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T11:01:50Z","timestamp":1781089310985,"version":"3.54.1"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2021,6,23]],"date-time":"2021-06-23T00:00:00Z","timestamp":1624406400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["19H04208"],"award-info":[{"award-number":["19H04208"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["19F19377"],"award-info":[{"award-number":["19F19377"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Viral infection involves a large number of protein\u2013protein interactions (PPIs) between human and virus. The PPIs range from the initial binding of viral coat proteins to host membrane receptors to the hijacking of host transcription machinery. However, few interspecies PPIs have been identified, because experimental methods including mass spectrometry are time-consuming and expensive, and molecular dynamic simulation is limited only to the proteins whose 3D structures are solved. Sequence-based machine learning methods are expected to overcome these problems. We have first developed the LSTM model with word2vec to predict PPIs between human and virus, named LSTM-PHV, by using amino acid sequences alone. The LSTM-PHV effectively learnt the training data with a highly imbalanced ratio of positive to negative samples and achieved AUCs of 0.976 and 0.973 and accuracies of 0.984 and 0.985 on the training and independent datasets, respectively. In predicting PPIs between human and unknown or new virus, the LSTM-PHV learned greatly outperformed the existing state-of-the-art PPI predictors. Interestingly, learning of only sequence contexts as words is sufficient for PPI prediction. Use of uniform manifold approximation and projection demonstrated that the LSTM-PHV clearly distinguished the positive PPI samples from the negative ones. We presented the LSTM-PHV online web server and support data that are freely available at http:\/\/kurata35.bio.kyutech.ac.jp\/LSTM-PHV.<\/jats:p>","DOI":"10.1093\/bib\/bbab228","type":"journal-article","created":{"date-parts":[[2021,5,25]],"date-time":"2021-05-25T15:12:12Z","timestamp":1621955532000},"source":"Crossref","is-referenced-by-count":121,"title":["LSTM-PHV: prediction of human-virus protein\u2013protein interactions by LSTM with word2vec"],"prefix":"10.1093","volume":"22","author":[{"given":"Sho","family":"Tsukiyama","sequence":"first","affiliation":[{"name":"Department of Interdisciplinary Informatics in the Kyushu Institute of Technology, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Md Mehedi","family":"Hasan","sequence":"additional","affiliation":[{"name":"Tulane University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Satoshi","family":"Fujii","sequence":"additional","affiliation":[{"name":"Department of Bioscience and Bioinformatics in the Kyushu Institute of Technology, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hiroyuki","family":"Kurata","sequence":"additional","affiliation":[{"name":"Department of Bioscience and Bioinformatics in the Kyushu Institute of Technology, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2021,6,23]]},"reference":[{"key":"2021110815065069000_ref1","volume-title":"Coronavirus disease (covid-19) situation dashboard","author":"World Health Organization"},{"key":"2021110815065069000_ref2","doi-asserted-by":"crossref","first-page":"e00303","DOI":"10.1128\/mSystems.00303-18","article-title":"Understanding human-virus protein-protein interactions using a human protein complex-based analysis framework","volume":"4","author":"Yang","year":"2019","journal-title":"mSystems"},{"issue":"2","key":"2021110815065069000_ref3","doi-asserted-by":"crossref","first-page":"e32","DOI":"10.1371\/journal.ppat.0040032","article-title":"The landscape of human proteins interacting with viruses and other pathogens","volume":"4","author":"Dyer","year":"2008","journal-title":"PLoS Pathog"},{"issue":"3","key":"2021110815065069000_ref4","doi-asserted-by":"crossref","first-page":"e42","DOI":"10.1371\/journal.pcbi.0030042","article-title":"Deciphering protein-protein interactions. Part I. experimental techniques and databases","volume":"3","author":"Shoemaker","year":"2007","journal-title":"PLoS Comput Biol"},{"issue":"8","key":"2021110815065069000_ref5","doi-asserted-by":"crossref","first-page":"4569","DOI":"10.1073\/pnas.061034498","article-title":"A comprehensive two-hybrid analysis to explore the yeast protein interactome","volume":"98","author":"Ito","year":"2001","journal-title":"Proc Natl Acad Sci"},{"issue":"6","key":"2021110815065069000_ref6","doi-asserted-by":"crossref","first-page":"454","DOI":"10.2174\/1389202921999200625103936","article-title":"Evolution of sequence-based bioinformatics tools for protein-protein interaction prediction","volume":"21","author":"Khatun","year":"2020","journal-title":"Curr Genomics"},{"issue":"1","key":"2021110815065069000_ref7","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1186\/s12859-016-1035-4","article-title":"Sequence-based prediction of protein-protein interactions using weighted sparse representation model combined with global encoding","volume":"17","author":"Huang","year":"2016","journal-title":"BMC Bioinformatics"},{"issue":"12","key":"2021110815065069000_ref8","doi-asserted-by":"crossref","first-page":"1945","DOI":"10.1093\/bioinformatics\/btv077","article-title":"Evolutionary profiles improve protein-protein interaction prediction from sequence","volume":"31","author":"Hamp","year":"2015","journal-title":"Bioinformatics"},{"key":"2021110815065069000_ref9","doi-asserted-by":"crossref","first-page":"1144","DOI":"10.1093\/bioinformatics\/btv737","article-title":"DeNovo: virus-host sequence-based protein-protein interaction prediction","volume":"32","author":"Eid","year":"2016","journal-title":"Bioinformatics"},{"key":"2021110815065069000_ref10","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1016\/j.jmb.2004.02.040","article-title":"ProMate: a structure based prediction program to identify the location of protein-protein binding sites","volume":"338","author":"Neuvirth","year":"2004","journal-title":"J Mol Biol"},{"key":"2021110815065069000_ref11","doi-asserted-by":"crossref","first-page":"568","DOI":"10.1186\/s12864-018-4924-2","article-title":"A generalized approach to predicting protein-protein interactions between virus and host","volume":"19","year":"2018","journal-title":"BMC Genomics"},{"key":"2021110815065069000_ref12","doi-asserted-by":"publisher","DOI":"10.1101\/2021.02.16.431420","article-title":"Transfer learning via multi-scale convolutional neural layers for human-virus protein-protein interaction prediction","volume-title":"bioRxiv","author":"Yang","year":"2021"},{"key":"2021110815065069000_ref13","article-title":"Protein-protein interactions prediction using a novel local conjoint triad descriptor of amino acid sequences","volume":"18","year":"2017","journal-title":"Int J Mol Sci"},{"issue":"9","key":"2021110815065069000_ref14","doi-asserted-by":"crossref","first-page":"3025","DOI":"10.1093\/nar\/gkn159","article-title":"Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences","volume":"36","author":"Guo","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2021110815065069000_ref15","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1109\/BIBE.2018.00030","volume-title":"2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE)","author":"Khatun","year":"2018"},{"key":"2021110815065069000_ref16","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/j.csbj.2019.12.005","article-title":"Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method","volume":"18","author":"Yang","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"issue":"8","key":"2021110815065069000_ref17","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2021110815065069000_ref18","doi-asserted-by":"crossref","first-page":"baw103","DOI":"10.1093\/database\/baw103","article-title":"HPIDB 2.0: a curated database for host-pathogen interactions","volume":"2016","author":"Ammari","year":"2016","journal-title":"Database (Oxford)"},{"issue":"D1","key":"2021110815065069000_ref19","doi-asserted-by":"crossref","first-page":"D841","DOI":"10.1093\/nar\/gkr1088","article-title":"The IntAct molecular interaction database in 2012","volume":"40","author":"Kerrien","year":"2012","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"2021110815065069000_ref20","doi-asserted-by":"crossref","first-page":"D583","DOI":"10.1093\/nar\/gku1121","article-title":"VirHostNet 2.0: surfing on the web of virus\/host molecular interactions data","volume":"43","author":"Guirimand","year":"2015","journal-title":"Nucleic Acids Res"},{"issue":"23","key":"2021110815065069000_ref21","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"issue":"5","key":"2021110815065069000_ref22","first-page":"438","article-title":"Machine learning techniques for sequence-based prediction of viral-host interactions between SARS-CoV-2 and human proteins","volume":"43","author":"Dey","year":"2020","journal-title":"Biom J"},{"issue":"D1","key":"2021110815065069000_ref23","doi-asserted-by":"crossref","first-page":"D158","DOI":"10.1093\/nar\/gkw1099","article-title":"UniProt: the universal protein knowledgebase","volume":"45","author":"The UniProt Consortium","year":"2017","journal-title":"Nucleic Acids Res"},{"issue":"90001","key":"2021110815065069000_ref24","doi-asserted-by":"crossref","first-page":"D535","DOI":"10.1093\/nar\/gkj109","article-title":"BioGRID: a general repository for interaction datasets","volume":"34","author":"Stark","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2021110815065069000_ref25","first-page":"1301.3781","article-title":"Efficient estimation of word representations in vector space","volume-title":"arXiv","author":"Mikolov","year":"2013"},{"key":"2021110815065069000_ref26","first-page":"1188","article-title":"Distributed representations of sentences and documents","volume":"31","author":"Le","year":"2014","journal-title":"International Conference on International Conference on Machine Learning"},{"key":"2021110815065069000_ref27","article-title":"Distributed representations of words and phrases and their compositionality","author":"Mikolov","year":"2013"},{"issue":"12","key":"2021110815065069000_ref28","doi-asserted-by":"crossref","first-page":"2009","DOI":"10.1093\/bioinformatics\/bty937","article-title":"Identifying antimicrobial peptides using word embedding with deep recurrent neural networks","volume":"35","author":"Hamid","year":"2019","journal-title":"Bioinformatics"},{"issue":"1","key":"2021110815065069000_ref29","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1186\/s12859-019-3006-z","article-title":"PTPD: predicting therapeutic peptides by deep learning and word2vec","volume":"20","author":"Wu","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2021110815065069000_ref30","first-page":"45","volume-title":"Proceedings of LREC 2010 Workshop on New Challenges for NLP Frameworks","author":"\u0158eh\u016f\u0159ek","year":"2010"},{"key":"2021110815065069000_ref31","article-title":"Sequence to sequence learning with neural networks","volume-title":"arXiv","author":"Sutskever","year":"2014"},{"key":"2021110815065069000_ref32","volume-title":"NIPS 2017 Workshop on Autodiff","author":"Paszke","year":"2017"},{"key":"2021110815065069000_ref33","article-title":"On the variance of the adaptive learning rate and beyond","volume-title":"arXiv","author":"Liu","year":"2019"},{"key":"2021110815065069000_ref34","article-title":"Class-balanced loss based on effective number of samples","year":"2019"},{"key":"2021110815065069000_ref35","first-page":"2825\u201330","article-title":"Scikitlearn: machine learning in python","volume":"12","author":"Pedregosa","year":"2012","journal-title":"J Mach Learn Res"},{"key":"2021110815065069000_ref36","first-page":"861","article-title":"UMAP: uniform manifold approximation and projection for dimension reduction","volume-title":"J. Open Source Softw","author":"McInnes","year":"2018"},{"issue":"24","key":"2021110815065069000_ref37","doi-asserted-by":"crossref","first-page":"3745","DOI":"10.1093\/bioinformatics\/btw560","article-title":"Imbalanced multi-label learning for identifying antimicrobial peptides and their functional types","volume":"32","author":"Lin","year":"2016","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2021110815065069000_ref38","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/j.ab.2014.12.009","article-title":"iDNA-methyl: identifying DNA methylation sites via pseudo trinucleotide composition","volume":"474","author":"Liu","year":"2015","journal-title":"Anal Biochem"},{"issue":"1","key":"2021110815065069000_ref39","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1038\/nbt.4314","article-title":"Dimensionality reduction for visualizing single-cell data using UMAP","volume":"37","author":"Becht","year":"2019","journal-title":"Nat Biotechnol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab228\/41088941\/bbab228.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab228\/41088941\/bbab228.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,8]],"date-time":"2021-11-08T10:11:51Z","timestamp":1636366311000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab228\/6308200"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,23]]},"references-count":39,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab228","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.02.26.432975","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"published":{"date-parts":[[2021,6,23]]},"article-number":"bbab228"}}