{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,1]],"date-time":"2026-01-01T13:47:23Z","timestamp":1767275243160},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: In an infectious disease, the pathogen's strategy to enter the host organism and breach its immune defenses often involves interactions between the host and pathogen proteins. Currently, the experimental data on host\u2013pathogen interactions (HPIs) are scattered across multiple databases, which are often specialized to target a specific disease or host organism. An accurate and efficient method for the automated extraction of HPIs from biomedical literature is crucial for creating a unified repository of HPI data.<\/jats:p>\n               <jats:p>Results: Here, we introduce and compare two new approaches to automatically detect whether the title or abstract of a PubMed publication contains HPI data, and extract the information about organisms and proteins involved in the interaction. The first approach is a feature-based supervised learning method using support vector machines (SVMs). The SVM models are trained on the features derived from the individual sentences. These features include names of the host\/pathogen organisms and corresponding proteins or genes, keywords describing HPI-specific information, more general protein\u2013protein interaction information, experimental methods and other statistical information. The language-based method employed a link grammar parser combined with semantic patterns derived from the training examples. The approaches have been trained and tested on manually curated HPI data. 
When compared to a na\u00efve approach based on the existing protein\u2013protein interaction literature mining method, our approaches demonstrated higher accuracy and recall in the classification task. The most accurate, feature-based, approach achieved 66\u201373% accuracy, depending on the test protocol.<\/jats:p>\n               <jats:p>Availability: Both approaches are available through PHILM web-server: http:\/\/korkinlab.org\/philm.html<\/jats:p>\n               <jats:p>Contact: \u00a0korkin@korkinlab.org<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts042","type":"journal-article","created":{"date-parts":[[2012,1,28]],"date-time":"2012-01-28T05:35:03Z","timestamp":1327728903000},"page":"867-875","source":"Crossref","is-referenced-by-count":34,"title":["Literature mining of host\u2013pathogen interactions: comparing feature-based supervised learning and language-based approaches"],"prefix":"10.1093","volume":"28","author":[{"given":"Thanh","family":"Thieu","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, 2MU Informatics Institute and 3Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA."}]},{"given":"Sneha","family":"Joshi","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, 2MU Informatics Institute and 3Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA."}]},{"given":"Samantha","family":"Warren","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, 2MU Informatics Institute and 3Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA."}]},{"given":"Dmitry","family":"Korkin","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, 2MU Informatics Institute and 3Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA."},{"name":"1 Department of 
Computer Science, 2MU Informatics Institute and 3Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA."},{"name":"1 Department of Computer Science, 2MU Informatics Institute and 3Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA."}]}],"member":"286","published-online":{"date-parts":[[2012,1,27]]},"reference":[{"key":"2023012512202414400_B1","first-page":"54","article-title":"IntEx: a syntactic role driven protein-protein interaction extractor for bio-medical text","volume-title":"Proceedings of the ACL-ISMB Workshop on Linking Biological Literature. Ontologies and Databases: Mining Biological Semantics.","author":"Ahmed","year":"2005"},{"key":"2023012512202414400_B2","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1038\/280361a0","article-title":"Population biology of infectious diseases: Part I","volume":"280","author":"Anderson","year":"1979","journal-title":"Nature"},{"key":"2023012512202414400_B3","doi-asserted-by":"crossref","first-page":"D525","DOI":"10.1093\/nar\/gkp878","article-title":"The IntAct molecular interaction database in 2010","volume":"38","author":"Aranda","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012512202414400_B4","doi-asserted-by":"crossref","first-page":"D154","DOI":"10.1093\/nar\/gki070","article-title":"The Universal Protein Resource (UniProt)","volume":"33","author":"Bairoch","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012512202414400_B5","first-page":"123","article-title":"The potential use of SUISEKI as a protein interaction discovery tool","author":"Blaschke","year":"2001","journal-title":"Genome Inform."},{"key":"2023012512202414400_B6","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/nar\/gkg095","article-title":"The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003","volume":"31","author":"Boeckmann","year":"2003","journal-title":"Nucleic Acids 
Res."},{"key":"2023012512202414400_B7","doi-asserted-by":"crossref","first-page":"D532","DOI":"10.1093\/nar\/gkp983","article-title":"MINT, the molecular interaction database: 2009 update","volume":"38","author":"Ceol","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012512202414400_B8","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/1471-2105-7-41","article-title":"Discovering semantic features in the literature: a foundation for building functional associations","volume":"7","author":"Chagoyen","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012512202414400_B9","doi-asserted-by":"crossref","first-page":"3206","DOI":"10.1093\/bioinformatics\/bth386","article-title":"BioRAT: extracting biological information from full-length papers","volume":"20","author":"Corney","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012512202414400_B10","doi-asserted-by":"crossref","first-page":"2585","DOI":"10.1110\/ps.073228407","article-title":"Host pathogen protein interactions predicted by comparative modeling","volume":"16","author":"Davis","year":"2007","journal-title":"Protein Sci."},{"key":"2023012512202414400_B11","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/1471-2105-4-11","article-title":"PreBIND and Textomy\u2013mining the biomedical literature for protein-protein interactions using a support vector machine","volume":"4","author":"Donaldson","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023012512202414400_B12","doi-asserted-by":"crossref","first-page":"D647","DOI":"10.1093\/nar\/gkn799","article-title":"PIG\u2013the pathogen interaction gateway","volume":"37","author":"Driscoll","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012512202414400_B13","doi-asserted-by":"crossref","first-page":"i159","DOI":"10.1093\/bioinformatics\/btm208","article-title":"Computational prediction of host-pathogen protein-protein 
interactions","volume":"23","author":"Dyer","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012512202414400_B14","doi-asserted-by":"crossref","first-page":"e12089","DOI":"10.1371\/journal.pone.0012089","article-title":"The human-bacterial pathogen protein interaction networks of Bacillus anthracis, Francisella tularensis, and Yersinia pestis","volume":"5","author":"Dyer","year":"2010","journal-title":"PLoS ONE"},{"key":"2023012512202414400_B15","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/7287.001.0001","article-title":"WordNet : an Electronic Lexical Database","volume-title":"Language, speech, and communication.","author":"Fellbaum","year":"1998"},{"key":"2023012512202414400_B16","doi-asserted-by":"crossref","first-page":"10538","DOI":"10.1073\/pnas.1101440108","article-title":"Structural principles within the human-virus protein-protein interaction network","volume":"108","author":"Franzosa","year":"2011","journal-title":"Proc. Natl Acad. Sci."},{"key":"2023012512202414400_B17","doi-asserted-by":"crossref","first-page":"S74","DOI":"10.1093\/bioinformatics\/17.suppl_1.S74","article-title":"GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles","volume":"17","author":"Friedman","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012512202414400_B18","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/bioinformatics\/btl616","article-title":"RelEx\u2013relation extraction using dependency parse trees","volume":"23","author":"Fundel","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012512202414400_B19","doi-asserted-by":"crossref","first-page":"3294","DOI":"10.1093\/bioinformatics\/bti493","article-title":"Discovering patterns to extract protein-protein interactions from the literature: Part II","volume":"21","author":"Hao","year":"2005","journal-title":"Bioinformatics"},{"issue":"Suppl 
1","key":"2023012512202414400_B20","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-6-S1-S1","article-title":"Overview of BioCreAtIvE: critical assessment of information extraction for biology","volume":"6","author":"Hirschman","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012512202414400_B21","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1016\/0024-3841(78)90006-2","article-title":"Resolving pronoun references","volume":"44","author":"Hobbs","year":"1978","journal-title":"Lingua"},{"key":"2023012512202414400_B22","doi-asserted-by":"crossref","first-page":"pe21","DOI":"10.1126\/stke.2832005pe21","article-title":"Text mining for metabolic pathways, signaling cascades, and protein networks","volume":"2005","author":"Hoffmann","year":"2005","journal-title":"Sci. STKE"},{"issue":"Suppl 2","key":"2023012512202414400_B23","doi-asserted-by":"crossref","first-page":"ii252","DOI":"10.1093\/bioinformatics\/bti1142","article-title":"Implementing the iHOP concept for navigation of biomedical literature","volume":"21","author":"Hoffmann","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012512202414400_B24","doi-asserted-by":"crossref","first-page":"2759","DOI":"10.1093\/bioinformatics\/bti390","article-title":"Literature mining and database annotation of protein phosphorylation using a rule-based system","volume":"21","author":"Hu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012512202414400_B25","doi-asserted-by":"crossref","first-page":"S12","DOI":"10.1186\/gb-2008-9-s2-s12","article-title":"Mining physical protein-protein interactions from the literature","volume":"9","author":"Huang","year":"2008","journal-title":"Genome Biol."},{"key":"2023012512202414400_B26","doi-asserted-by":"crossref","first-page":"W411","DOI":"10.1093\/nar\/gkn281","article-title":"PIE: an online prediction system for protein-protein interactions from 
text","volume":"36","author":"Kim","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012512202414400_B27","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.cell.2008.07.032","article-title":"Global analysis of host-pathogen interactions that regulate early-stage HIV-1 replication","volume":"135","author":"Konig","year":"2008","journal-title":"Cell"},{"issue":"Suppl 2","key":"2023012512202414400_B28","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/gb-2008-9-s2-s4","article-title":"Overview of the protein-protein interaction annotation extraction task of BioCreative II","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome Biol."},{"issue":"Suppl 2","key":"2023012512202414400_B29","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/gb-2008-9-s2-s1","article-title":"Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome Biol."},{"key":"2023012512202414400_B30","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1186\/gb-2005-6-7-224","article-title":"Text-mining and information-retrieval services for molecular biology","volume":"6","author":"Krallinger","year":"2005","journal-title":"Genome Biol."},{"key":"2023012512202414400_B31","doi-asserted-by":"crossref","first-page":"S16","DOI":"10.1186\/1471-2105-11-S6-S16","article-title":"HPIDB-a unified resource for host-pathogen interactions","volume":"11","author":"Kumar","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012512202414400_B32","doi-asserted-by":"crossref","first-page":"W416","DOI":"10.1093\/nar\/gkn286","article-title":"E3Miner: a text mining tool for ubiquitin-protein ligases","volume":"36","author":"Lee","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012512202414400_B33","first-page":"350","article-title":"Filling preposition-based templates to capture information from medical 
abstracts","volume":"2002","author":"Leroy","year":"2002","journal-title":"Pac. Symp. Biocomput."},{"key":"2023012512202414400_B34","first-page":"205","article-title":"New and emerging infectious diseases","volume":"109","author":"Mandell","year":"1998","journal-title":"Trans. Am. Clin. Climatol. Assoc."},{"key":"2023012512202414400_B35","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1093\/bioinformatics\/17.4.359","article-title":"Mining literature for protein-protein interactions","volume":"17","author":"Marcotte","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012512202414400_B36","doi-asserted-by":"crossref","first-page":"i241","DOI":"10.1093\/bioinformatics\/bth904","article-title":"Protein names precisely peeled off free text","volume":"20","author":"Mika","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012512202414400_B37","first-page":"15","article-title":"Analysis of link grammar on biomedical dependency corpus targeted at protein-protein interactions","volume-title":"International Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA).","author":"Pyysalo","year":"2004"},{"key":"2023012512202414400_B38","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-7-S3-S2","article-title":"Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches","volume":"7","author":"Pyysalo","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012512202414400_B39","doi-asserted-by":"crossref","first-page":"e1000597","DOI":"10.1371\/journal.pcbi.1000597","article-title":"Biomedical text mining and its applications","volume":"5","author":"Rodriguez-Esteban","year":"2009","journal-title":"PLoS Comput. 
Biol."},{"key":"2023012512202414400_B40","doi-asserted-by":"crossref","first-page":"D449","DOI":"10.1093\/nar\/gkh086","article-title":"The Database of Interacting Proteins: 2004 update","volume":"32","author":"Salwinski","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012512202414400_B41","doi-asserted-by":"crossref","first-page":"1653","DOI":"10.1093\/bioinformatics\/bti165","article-title":"Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction","volume":"21","author":"Santos","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012512202414400_B42","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1016\/j.ipm.2004.02.006","article-title":"A hybrid approach to protein name identification in biomedical texts","volume":"41","author":"Seki","year":"2005","journal-title":"Inform. Process. Manag."},{"key":"2023012512202414400_B43","doi-asserted-by":"crossref","first-page":"1410","DOI":"10.1093\/bioinformatics\/btm115","article-title":"SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data","volume":"23","author":"Shatkay","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012512202414400_B44","first-page":"91","article-title":"Parsing English with a Link Grammar","volume-title":"Third International Workshop on Parsing Technologies.","author":"Sleator","year":"1995"},{"key":"2023012512202414400_B45","first-page":"483","article-title":"Detecting gene relations from Medline abstracts","volume":"2001","author":"Stephens","year":"2001","journal-title":"Pac. Symp. 
Biocomput."},{"key":"2023012512202414400_B46","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/1471-2105-6-S1-S3","article-title":"GENETAG: a tagged corpus for gene\/protein named entity recognition","volume":"6","author":"Tanabe","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012512202414400_B47","article-title":"Statistical learning theory","volume-title":"Adaptive and Learning Systems for Signal Processing, Communications, and Control.","author":"Vapnik","year":"1998"},{"key":"2023012512202414400_B48","doi-asserted-by":"crossref","first-page":"D173","DOI":"10.1093\/nar\/gkj158","article-title":"Database resources of the National Center for Biotechnology information","volume":"34","author":"Wheeler","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012512202414400_B49","doi-asserted-by":"crossref","first-page":"D572","DOI":"10.1093\/nar\/gkm858","article-title":"PHI-base update: additions to the pathogen host interaction database","volume":"36","author":"Winnenburg","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012512202414400_B50","doi-asserted-by":"crossref","first-page":"2228","DOI":"10.1016\/j.eswa.2007.12.014","article-title":"BioPPIExtractor: a protein-protein interaction extraction system for biomedical literature","volume":"36","author":"Yang","year":"2009","journal-title":"Expert Syst. Appl."},{"key":"2023012512202414400_B51","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/j.artmed.2010.04.003","article-title":"Document classification for mining host pathogen protein-protein interactions","volume":"49","author":"Yin","year":"2010","journal-title":"Artif. Intell. 
Med."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/6\/867\/48879700\/bioinformatics_28_6_867.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/6\/867\/48879700\/bioinformatics_28_6_867.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T15:38:39Z","timestamp":1674661119000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/6\/867\/311962"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,1,27]]},"references-count":51,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2012,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts042","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,3,15]]},"published":{"date-parts":[[2012,1,27]]}}}