{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:45:12Z","timestamp":1740185112754,"version":"3.37.3"},"reference-count":65,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2020,9,22]],"date-time":"2020-09-22T00:00:00Z","timestamp":1600732800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01GM074255"],"award-info":[{"award-number":["R01GM074255"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["DBI1565107","DBI1917263"],"award-info":[{"award-number":["DBI1565107","DBI1917263"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Procedures for structural modeling of protein\u2013protein complexes (protein docking) produce a number of models which need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein\u2013protein interactions may generate such constraints. 
However, the absence of post-processing of the spotted residues reduced the usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models with different training\/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing are performed on full-text articles or on abstracts, the performance of these models is similar. Thus, in such cases, there is no need to utilize the DRNN approach, which is computationally expensive, especially at the training stage. 
The reason is that SVM success is often determined by the similarity in data\/text patterns in the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full-text articles.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The code and the datasets generated in this study are available at https:\/\/gitlab.ku.edu\/vakser-lab-public\/text-mining\/-\/tree\/2020-09-04.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa823","type":"journal-article","created":{"date-parts":[[2020,9,9]],"date-time":"2020-09-09T11:11:42Z","timestamp":1599649902000},"page":"497-505","source":"Crossref","is-referenced-by-count":3,"title":["Text mining for modeling of protein complexes enhanced by machine learning"],"prefix":"10.1093","volume":"37","author":[{"given":"Varsha D","family":"Badal","sequence":"first","affiliation":[{"name":"Computational Biology Program"}]},{"given":"Petras J","family":"Kundrotas","sequence":"additional","affiliation":[{"name":"Computational Biology Program"}]},{"given":"Ilya A","family":"Vakser","sequence":"additional","affiliation":[{"name":"Computational Biology Program"},{"name":"The University of Kansas Department of Molecular Biosciences, Lawrence, KS 66045, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,9,22]]},"reference":[{"key":"2023051800332241700_btaa823-B1","doi-asserted-by":"crossref","first-page":"e1004630","DOI":"10.1371\/journal.pcbi.1004630","article-title":"Text mining for protein docking","volume":"11","author":"Badal","year":"2015","journal-title":"PLoS Comput. 
Biol"},{"key":"2023051800332241700_btaa823-B2","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1186\/s12859-018-2079-4","article-title":"Natural language processing in text mining for structural modeling of protein complexes","volume":"19","author":"Badal","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023051800332241700_btaa823-B3","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: a review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans. Patt. Anal. Mach. Intell"},{"year":"2007","author":"Brants","key":"2023051800332241700_btaa823-B4"},{"first-page":"640","year":"2008","author":"Caporaso","key":"2023051800332241700_btaa823-B5"},{"key":"2023051800332241700_btaa823-B6","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1042\/ETLS20190003","article-title":"New advances in extracting and learning from protein\u2013protein interactions within unstructured biomedical text data","volume":"3","author":"Caufield","year":"2019","journal-title":"Emerg. Top. Life Sci"},{"key":"2023051800332241700_btaa823-B7","doi-asserted-by":"crossref","first-page":"20170387","DOI":"10.1098\/rsif.2017.0387","article-title":"Opportunities and obstacles for deep learning in biology and medicine","volume":"15","author":"Ching","year":"2018","journal-title":"J. R. Soc. Interface"},{"key":"2023051800332241700_btaa823-B8","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1093\/bib\/6.1.57","article-title":"A survey of current work in biomedical text mining","volume":"6","author":"Cohen","year":"2005","journal-title":"Brief. 
Bioinf"},{"key":"2023051800332241700_btaa823-B9","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1186\/1471-2105-11-492","article-title":"The structural and content aspects of abstracts versus bodies of full text journal articles are different","volume":"11","author":"Cohen","year":"2010","journal-title":"BMC Bioinformatics"},{"first-page":"160","year":"2008","author":"Collobert","key":"2023051800332241700_btaa823-B10"},{"key":"2023051800332241700_btaa823-B11","doi-asserted-by":"crossref","first-page":"3206","DOI":"10.1093\/bioinformatics\/bth386","article-title":"BioRAT: extracting biological information from full-length papers","volume":"20","author":"Corney","year":"2004","journal-title":"Bioinformatics"},{"key":"2023051800332241700_btaa823-B12","doi-asserted-by":"crossref","first-page":"2012","DOI":"10.1002\/jcc.25381","article-title":"Computational feasibility of an exhaustive search of side-chain conformations in protein\u2013protein docking","volume":"39","author":"Dauzhenka","year":"2018","journal-title":"J. Comput. 
Chem"},{"first-page":"338","year":"2008","author":"De Marneffe","key":"2023051800332241700_btaa823-B13"},{"key":"2023051800332241700_btaa823-B14","first-page":"1","volume-title":"Association for Computational Linguistics, Manchester, UK","author":"De Marneffe","year":"2008"},{"key":"2023051800332241700_btaa823-B15","article-title":"The BioC-BioGRID corpus: full text articles annotated for curation of protein\u2013protein and genetic interactions","volume":"2017, baw147","author":"Dogan","year":"2017","journal-title":"Database"},{"key":"2023051800332241700_btaa823-B16","doi-asserted-by":"crossref","first-page":"W385","DOI":"10.1093\/nar\/gkn317","article-title":"BioLit: integrating biological literature with databases","volume":"36","author":"Fink","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023051800332241700_btaa823-B17","doi-asserted-by":"crossref","first-page":"S74","DOI":"10.1093\/bioinformatics\/17.suppl_1.S74","article-title":"GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles","volume":"17","author":"Friedman","year":"2001","journal-title":"Bioinformatics"},{"first-page":"72","year":"2010","author":"Gerner","key":"2023051800332241700_btaa823-B18"},{"key":"2023051800332241700_btaa823-B19","doi-asserted-by":"crossref","first-page":"2154","DOI":"10.1093\/bioinformatics\/bts332","article-title":"BioContext: an integrated text mining system for large-scale extraction and contextualization of biomolecular events","volume":"28","author":"Gerner","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051800332241700_btaa823-B20","doi-asserted-by":"crossref","first-page":"I37","DOI":"10.1093\/bioinformatics\/btx228","article-title":"Deep learning with word embeddings improves biomedical named entity 
recognition","volume":"33","author":"Habibi","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051800332241700_btaa823-B21","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1109\/TCBB.2010.51","article-title":"Efficient extraction of protein\u2013protein interactions from full-text articles","volume":"7","author":"Hakenberg","year":"2010","journal-title":"IEEE-ACM Trans. Comput. Biol. Bioinf"},{"key":"2023051800332241700_btaa823-B22","doi-asserted-by":"crossref","first-page":"3604","DOI":"10.1093\/bioinformatics\/bth451","article-title":"Discovering patterns to extract protein\u2013protein interactions from full texts","volume":"20","author":"Huang","year":"2004","journal-title":"Bioinformatics"},{"key":"2023051800332241700_btaa823-B23","doi-asserted-by":"crossref","first-page":"344","DOI":"10.1002\/prot.21930","article-title":"The size of the intermolecular energy funnel in protein\u2013protein interactions","volume":"72","author":"Hunjan","year":"2008","journal-title":"Proteins"},{"key":"2023051800332241700_btaa823-B24","first-page":"2096","article-title":"Deep recursive neural networks for compositionality in language","author":"Irsoy","year":"2014"},{"year":"2014","author":"Irsoy","key":"2023051800332241700_btaa823-B25"},{"key":"2023051800332241700_btaa823-B26","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1007\/BFb0026683","volume-title":"Machine Learning: ECML-98","author":"Joachims","year":"1998"},{"key":"2023051800332241700_btaa823-B27","first-page":"169","volume-title":"Advances in Kernel Methods","author":"Joachims","year":"1999"},{"year":"2017","author":"Jurafsky","key":"2023051800332241700_btaa823-B28"},{"key":"2023051800332241700_btaa823-B29","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/gb-2008-9-s2-s4","article-title":"Overview of the protein\u2013protein interaction annotation extraction task of BioCreative II","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome 
Biol"},{"key":"2023051800332241700_btaa823-B30","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1002\/pro.3295","article-title":"Dockground: a comprehensive data resource for modeling of protein complexes","volume":"27","author":"Kundrotas","year":"2018","journal-title":"Protein Sci"},{"key":"2023051800332241700_btaa823-B31","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1109\/TCBB.2010.49","article-title":"Empirical investigations into full-text protein interaction Article Categorization Task (ACT) in the BioCreative II. 5 Challenge","volume":"7","author":"Lan","year":"2010","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinf. (TCBB)"},{"key":"2023051800332241700_btaa823-B32","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2023051800332241700_btaa823-B33","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1016\/j.neucom.2015.11.110","article-title":"A text feature-based approach for literature mining of lncRNA\u2013protein interactions","volume":"206","author":"Li","year":"2016","journal-title":"Neurocomputing"},{"key":"2023051800332241700_btaa823-B34","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1186\/1471-2105-10-46","article-title":"Is searching full text more effective than searching abstracts?","volume":"10","author":"Lin","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023051800332241700_btaa823-B35","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1093\/bioinformatics\/btv476","article-title":"Large-scale extraction of gene interactions from full-text literature using DeepDive","volume":"32","author":"Mallory","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051800332241700_btaa823-B36","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1007\/978-3-540-30478-4_9","volume-title":"Knowledge Exploration in Life Science 
Informatics","author":"Martin","year":"2004"},{"key":"2023051800332241700_btaa823-B37","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1186\/1471-2105-10-311","article-title":"Challenges for automatically extracting molecular interactions from full-text articles","volume":"10","author":"McIntosh","year":"2009","journal-title":"BMC Bioinformatics"},{"year":"2012","author":"Mikolov","key":"2023051800332241700_btaa823-B38"},{"year":"2013","author":"Mikolov","key":"2023051800332241700_btaa823-B39"},{"key":"2023051800332241700_btaa823-B40","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","author":"Mikolov","year":"2013","journal-title":"Advances Neural Information Processing Systems"},{"year":"2013","author":"Mikolov","key":"2023051800332241700_btaa823-B41"},{"year":"1999","author":"Morik","key":"2023051800332241700_btaa823-B42"},{"key":"2023051800332241700_btaa823-B43","first-page":"D8","article-title":"Database resources of the National Center for Biotechnology Information","volume":"41","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023051800332241700_btaa823-B44","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.ymeth.2014.10.026","article-title":"Protein\u2013protein interaction predictions using text mining methods","volume":"74","author":"Papanikolaou","year":"2015","journal-title":"Methods"},{"key":"2023051800332241700_btaa823-B45","doi-asserted-by":"crossref","first-page":"baw072","DOI":"10.1093\/database\/baw072","article-title":"BioC-compatible full-text passage detection for protein\u2013protein interactions using extended dependency graph","volume":"2016","author":"Peng","year":"2016","journal-title":"Database"},{"first-page":"1532","year":"2014","author":"Pennington","key":"2023051800332241700_btaa823-B46"},{"key":"2023051800332241700_btaa823-B47","doi-asserted-by":"crossref","first-page":"e4375","DOI":"10.7717\/peerj.4375","article-title":"The state of 
OA: a large-scale analysis of the prevalence and impact of Open Access articles","volume":"6","author":"Piwowar","year":"2018","journal-title":"PeerJ"},{"key":"2023051800332241700_btaa823-B48","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1007\/978-1-4939-9873-9_2","article-title":"Automated extraction and visualization of protein\u2013protein interaction networks and beyond: a text-mining protocol","volume":"2074","author":"Raja","year":"2020","journal-title":"Methods Mol. Biol. (Clifton, N.J.)"},{"key":"2023051800332241700_btaa823-B49","doi-asserted-by":"crossref","first-page":"e1000597","DOI":"10.1371\/journal.pcbi.1000597","article-title":"Biomedical text mining and its applications","volume":"5","author":"Rodriguez-Esteban","year":"2009","journal-title":"PLoS Comput. Biol"},{"key":"2023051800332241700_btaa823-B50","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"2023051800332241700_btaa823-B51","doi-asserted-by":"crossref","first-page":"2597","DOI":"10.1093\/bioinformatics\/bth291","article-title":"Distribution of information in biomedical abstracts and full-text publications","volume":"20","author":"Schuemie","year":"2004","journal-title":"Bioinformatics"},{"key":"2023051800332241700_btaa823-B52","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1016\/j.csl.2006.09.003","article-title":"Continuous space language models","volume":"21","author":"Schwenk","year":"2007","journal-title":"Comput. 
Speech Lang"},{"key":"2023051800332241700_btaa823-B53","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1186\/1471-2105-4-20","article-title":"Information extraction from full text scientific articles: where are the keywords?","volume":"4","author":"Shah","year":"2003","journal-title":"BMC Bioinformatics"},{"first-page":"129","year":"2011","author":"Socher","key":"2023051800332241700_btaa823-B54"},{"first-page":"151","year":"2011","author":"Socher","key":"2023051800332241700_btaa823-B55"},{"first-page":"1642","year":"2013","author":"Socher","key":"2023051800332241700_btaa823-B56"},{"key":"2023051800332241700_btaa823-B57","doi-asserted-by":"crossref","first-page":"e1007239","DOI":"10.1371\/journal.pcbi.1007239","article-title":"ProtFus: a comprehensive method characterizing protein\u2013protein interactions of fusion proteins","volume":"15","author":"Tagore","year":"2019","journal-title":"PLoS Comput. Biol"},{"key":"2023051800332241700_btaa823-B58","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1162\/tacl_a_00233","article-title":"Distributional semantics beyond words: supervised learning of analogy and paraphrase","volume":"1","author":"Turney","year":"2013","journal-title":"Trans. Assoc. Comput. Linguist. (TACL)"},{"key":"2023051800332241700_btaa823-B59","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1002\/(SICI)1097-0282(199609)39:3<455::AID-BIP16>3.0.CO;2-A","article-title":"Low-resolution docking: prediction of complexes for underdetermined structures","volume":"39","author":"Vakser","year":"1998","journal-title":"Biopolymers"},{"key":"2023051800332241700_btaa823-B60","doi-asserted-by":"crossref","first-page":"1785","DOI":"10.1016\/j.bpj.2014.08.033","article-title":"Protein\u2013protein docking: from interaction to interactome","volume":"107","author":"Vakser","year":"2014","journal-title":"Biophys. 
J"},{"key":"2023051800332241700_btaa823-B61","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten","year":"2008","journal-title":"J. Mach. Learn. Res"},{"key":"2023051800332241700_btaa823-B62","doi-asserted-by":"crossref","first-page":"e1005962","DOI":"10.1371\/journal.pcbi.1005962","article-title":"A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts","volume":"14","author":"Westergaard","year":"2018","journal-title":"PLoS Comput. Biol"},{"first-page":"2764","year":"2011","author":"Weston","key":"2023051800332241700_btaa823-B63"},{"key":"2023051800332241700_btaa823-B64","doi-asserted-by":"crossref","first-page":"e7126","DOI":"10.7717\/peerj.7126","article-title":"An integration of deep learning with feature embedding for protein\u2013protein interaction prediction","volume":"7","author":"Yao","year":"2019","journal-title":"PeerJ"},{"key":"2023051800332241700_btaa823-B65","article-title":"Automatic extraction of protein\u2013protein interactions using grammatical relationship graph","volume":"18","author":"Yu","year":"2018","journal-title":"BMC Med. Inf. Decis. 
Mak"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa823\/33780178\/btaa823.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/4\/497\/50359871\/btaa823.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/4\/497\/50359871\/btaa823.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,18]],"date-time":"2023-05-18T00:34:41Z","timestamp":1684370081000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/4\/497\/5909988"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,9,22]]},"references-count":65,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa823","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2021,2,15]]},"published":{"date-parts":[[2020,9,22]]}}}