{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:31:53Z","timestamp":1772119913348,"version":"3.50.1"},"reference-count":65,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2017,12,26]],"date-time":"2017-12-26T00:00:00Z","timestamp":1514246400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1218393"],"award-info":[{"award-number":["IIS-1218393"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1514204"],"award-info":[{"award-number":["IIS-1514204"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>The overwhelming amount of research articles in the domain of bio-medicine might cause important connections to remain unnoticed. Literature Based Discovery is a sub-field within biomedical text mining that peruses these articles to formulate high confident hypotheses on possible connections between medical concepts. Although many alternate methodologies have been proposed over the last decade, they still suffer from scalability issues. The primary reason, apart from the dense inter-connections between biological concepts, is the absence of information on the factors that lead to the edge-formation. 
In this work, we formulate this problem as a collaborative filtering task and leverage a relatively new concept of word-vectors to learn and mimic the implicit edge-formation process. Along with a single-class classifier, we prune the search-space of redundant and irrelevant hypotheses to increase the efficiency of the system while maintaining, and in some cases even boosting, the overall accuracy.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We show that our proposed framework is able to prune up to 90% of the hypotheses while still retaining high recall in top-K results. This level of efficiency enables the discovery algorithm to look for higher-order hypotheses, something that was infeasible until now. Furthermore, the generic formulation allows our approach to be agile to perform both open and closed discovery. We also experimentally validate that the core data-structures upon which the system bases its decision have a high concordance with the opinion of the experts. This, coupled with the ability to understand the edge formation process, provides us with interpretable results without any manual intervention.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The relevant JAVA codes are available at: https:\/\/github.com\/vishrawas\/Medline\u2013Code_v2.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx837","type":"journal-article","created":{"date-parts":[[2017,12,23]],"date-time":"2017-12-23T13:16:01Z","timestamp":1514034961000},"page":"2103-2115","source":"Crossref","is-referenced-by-count":19,"title":["Towards self-learning based hypotheses generation in biomedical text domain"],"prefix":"10.1093","volume":"34","author":[{"given":"Vishrawas","family":"Gopalakrishnan","sequence":"first","affiliation":[{"name":"Department of 
Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA"}]},{"given":"Kishlay","family":"Jha","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA"}]},{"given":"Guangxu","family":"Xun","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA"}]},{"given":"Hung Q","family":"Ngo","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA"}]},{"given":"Aidong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,12,26]]},"reference":[{"key":"2023012810011140200_btx837-B1","first-page":"17","author":"Aronson","year":"2001"},{"key":"2023012810011140200_btx837-B2","first-page":"1137","article-title":"A neural probabilistic language model","volume":"3","author":"Bengio","year":"2003","journal-title":"J. Machine Learn. Res"},{"key":"2023012810011140200_btx837-B3","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1056\/NEJMoa1002853","article-title":"Effectiveness of sensor-augmented insulin-pump therapy in type 1 diabetes","volume":"363","author":"Bergenstal","year":"2010","journal-title":"N. Engl. J. Med"},{"key":"2023012810011140200_btx837-B4","doi-asserted-by":"crossref","first-page":"2.","DOI":"10.1186\/1743-0003-2-2","article-title":"Advances in wearable technology and applications in physical medicine and rehabilitation","volume":"2","author":"Bonato","year":"2005","journal-title":"J. Neuroeng. 
Rehab"},{"key":"2023012810011140200_btx837-B5","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1016\/j.jbi.2015.01.014","article-title":"Context-driven automatic subgraph creation for literature-based discovery","volume":"54","author":"Cameron","year":"2015","journal-title":"J. Biomed. Inform"},{"key":"2023012810011140200_btx837-B6","author":"Chiu","year":"2016"},{"key":"2023012810011140200_btx837-B7","first-page":"371","author":"Choi","year":"2003"},{"key":"2023012810011140200_btx837-B8","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1093\/bib\/6.1.57","article-title":"A survey of current work in biomedical text mining","volume":"6","author":"Cohen","year":"2005","journal-title":"Brief. Bioinformatics"},{"key":"2023012810011140200_btx837-B9","doi-asserted-by":"crossref","first-page":"21","DOI":"10.5210\/disco.v5i0.3090","article-title":"EpiphaNet: an interactive tool to support biomedical discoveries","volume":"5","author":"Cohen","year":"2010","journal-title":"J. Biomed. Discov. Collab"},{"key":"2023012810011140200_btx837-B10","first-page":"2493","article-title":"Natural language processing (almost) from scratch","volume":"12","author":"Collobert","year":"2011","journal-title":"J. Machine Learn. Res"},{"key":"2023012810011140200_btx837-B11","doi-asserted-by":"crossref","first-page":"113037.","DOI":"10.1088\/1367-2630\/17\/11\/113037","article-title":"Common neighbours and the local-community-paradigm for topological link prediction in bipartite networks","volume":"17","author":"Daminelli","year":"2015","journal-title":"New J. 
Phys"},{"key":"2023012810011140200_btx837-B12","author":"G\u00e4rtner","year":"2007"},{"key":"2023012810011140200_btx837-B13","author":"Goldberg","year":"2014"},{"key":"2023012810011140200_btx837-B14","first-page":"232","author":"Goodwin","year":"2012"},{"key":"2023012810011140200_btx837-B15","first-page":"23","author":"Gopalakrishnan","year":"2016"},{"key":"2023012810011140200_btx837-B16","doi-asserted-by":"crossref","first-page":"674","DOI":"10.1002\/(SICI)1097-4571(199806)49:8<674::AID-ASI2>3.0.CO;2-T","article-title":"Using latent semantic indexing for literature based discovery","volume":"49","author":"Gordon","year":"1998","journal-title":"J. Am. Soc. Inf. Sci"},{"key":"2023012810011140200_btx837-B17","first-page":"349","article-title":"Exploiting semantic relations for literature-based discovery","author":"Hristovski","year":"2006","journal-title":"AMIA Annu. Symp. Proc"},{"key":"2023012810011140200_btx837-B18","first-page":"53","volume-title":"In: Linking Literature, Information, and Knowledge for Biology: Workshop of the BioLink Special Interest Group, ISMB\/ECCB 2009, Stockholm, June 28\u201329, 2009, Revised Selected Papers.","author":"Hristovski","year":"2010"},{"key":"2023012810011140200_btx837-B19","first-page":"200","author":"Hu","year":"2006"},{"key":"2023012810011140200_btx837-B20","first-page":"207","article-title":"Mining hidden connections among biomedical concepts from disjoint biomedical literature sets through semantic-based association rule","volume":"25","author":"Hu","year":"2010","journal-title":"Int. J. 
Intelligent Syst"},{"key":"2023012810011140200_btx837-B21","doi-asserted-by":"crossref","first-page":"444","DOI":"10.2337\/diacare.21.3.444","article-title":"Advances toward the implantable artificial pancreas for treatment of diabetes","volume":"21","author":"Jaremko","year":"1998","journal-title":"Diabetes Care"},{"key":"2023012810011140200_btx837-B22","first-page":"317","author":"Jha","year":"2016"},{"key":"2023012810011140200_btx837-B23","doi-asserted-by":"crossref","first-page":"e102188.","DOI":"10.1371\/journal.pone.0102188","article-title":"Large-scale structure of a network of co-occurring mesh terms: statistical analysis of macroscopic properties","volume":"9","author":"Kastrin","year":"2014","journal-title":"PLoS One"},{"key":"2023012810011140200_btx837-B24","doi-asserted-by":"crossref","first-page":"340","DOI":"10.3414\/ME15-01-0108","article-title":"Link prediction on a network of co-occurring mesh terms: towards literature-based discovery","volume":"55","author":"Kastrin","year":"2016","journal-title":"Methods Inform. 
Med"},{"key":"2023012810011140200_btx837-B25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0925-2312(98)00030-7","article-title":"The self-organizing map","volume":"21","author":"Kohonen","year":"1998","journal-title":"Neurocomputing"},{"key":"2023012810011140200_btx837-B26","author":"Kostoff","year":"2005"},{"key":"2023012810011140200_btx837-B27","first-page":"380","author":"Kunegis","year":"2010"},{"key":"2023012810011140200_btx837-B28","doi-asserted-by":"crossref","first-page":"265","DOI":"10.7551\/mitpress\/7287.003.0018","article-title":"Combining local context and wordnet similarity for word sense identification","volume":"49","author":"Leacock","year":"1998","journal-title":"WordNet: Electronic Lexical Database"},{"key":"2023012810011140200_btx837-B29","first-page":"2177","author":"Levy","year":"2014"},{"key":"2023012810011140200_btx837-B30","first-page":"283","author":"Li","year":"2010"},{"key":"2023012810011140200_btx837-B31","first-page":"848","author":"Li","year":"2011"},{"key":"2023012810011140200_btx837-B32","first-page":"289","author":"Li","year":"2014"},{"key":"2023012810011140200_btx837-B33","doi-asserted-by":"crossref","first-page":"1019","DOI":"10.1002\/asi.20591","article-title":"The link-prediction problem for social networks","volume":"58","author":"Liben-Nowell","year":"2007","journal-title":"J. Assoc. Inform. Sci. Technol"},{"key":"2023012810011140200_btx837-B34","doi-asserted-by":"crossref","first-page":"1150","DOI":"10.1016\/j.physa.2010.11.027","article-title":"Link prediction in complex networks: a survey","volume":"390","author":"L\u00fc","year":"2011","journal-title":"Physica A: Statist. Mechan. 
Appl"},{"key":"2023012810011140200_btx837-B35","doi-asserted-by":"crossref","first-page":"baq036.","DOI":"10.1093\/database\/baq036","article-title":"Pubmed and beyond: a survey of web tools for searching biomedical literature","volume":"2011","author":"Lu","year":"2011","journal-title":"Database"},{"key":"2023012810011140200_btx837-B36","first-page":"2579","article-title":"Visualizing data using t-sne","volume":"9","author":"Maaten","year":"2008","journal-title":"J. Machine Learn. Res"},{"key":"2023012810011140200_btx837-B37","author":"McInnes","year":"2017"},{"key":"2023012810011140200_btx837-B38","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a mapreduce framework for analyzing next-generation dna sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023012810011140200_btx837-B39","doi-asserted-by":"crossref","first-page":"1213","DOI":"10.1007\/s11517-012-0991-8","article-title":"Electrochemotherapy: technological advancements for efficient electroporation-based treatment of internal tumors","volume":"50","author":"Miklav\u010di\u010d","year":"2012","journal-title":"Medical Biol. Eng. 
Comput"},{"key":"2023012810011140200_btx837-B40","author":"Mikolov","year":"2010"},{"key":"2023012810011140200_btx837-B41","first-page":"279","article-title":"A closed literature-based discovery technique finds a mechanistic link between hypogonadism and diminished sleep quality in aging men","volume":"35","author":"Miller","year":"2012","journal-title":"Sleep"},{"key":"2023012810011140200_btx837-B42","first-page":"1081","author":"Mnih","year":"2009"},{"key":"2023012810011140200_btx837-B43","author":"Moen","year":"2013"},{"key":"2023012810011140200_btx837-B44","first-page":"158","author":"Muneeb","year":"2015"},{"key":"2023012810011140200_btx837-B45","first-page":"623","author":"Nguyen","year":"2006"},{"key":"2023012810011140200_btx837-B46","author":"Novacek","year":"2015"},{"key":"2023012810011140200_btx837-B47","first-page":"572","volume-title":"AMIA Annu. Symp. Proc","author":"Pakhomov","year":"2010"},{"key":"2023012810011140200_btx837-B48","doi-asserted-by":"crossref","first-page":"3635","DOI":"10.1093\/bioinformatics\/btw529","article-title":"Corpus domain effects on distributional semantic modeling of medical terms","volume":"32","author":"Pakhomov","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012810011140200_btx837-B49","doi-asserted-by":"crossref","first-page":"288","DOI":"10.1016\/j.jbi.2006.06.004","article-title":"Measures of semantic similarity and relatedness in the biomedical domain","volume":"40","author":"Pedersen","year":"2007","journal-title":"J. Biomed. Informatics"},{"key":"2023012810011140200_btx837-B50","first-page":"105","author":"Pratt","year":"2003"},{"key":"2023012810011140200_btx837-B51","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1109\/21.24528","article-title":"Development and application of a metric on semantic nets","volume":"19","author":"Rada","year":"1989","journal-title":"IEEE Trans. Syst. 
Man Cybernetics"},{"key":"2023012810011140200_btx837-B52","doi-asserted-by":"crossref","first-page":"15","DOI":"10.3233\/ISU-2011-0627","article-title":"Semantic MEDLINE: an advanced information management application for biomedicine","volume":"31","author":"Rindflesch","year":"2011","journal-title":"Inform. Serv. Use"},{"key":"2023012810011140200_btx837-B53","doi-asserted-by":"crossref","first-page":"1024","DOI":"10.7326\/0003-4819-134-11-200106050-00008","article-title":"Home monitoring service improves mean arterial pressure in patients with essential hypertension: a randomized, controlled trial","volume":"134","author":"Rogers","year":"2001","journal-title":"Ann. Internal Med"},{"key":"2023012810011140200_btx837-B54","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1016\/j.pmr.2012.11.005","article-title":"Technological advances in interventions to enhance poststroke gait","volume":"24","author":"Sheffler","year":"2013","journal-title":"Phys. Med. Rehab. Clin. North Am"},{"key":"2023012810011140200_btx837-B55","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1002\/asi.10389","article-title":"Text mining: generating hypotheses from medline","volume":"55","author":"Srinivasan","year":"2004","journal-title":"J. Assoc. Inf. Sci. Technol"},{"key":"2023012810011140200_btx837-B56","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1353\/pbm.1986.0087","article-title":"Fish oil, Raynaud\u2019s syndrome, and undiscovered public knowledge","volume":"30","author":"Swanson","year":"1986","journal-title":"Perspect. Biol. Med"},{"key":"2023012810011140200_btx837-B57","author":"Tax","year":"2015"},{"key":"2023012810011140200_btx837-B58","doi-asserted-by":"crossref","first-page":"355.","DOI":"10.2147\/tcrm.2006.2.4.355","article-title":"In vitro fertilization (ivf): a review of 3 decades of clinical innovation and technological advancement","volume":"2","author":"Wang","year":"2006","journal-title":"Therapeutics Clin. 
Risk Manage"},{"key":"2023012810011140200_btx837-B59","doi-asserted-by":"crossref","first-page":"548","DOI":"10.1002\/asi.1104","article-title":"Using concepts in literature-based discovery: simulating swanson\u2019s Raynaud\u2013Fish oil and Migraine\u2013magnesium discoveries","volume":"52","author":"Weeber","year":"2001","journal-title":"J. Assoc. Inf. Sci. Technol"},{"key":"2023012810011140200_btx837-B60","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1093\/bib\/6.3.277","article-title":"Online tools to support literature-based discovery in the life sciences","volume":"6","author":"Weeber","year":"2005","journal-title":"Brief. Bioinformatics"},{"key":"2023012810011140200_btx837-B61","doi-asserted-by":"crossref","first-page":"28.","DOI":"10.1186\/s13326-015-0021-5","article-title":"Discovering relations between indirectly connected biomedical concepts","volume":"6","author":"Weissenborn","year":"2015","journal-title":"J. Biomed. Semantics"},{"key":"2023012810011140200_btx837-B62","first-page":"1514","article-title":"Graph-based methods for discovery browsing with semantic predications","volume":"2011","author":"Wilkowski","year":"2011","journal-title":"AMIA Annu. Symp. 
Proc"},{"key":"2023012810011140200_btx837-B63","doi-asserted-by":"crossref","first-page":"145.","DOI":"10.1186\/1471-2105-5-145","article-title":"Extending the mutual information measure to rank inferred literature relationships","volume":"5","author":"Wren","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012810011140200_btx837-B64","first-page":"133","author":"Wu","year":"1994"},{"key":"2023012810011140200_btx837-B65","first-page":"43","author":"Yu","year":"2016"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/12\/2103\/48935864\/bioinformatics_34_12_2103.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/12\/2103\/48935864\/bioinformatics_34_12_2103.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T17:43:31Z","timestamp":1719942211000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/12\/2103\/4774696"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,12,26]]},"references-count":65,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2018,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx837","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,6,15]]},"published":{"date-parts":[[2017,12,26]]}}}