{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:58:38Z","timestamp":1774630718807,"version":"3.50.1"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"S16","license":[{"start":{"date-parts":[[2019,12,1]],"date-time":"2019-12-01T00:00:00Z","timestamp":1575158400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2019,12,2]],"date-time":"2019-12-02T00:00:00Z","timestamp":1575244800000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n<jats:title>Background<\/jats:title>\n<jats:p>In recent years, deep learning methods have been applied to many natural language processing tasks to achieve state-of-the-art performance. However, in the biomedical domain, they have not out-performed supervised word sense disambiguation (WSD) methods based on support vector machines or random forests, possibly due to inherent similarities of medical word senses.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>In this paper, we propose two deep-learning-based models for supervised WSD: a model based on bi-directional long short-term memory (BiLSTM) network, and an attention model based on self-attention architecture. Our result shows that the BiLSTM neural network model with a suitable upper layer structure performs even better than the existing state-of-the-art models on the MSH WSD dataset, while our attention model was 3 or 4 times faster than our BiLSTM model with good accuracy. In addition, we trained \u201cuniversal\u201d models in order to disambiguate all ambiguous words together. That is, we concatenate the embedding of the target ambiguous word to the max-pooled vector in the universal models, acting as a \u201chint\u201d. The result shows that our universal BiLSTM neural network model yielded about 90 percent accuracy.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Conclusion<\/jats:title>\n<jats:p>Deep contextual models based on sequential information processing methods are able to capture the relative contextual information from pre-trained input word embeddings, in order to provide state-of-the-art results for supervised biomedical WSD tasks.<\/jats:p>\n<\/jats:sec>","DOI":"10.1186\/s12859-019-3079-8","type":"journal-article","created":{"date-parts":[[2019,12,2]],"date-time":"2019-12-02T07:00:35Z","timestamp":1575270035000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["Biomedical word sense disambiguation with bidirectional long short-term memory and attention-based neural networks"],"prefix":"10.1186","volume":"20","author":[{"given":"Canlin","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Daniel","family":"Bi\u015b","sequence":"additional","affiliation":[]},{"given":"Xiuwen","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Zhe","family":"He","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,12,2]]},"reference":[{"issue":"6","key":"3079_CR1","doi-asserted-by":"publisher","first-page":"1088","DOI":"10.1016\/j.jbi.2008.02.003","volume":"41","author":"Guergana K. Savova","year":"2008","unstructured":"Savova GK, Coden AR, Sominsky IL, Johnson R, Ogren PV, Groen PCd, Chute CG. Word sense disambiguation across two domains: Biomedical literature and clinical notes. J Biomed Inform. 2008; 41(6):1088\u2013100. https:\/\/doi.org\/10.1016\/j.jbi.2008.02.003.","journal-title":"Journal of Biomedical Informatics"},{"issue":"2","key":"3079_CR2","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1145\/1459352.1459355","volume":"41","author":"R Navigli","year":"2009","unstructured":"Navigli R. Word sense disambiguation: A survey. ACM Comput Surv (CSUR). 2009; 41(2):10.","journal-title":"ACM Comput Surv (CSUR)"},{"key":"3079_CR3","doi-asserted-by":"publisher","first-page":"320","DOI":"10.1197\/jamia.M1533","volume":"11 4","author":"H Liu","year":"2004","unstructured":"Liu H, Teller V, Friedman C. Research paper: A multi-aspect comparison study of supervised word sense disambiguation. J Am Med Inform Assoc JAMIA. 2004; 11 4:320\u201331.","journal-title":"J Am Med Inform Assoc JAMIA"},{"key":"3079_CR4","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1186\/1471-2105-7-334","volume":"7","author":"H Xu","year":"2006","unstructured":"Xu H, Markatou M, Dimova R, Liu H, Friedman C. Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues. BMC Bioinformatics. 2006; 7:334.","journal-title":"BMC Bioinformatics"},{"issue":"7","key":"3079_CR5","doi-asserted-by":"publisher","first-page":"800","DOI":"10.1093\/jamia\/ocy013","volume":"25","author":"Y Wang","year":"2018","unstructured":"Wang Y, Zheng K, Xu H, Mei Q. Interactive medical word sense disambiguation through informed learning. J Am Med Inform Assoc. 2018; 25(7):800\u20138.","journal-title":"J Am Med Inform Assoc"},{"key":"3079_CR6","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1006\/jbin.2001.1023","volume":"34 4","author":"H Liu","year":"2001","unstructured":"Liu H, Lussier YA, Friedman C. Disambiguating ambiguous biomedical terms in biomedical narrative text: An unsupervised method. J Biomed Inform. 2001; 34 4:249\u201361.","journal-title":"J Biomed Inform"},{"issue":"2","key":"3079_CR7","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1016\/j.jbi.2006.06.001","volume":"40","author":"H Yu","year":"2007","unstructured":"Yu H, Kim W, Hatzivassiloglou V, Wilbur WJ. Using medline as a knowledge source for disambiguating abbreviations and acronyms in full-text biomedical journal articles. J Biomed Inform. 2007; 40(2):150\u20139.","journal-title":"J Biomed Inform"},{"key":"3079_CR8","unstructured":"Xu H, Stetson PD, Friedman C. Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations. In: AMIA Annual Symposium Proceedings. American Medical Informatics Association: 2012. p. 1004\u201313."},{"key":"3079_CR9","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1016\/j.artmed.2018.03.002","volume":"87","author":"A Duque","year":"2018","unstructured":"Duque A, Stevenson M, Martinez-Romo J, Araujo L. Co-occurrence graphs for word sense disambiguation in the biomedical domain. Artif Intell Med. 2018; 87:9\u201319.","journal-title":"Artif Intell Med"},{"key":"3079_CR10","doi-asserted-by":"crossref","unstructured":"Jimeno-Yepes A, Aronson AR. Knowledge-based biomedical word sense disambiguation: comparison of approaches. BMC Bioinformatics. 2010; 11(1).","DOI":"10.1186\/1471-2105-11-569"},{"key":"3079_CR11","doi-asserted-by":"publisher","unstructured":"Sabbir A, Jimeno-Yepes A, Kavuluru R. Knowledge-based biomedical word sense disambiguation with neural concept embeddings. In: 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE): 2017. p. 163\u201370. https:\/\/doi.org\/10.1109\/BIBE.2017.00-61.","DOI":"10.1109\/BIBE.2017.00-61"},{"key":"3079_CR12","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems: 2013. p. 3111\u20139."},{"key":"3079_CR13","doi-asserted-by":"publisher","unstructured":"Rais M, Lachkar A. Biomedical word sense disambiguation context-based: Improvement of senserelate method. In: 2016 International Conference on Information Technology for Organizations Development (IT4OD): 2016. p. 1\u20136. https:\/\/doi.org\/10.1109\/IT4OD.2016.7479309.","DOI":"10.1109\/IT4OD.2016.7479309"},{"key":"3079_CR14","first-page":"8","volume":"236","author":"S Festag","year":"2017","unstructured":"Festag S, Spreckelsen C. Word sense disambiguation of medical terms via recurrent convolutional neural networks. Stud Health Technol Inform. 2017; 236:8\u201315. IOS Press.","journal-title":"Stud Health Technol Inform."},{"key":"3079_CR15","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1016\/j.jbi.2017.08.001","volume":"73","author":"AJ Yepes","year":"2017","unstructured":"Yepes AJ. Word embeddings and recurrent neural networks based on long-short term memory nodes in supervised biomedical word sense disambiguation. J Biomed Inform. 2017; 73:137\u201347.","journal-title":"J Biomed Inform"},{"key":"3079_CR16","doi-asserted-by":"crossref","unstructured":"Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L. Deep contextualized word representations. arXiv preprint arXiv:1802.05365. 2018. https:\/\/arxiv.org\/abs\/1802.05365.","DOI":"10.18653\/v1\/N18-1202"},{"key":"3079_CR17","doi-asserted-by":"crossref","unstructured":"Bis D, Zhang C, Liu X, He Z. Layered Multistep Bidirectional Long Short-Term Memory Networks for Biomedical Word Sense Disambiguation. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine. IEEE: 2018. p. 313\u2013320.","DOI":"10.1109\/BIBM.2018.8621383"},{"key":"3079_CR18","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser L, Polosukhin I. Attention is all you need: 2017. p 5998\u20136008."},{"issue":"8","key":"3079_CR19","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997; 9(8):1735\u201380.","journal-title":"Neural Comput"},{"issue":"10","key":"3079_CR20","doi-asserted-by":"publisher","first-page":"2451","DOI":"10.1162\/089976600300015015","volume":"12","author":"F Gers","year":"2000","unstructured":"Gers F, Schmidhuber J, Cummins F. Learning to forget: Continual prediction with lstm. Neural Comput. 2000; 12(10):2451\u201371.","journal-title":"Neural Comput"},{"issue":"5-6","key":"3079_CR21","doi-asserted-by":"publisher","first-page":"602","DOI":"10.1016\/j.neunet.2005.06.042","volume":"18","author":"A Graves","year":"2005","unstructured":"Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional lstm and other neural network architectures. Neural Netw. 2005; 18(5-6):602\u201310.","journal-title":"Neural Netw"},{"key":"3079_CR22","unstructured":"Zaremba W, Sutskever I, Vinyals O. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329. 2014. https:\/\/arxiv.org\/abs\/1409.2329."},{"key":"3079_CR23","first-page":"161","volume":"20","author":"L Bottou","year":"2008","unstructured":"Bottou L, Olivier B. The tradeoffs of large scale learning. Adv Neural Inf Process Syst. 2008; 20:161\u20138.","journal-title":"Adv Neural Inf Process Syst"},{"key":"3079_CR24","unstructured":"Ramsunder B. Tensorflow tutorial. Presentation of Stanford machine learning course. https:\/\/cs224d.stanford.edu\/lectures\/CS224d-Lecture7.pdf. Accessed 1 Mar 2019."},{"issue":"1","key":"3079_CR25","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1186\/1471-2105-12-223","volume":"12","author":"AJ Jimeno-Yepes","year":"2011","unstructured":"Jimeno-Yepes AJ, McInnes BT, Aronson AR. Exploiting mesh indexing in medline to generate a data set for word sense disambiguation. BMC Bioinformatics. 2011; 12(1):223.","journal-title":"BMC Bioinformatics"},{"key":"3079_CR26","unstructured":"Graves A, Jaitly N. Towards end-to-end speech recognition with recurrent neural networks. In: Proc. 31st International Conference on Machine Learning, vol 32: 2014. p. 1764\u201372."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3079-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-019-3079-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3079-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,30]],"date-time":"2020-11-30T19:15:43Z","timestamp":1606763743000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-019-3079-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12]]},"references-count":26,"journal-issue":{"issue":"S16","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["3079"],"URL":"https:\/\/doi.org\/10.1186\/s12859-019-3079-8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12]]},"assertion":[{"value":"2 December 2019","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"502"}}