{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T03:02:27Z","timestamp":1760151747031,"version":"build-2065373602"},"reference-count":18,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,4,18]],"date-time":"2022-04-18T00:00:00Z","timestamp":1650240000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004826","name":"Beijing Natural Science Foundation","doi-asserted-by":"publisher","award":["4192007"],"award-info":[{"award-number":["4192007"]}],"id":[{"id":"10.13039\/501100004826","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>The aim of Medical Knowledge Graph Completion is to automatically predict one of the three parts (head entity, relationship, or tail entity) of RDF triples from medical data, mainly text data. Since their introduction, using pretrained language models such as Word2vec, BERT, and XLNet to complete Medical Knowledge Graphs has become a popular research topic. Existing work focuses mainly on relationship completion and has rarely addressed the completion of entities and their related triples. In this paper, a framework to predict RDF triples for Medical Knowledge Graphs based on word embeddings (named PTMKG-WE) is proposed specifically for the completion of entities and triples. The framework first formalizes existing samples of a given relationship from the Medical Knowledge Graph as prior knowledge. Second, it uses Word2vec to train word embeddings on big medical data according to this prior knowledge. Third, it acquires candidate triples from the word embeddings by analogy with existing samples. Within this framework, the paper proposes two strategies to improve the relation features. 
The first refines the relational semantics by clustering existing triple samples; the second embeds the relationship more accurately by taking the mean of existing samples. The two strategies can be used separately (PTMKG-WE-C and PTMKG-WE-M, respectively) or combined (PTMKG-WE-C-M) within the framework. Finally, PubMed data and the National Drug File-Reference Terminology (NDF-RT) were collected, and a series of experiments was conducted. The experimental results show that the proposed framework and the two improvement strategies can predict new triples for Medical Knowledge Graphs when medical data are sufficiently abundant and the Knowledge Graph contains appropriate prior knowledge. The two strategies for improving relation features significantly increase precision, and the improvement is more pronounced when they are combined. Another conclusion is that, under the same parameter settings, extending the breadth and depth of the data improves the semantic precision of the word embeddings and, in most cases, further improves the precision of the proposed prediction framework. 
Thus, collecting big medical data and training on it is a viable way to learn more useful knowledge.<\/jats:p>","DOI":"10.3390\/info13040205","type":"journal-article","created":{"date-parts":[[2022,4,18]],"date-time":"2022-04-18T22:04:02Z","timestamp":1650319442000},"page":"205","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Medical Knowledge Graph Completion Based on Word Embeddings"],"prefix":"10.3390","volume":"13","author":[{"given":"Mingxia","family":"Gao","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"}]},{"given":"Jianguo","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada"}]},{"given":"Furong","family":"Chen","sequence":"additional","affiliation":[{"name":"TravelSky Technology Limited, Beijing 101300, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,18]]},"reference":[{"key":"ref_1","first-page":"317803","article-title":"Ontology-oriented diagnostic system for traditional Chinese medicine based on relation refinement","volume":"57","author":"Gu","year":"2013","journal-title":"Comput. Math. Methods Med."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1007\/s10916-009-9360-z","article-title":"A unified architecture for biomedical search engines based on semantic web technologies","volume":"35","author":"Jalali","year":"2011","journal-title":"J. Med. Syst."},{"key":"ref_3","first-page":"829","article-title":"XMQAS\u2014An ontology based medical question answering system","volume":"5","author":"Midhunlal","year":"2016","journal-title":"Int. J. Adv. Res. Comput. Commun. 
Eng."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1016\/j.jbi.2010.08.005","article-title":"Using text to build semantic networks for pharmacogenomics","volume":"43","author":"Coulet","year":"2010","journal-title":"J. Biomed. Inform."},{"key":"ref_5","first-page":"1","article-title":"Automatic extraction of semantic relations between medical entities: A rule based approach","volume":"2","author":"Zweigenbaum","year":"2011","journal-title":"J. Biomed. Semant."},{"key":"ref_6","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv.","DOI":"10.18653\/v1\/N18-1202"},{"key":"ref_8","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_9","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv."},{"key":"ref_10","unstructured":"Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., and Le, Q.V. (2019). XLNet: Generalized autoregressive pretraining for language understanding. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Su, M., and Su, H. (2021, January 23\u201325). Deep learning for knowledge graph completion with XLNET. Proceedings of the ICDLT 2021, Qingdao, China.","DOI":"10.1145\/3480001.3480022"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Jaradeh, M.Y., Singh, K., Stocker, M., and Auer, S. (2021, January 2\u20133). Triple classification for scholarly knowledge graph completion. 
Proceedings of the K-CAP\u201921, Virtual Event.","DOI":"10.1145\/3460210.3493582"},{"key":"ref_13","unstructured":"Minarro-Gimenez, J.A., Mar\u00edn-Alonso, O., and Samwald, M. (2015). Applying deep learning techniques on medical corpora from the world wide web: A prototypical system and evaluation. arXiv."},{"key":"ref_14","unstructured":"Casteleiro, M.A., Demetriou, G., Read, W.J., Prieto, M.J.F., Maseda-Fernandez, D., Nenadic, G., Klein, J., Keane, J.A., and Stevens, R. (2016, January 29\u201330). Deep learning meets semantic web: A feasibility study with the cardiovascular disease ontology and PubMed citations. Proceedings of the 7th Workshop on Ontologies and Data in Life Sciences 2016, Halle (Saale), Germany."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12911-021-01622-7","article-title":"Path-based knowledge reasoning with textual semantic information for medical knowledge graph completion","volume":"21","author":"Lan","year":"2021","journal-title":"BMC Med. Inform. Decis. Mak."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"He, B., Zhou, D., Xiao, J., Jiang, X., Liu, Q., Yuan, N.J., and Xu, T. (2020, January 16\u201320). BERT-MK: Integrating graph contextualized knowledge into pre-trained language models. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Hong Kong, China.","DOI":"10.18653\/v1\/2020.findings-emnlp.207"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Arnaud, \u00c9., Elbattah, M., Gignon, M., and Dequen, G. (2022, January 9\u201311). Learning embeddings from free-text triage notes using pretrained transformer models. 
Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022), Vienna, Austria.","DOI":"10.5220\/0011012800003123"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1093\/jamiaopen\/ooaa022","article-title":"Generating contextual embeddings for emergency department chief complaints","volume":"3","author":"Chang","year":"2020","journal-title":"JAMIA Open"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/4\/205\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:55:51Z","timestamp":1760136951000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/4\/205"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,18]]},"references-count":18,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["info13040205"],"URL":"https:\/\/doi.org\/10.3390\/info13040205","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2022,4,18]]}}}