{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,13]],"date-time":"2025-05-13T22:00:17Z","timestamp":1747173617841,"version":"3.40.5"},"reference-count":51,"publisher":"Cambridge University Press (CUP)","issue":"5","license":[{"start":{"date-parts":[[2022,2,25]],"date-time":"2022-02-25T00:00:00Z","timestamp":1645747200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper presents MHeTRep, a multilingual medical terminology and the methodology followed for its compilation. The multilingual terminology is organised into one vocabulary for each language. All the terms in the collection are semantically tagged with a tagset corresponding to the top categories of Snomed-CT ontology. When possible, the individual terms are linked to their equivalent in the other languages. Even though many <jats:italic>NLP<\/jats:italic> resources and tools claim to be domain independent, their application to specific tasks can be restricted to specific domains, otherwise their performance degrades notably. As the accuracy of <jats:italic>NLP<\/jats:italic> resources drops heavily when applied in environments different from which they were built, a tuning to the new environment is needed. Usually, having a domain terminology facilitates and accelerates the adaptation of general domain NLP applications to a new domain. This is particularly important in medicine, a domain living moments of great expansion. The proposed method takes Snomed-CT as starting point. From this point and using 13 multilingual resources, covering the most relevant medical concepts such as drugs, anatomy, clinical findings and procedures, we built a large resource covering seven languages totalling more than two million semantically tagged terms. The resulting collection has been intensively evaluated in several ways for the involved languages and domain categories. Our hypothesis is that MHeTRep can be used advantageously over the original resources for a number of NLP use cases and likely extended to other languages.<\/jats:p>","DOI":"10.1017\/s1351324922000055","type":"journal-article","created":{"date-parts":[[2022,2,25]],"date-time":"2022-02-25T10:08:05Z","timestamp":1645783685000},"page":"1364-1401","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":0,"title":["MHeTRep: A multilingual semantically tagged health terms repository"],"prefix":"10.1017","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1491-8691","authenticated-orcid":false,"given":"Jorge","family":"Vivaldi","sequence":"first","affiliation":[]},{"given":"Horacio","family":"Rodr\u00edguez","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2022,2,25]]},"reference":[{"key":"S1351324922000055_ref14","first-page":"279","article-title":"SNOMED-CT: The advanced terminology and coding system for eHealth","volume":"121","author":"Donnelly","year":"2006","journal-title":"Studies in Health Technology and Informatics"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref44","DOI":"10.1016\/j.jbi.2011.01.002"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref11","DOI":"10.1007\/978-3-642-22218-4_23"},{"doi-asserted-by":"crossref","unstructured":"Bay, M. , Brune\u00ff, D. , Herold, M. , Schulze, C. , Guckert, M. and Minor, M. (2021). Term extraction from medical documents using word embeddings. Proceedings of 2020 6th IEEE Congress on Information Science and Technology (CiSt), pp. 328\u2013333.","key":"S1351324922000055_ref4","DOI":"10.1109\/CiSt49399.2021.9357263"},{"unstructured":"Bond, F. and Foster, R. (2013). Linking and Extending an Open Multilingual Wordnet. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL, 4-9 August 2013, Sofia, Bulgaria. pp. 1352\u20131362.","key":"S1351324922000055_ref6"},{"unstructured":"Intxaurrondo, A. , P\u00e9rez-P\u00e9rez, M. , P\u00e9rez-Rodr\u00edguez, G. , L\u00f3pez-Mart\u00edn, J. , Santamaria, J. , de la Pe\u00f1a, S. , Villegas, M. , Akhondi, S. , Valencia, A. , Louren\u00e7o, A. and Krallinger, M. (2017). The Biomedical Abbreviation Recognition and Resolution (BARR) track: benchmarking, evaluation and importance of abbreviation recognition systems applied to Spanish biomedical abstracts. In Proceedings of SEPLN 2017, pp. 230\u2013246.","key":"S1351324922000055_ref24"},{"key":"S1351324922000055_ref27","first-page":"924","article-title":"PyMedTermino: an open-source generic API for advanced terminology services","volume":"210","author":"Lamy","year":"2006","journal-title":"Studies in Health Technology and Informatics"},{"unstructured":"Gonzalez-Agirre, A. , Laparra, E. and Rigau, G. (2012). Multilingual Central Repository version 3.0. European Languages Resources Association (ELRA), pp. 2525\u20132529.","key":"S1351324922000055_ref19"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref16","DOI":"10.1080\/09296179508590032"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref28","DOI":"10.1016\/j.is.2020.101636"},{"unstructured":"Heid, U. , Jauss, S. , Krueger, K. and Hohmann, A. (1996). Term extraction with standard tools for corpus exploration. Experience from German. In Proceedings of TKE\u201996 Terminology and Knowledge Engineering, pp. 139\u201350.","key":"S1351324922000055_ref21"},{"unstructured":"Jonquet, C. , Emonet, V. and Musen, M.A. (2015). Roadmap for a Multilingual BioPortal. In Proceedings of the Fourth Workshop on the Multilingual Semantic Web (MSW4) co-located with 12th Extended Semantic Web Conference (ESWC), pp. 15\u201326.","key":"S1351324922000055_ref26"},{"key":"S1351324922000055_ref32","first-page":"450","article-title":"Representing complexity in part-whole relationships within the foundational model of anatomy","author":"Mejino","year":"2003","journal-title":"AMIA Symposium"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref31","DOI":"10.1007\/s10791-015-9262-2"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref38","DOI":"10.1186\/s13326-018-0179-8"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref42","DOI":"10.1016\/j.jbi.2003.11.007"},{"unstructured":"N\u00e9v\u00e9ol, A. , Grouin, C. , Leixa, J. , Rosset, S. and Zweigenbaum, P. (2014) The QUAERO French medical corpus: A resource for medical entity recognition and normalization. In Proceeding of BioTextMining Work, pp. 24\u201330.","key":"S1351324922000055_ref34"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref43","DOI":"10.1007\/s10278-007-9073-0"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref23","DOI":"10.1016\/j.jbi.2013.07.011"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref10","DOI":"10.1007\/978-3-642-45114-0_28"},{"key":"S1351324922000055_ref2","first-page":"1","article-title":"Biomedical question answering: A survey","volume":"99","author":"Athenikos","year":"2010","journal-title":"International Committee on Computational Linguistics"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref36","DOI":"10.1016\/j.artint.2012.07.001"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref15","DOI":"10.1075\/term.9.1.06dro"},{"unstructured":"Lee, J. , Scott, D. , Villarroel, M. , Clifford, G. , Saeed, M. and Mar, R. (2011). Open access MIMIC-II database for intensive care research. In 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 8315\u20138318.","key":"S1351324922000055_ref30"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref3","DOI":"10.1007\/s10462-019-09725-4"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref33","DOI":"10.1007\/BF03256752"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref5","DOI":"10.1093\/nar\/gkh061"},{"unstructured":"Hellrich, J. , Schulz, S. , Buechel, S. and Hahn, U. (2015). JUFIT: A configurable rule engine for filtering and generating new multilingual UMLS terms. In American Medical Informatics Association Annual Symposium, pp. 604\u2013610.","key":"S1351324922000055_ref22"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref12","DOI":"10.18653\/v1\/S17-2044"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref25","DOI":"10.1038\/sdata.2016.35"},{"key":"S1351324922000055_ref13","first-page":"7","article-title":"Survey of current terminologies and ontologies in biology and medicine","author":"de Freitas","year":"2009","journal-title":"Electronic Journal in Communication, Information and Innovation in Health"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref18","DOI":"10.5120\/11638-7118"},{"key":"S1351324922000055_ref20","first-page":"640","article-title":"Flight of the PEGASUS? Comparing transformers on few-shot and zero-shot multi-document abstractive summarization","author":"Goodwin","year":"2020","journal-title":"Computer Methods and Programs in Biomedicine"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref17","DOI":"10.1007\/3-540-49653-X_35"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref48","DOI":"10.1093\/nar\/gkx1037"},{"unstructured":"Wang, R. and Liu, W. (2016). Featureless domain-specific term extraction with minimal labelled data. In Proceedings of Australasian Language Technology Association Workshop, pp. 103\u2013112.","key":"S1351324922000055_ref49"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref8","DOI":"10.1007\/978-94-024-0881-2_53"},{"key":"S1351324922000055_ref29","first-page":"265","article-title":"Medical Subject Headings (MeSH)","volume":"88","author":"Lipscomb","year":"2006","journal-title":"Bulletin of the Medical Library Association"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref39","DOI":"10.1016\/j.datak.2003.06.002"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref51","DOI":"10.1109\/MCI.2018.2840738"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref47","DOI":"10.1093\/nar\/gkj067"},{"year":"2001","author":"Cabr\u00e9","first-page":"53","key":"S1351324922000055_ref7"},{"unstructured":"Newman, D. , Koilada, N. , Lau, J. and Baldwin, T. (2012). Bayesian text segmentation for index term identification and Keyphrase extraction. In Proceedings of COLING 2012, p. 2077\u201392.","key":"S1351324922000055_ref37"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref41","DOI":"10.3233\/AO-2009-0063"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref46","DOI":"10.1109\/JBHI.2017.2767063"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref50","DOI":"10.1093\/nar\/gkr469"},{"key":"S1351324922000055_ref40","first-page":"255","volume-title":"Terminology Extraction: An Analysis of Linguistic and Statistical Approaches","author":"Pazienza","year":"2005"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref45","DOI":"10.1007\/978-3-642-35173-0_12"},{"unstructured":"Corsar, D. , Moss, L. , Sleeman, D. and Sim, M. (2009). Supporting the development of medical ontologies. Proceedings of the 4th Workshop Formal Ontologies Meet Industry, pp. 114\u2013125.","key":"S1351324922000055_ref9"},{"key":"S1351324922000055_ref35","first-page":"27","volume-title":"SemEval-2017 Task 3: Community question answering","author":"Nakov","year":"2017"},{"doi-asserted-by":"publisher","key":"S1351324922000055_ref1","DOI":"10.1136\/jamia.2009.002733"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324922000055","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,11]],"date-time":"2023-09-11T02:07:15Z","timestamp":1694398035000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324922000055\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,25]]},"references-count":51,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["S1351324922000055"],"URL":"https:\/\/doi.org\/10.1017\/s1351324922000055","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"type":"print","value":"1351-3249"},{"type":"electronic","value":"1469-8110"}],"subject":[],"published":{"date-parts":[[2022,2,25]]},"assertion":[{"value":"\u00a9 The Author(s), 2022. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}