{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T05:41:30Z","timestamp":1775799690656,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2024,6,27]],"date-time":"2024-06-27T00:00:00Z","timestamp":1719446400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Health Data & Evidence Network"},{"name":"Innovative Medicines Initiative 2 Joint Undertaking","award":["806968"],"award-info":[{"award-number":["806968"]}]},{"name":"European Union's Horizon 2020"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Objective<\/jats:title>\n                    <jats:p>To explore the feasibility of validating Dutch concept extraction tools using annotated corpora translated from English, focusing on preserving annotations during translation and addressing the scarcity of non-English annotated clinical corpora.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Materials and Methods<\/jats:title>\n                    <jats:p>Three annotated corpora were standardized and translated from English to Dutch using 2 machine translation services, Google Translate and OpenAI GPT-4, with annotations preserved through a proposed method of embedding annotations in the text before translation. The performance of 2 concept extraction tools, MedSpaCy and MedCAT, was assessed across the corpora in both Dutch and English.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The translation process effectively generated Dutch annotated corpora and the concept extraction tools performed similarly in both English and Dutch. Although there were some differences in how annotations were preserved across translations, these did not affect extraction accuracy. Supervised MedCAT models consistently outperformed unsupervised models, whereas MedSpaCy demonstrated high recall but lower precision.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Discussion<\/jats:title>\n                    <jats:p>Our validation of Dutch concept extraction tools on corpora translated from English was successful, highlighting the efficacy of our annotation preservation method and the potential for efficiently creating multilingual corpora. Further improvements and comparisons of annotation preservation techniques and strategies for corpus synthesis could lead to more efficient development of multilingual corpora and accurate non-English concept extraction tools.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>This study has demonstrated that translated English corpora can be used to validate non-English concept extraction tools. The annotation preservation method used during translation proved effective, and future research can apply this corpus translation method to additional languages and clinical settings.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/jamia\/ocae159","type":"journal-article","created":{"date-parts":[[2024,6,10]],"date-time":"2024-06-10T17:26:53Z","timestamp":1718040413000},"page":"1725-1734","source":"Crossref","is-referenced-by-count":8,"title":["Annotation-preserving machine translation of English corpora to validate Dutch clinical concept extraction tools"],"prefix":"10.1093","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5369-8260","authenticated-orcid":false,"given":"Tom M","family":"Seinen","sequence":"first","affiliation":[{"name":"Department of Medical Informatics, Erasmus University Medical Center , 3015 GD Rotterdam, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jan A","family":"Kors","sequence":"additional","affiliation":[{"name":"Department of Medical Informatics, Erasmus University Medical Center , 3015 GD Rotterdam, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Erik M","family":"van Mulligen","sequence":"additional","affiliation":[{"name":"Department of Medical Informatics, Erasmus University Medical Center , 3015 GD Rotterdam, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter R","family":"Rijnbeek","sequence":"additional","affiliation":[{"name":"Department of Medical Informatics, Erasmus University Medical Center , 3015 GD Rotterdam, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,6,27]]},"reference":[{"issue":"8","key":"2024071907530894100_ocae159-B1","doi-asserted-by":"crossref","first-page":"969","DOI":"10.1093\/jamia\/ocy032","article-title":"Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data","volume":"25","author":"Reps","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2024071907530894100_ocae159-B2","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1136\/ard-2022-222626","article-title":"From real-world electronic health record data to real-world results using artificial intelligence","volume":"82","author":"Knevel","year":"2023","journal-title":"Ann Rheum Dis"},{"key":"2024071907530894100_ocae159-B3","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1146\/annurev-biodatasci-030421-030931","article-title":"Modern clinical text mining: a guide and review","volume":"4","author":"Percha","year":"2021","journal-title":"Annu Rev Biomed Data Sci"},{"issue":"5","key":"2024071907530894100_ocae159-B4","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1093\/jamia\/ocv180","article-title":"Extracting information from the text of electronic medical records to improve case detection: a systematic review","volume":"23","author":"Ford","year":"2016","journal-title":"J Am Med Inform Assoc"},{"key":"2024071907530894100_ocae159-B5","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The unified medical language system (UMLS): integrating biomedical terminology","volume":"32(suppl_1)","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res Spec Publ"},{"issue":"12","key":"2024071907530894100_ocae159-B6","doi-asserted-by":"crossref","first-page":"1973","DOI":"10.1093\/jamia\/ocad160","article-title":"The added value of text from Dutch general practitioner notes in predictive modeling","volume":"30","author":"Seinen","year":"2023","journal-title":"J Am Med Inform Assoc"},{"issue":"7","key":"2024071907530894100_ocae159-B7","doi-asserted-by":"crossref","first-page":"1292","DOI":"10.1093\/jamia\/ocac058","article-title":"Use of unstructured text in prognostic clinical prediction models: a systematic review","volume":"29","author":"Seinen","year":"2022","journal-title":"J Am Med Inform Assoc"},{"key":"2024071907530894100_ocae159-B8","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1016\/j.jbi.2017.07.012","article-title":"Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review","volume":"73","author":"Kreimeyer","year":"2017","journal-title":"J Biomed Inform"},{"key":"2024071907530894100_ocae159-B9","doi-asserted-by":"crossref","first-page":"105122","DOI":"10.1016\/j.ijmedinf.2023.105122","article-title":"Clinical named entity recognition and relation extraction using natural language processing of medical free text: A systematic review","volume":"177","author":"Fraile Navarro","year":"2023","journal-title":"Int J Med Inform"},{"key":"2024071907530894100_ocae159-B10","first-page":"491","article-title":"Biomedical corpora and natural language processing on clinical text in languages other than English: a systematic review","author":"AlShuweihi","year":"2021"},{"issue":"5","key":"2024071907530894100_ocae159-B11","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1136\/jamia.2009.001560","article-title":"Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications","volume":"17","author":"Savova","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2024071907530894100_ocae159-B12","first-page":"17","article-title":"Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program","author":"Aronson","year":"2001","journal-title":"Proc AMIA Symp"},{"key":"2024071907530894100_ocae159-B13","article-title":"QuickUMLS: a fast, unsupervised approach for medical concept extraction","author":"Soldaini","year":"2016","journal-title":"MedIR Workshop, SIGIR"},{"key":"2024071907530894100_ocae159-B14","doi-asserted-by":"crossref","first-page":"102083","DOI":"10.1016\/j.artmed.2021.102083","article-title":"Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit","volume":"117","author":"Kraljevic","year":"2021","journal-title":"Artif Intell Med"},{"key":"2024071907530894100_ocae159-B15","author":"Bai","year":"2021"},{"key":"2024071907530894100_ocae159-B16","author":"Hu"},{"issue":"1","key":"2024071907530894100_ocae159-B17","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1002\/cpt.2479","article-title":"Real-world evidence in EU medicines regulation: enabling use and establishing value","volume":"111","author":"Arlett","year":"2022","journal-title":"Clin Pharmacol Ther"},{"issue":"2","key":"2024071907530894100_ocae159-B18","doi-asserted-by":"crossref","first-page":"e10214","DOI":"10.1002\/lrh2.10214","article-title":"The European Medical Information Framework: a novel ecosystem for sharing healthcare data across Europe","volume":"4","author":"Lovestone","year":"2020","journal-title":"Learn Health Syst"},{"issue":"12","key":"2024071907530894100_ocae159-B19","doi-asserted-by":"crossref","first-page":"1335","DOI":"10.1007\/s40264-023-01353-w","article-title":"Supporting pharmacovigilance signal validation and prioritization with analyses of routinely collected health data: lessons learned from an EHDEN Network Study","volume":"46","author":"Gauffin","year":"2023","journal-title":"Drug Saf"},{"key":"2024071907530894100_ocae159-B20","author":"European Medicines Agency"},{"issue":"1","key":"2024071907530894100_ocae159-B21","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1136\/amiajnl-2011-000376","article-title":"Validation of a common data model for active safety surveillance research","volume":"19","author":"Overhage","year":"2012","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2024071907530894100_ocae159-B22","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1093\/jamia\/ocad247","article-title":"OHDSI Standardized Vocabularies\u2014a large-scale centralized reference ontology for international data harmonization","volume":"31","author":"Reich","year":"2024","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2024071907530894100_ocae159-B23","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1186\/s12859-022-05130-x","article-title":"Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods","volume":"24","author":"van Es","year":"2023","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"2024071907530894100_ocae159-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13326-020-00231-z","article-title":"Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies","volume":"11","author":"Kersloot","year":"2020","journal-title":"J Biomed Semant"},{"issue":"5","key":"2024071907530894100_ocae159-B25","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1136\/amiajnl-2011-000203","article-title":"2010 i2b2\/VA challenge on concepts, assertions, and relations in clinical text","volume":"18","author":"Uzuner","year":"2011","journal-title":"J Am Med Inform Assoc"},{"key":"2024071907530894100_ocae159-B26","first-page":"1613","author":"Mowery","year":"2014"},{"key":"2024071907530894100_ocae159-B27","author":"Mohan"},{"key":"2024071907530894100_ocae159-B28","first-page":"7221","author":"De Vries"},{"issue":"5","key":"2024071907530894100_ocae159-B29","doi-asserted-by":"crossref","first-page":"948","DOI":"10.1093\/jamia\/ocv037","article-title":"A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC","volume":"22","author":"Kors","year":"2015","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2024071907530894100_ocae159-B30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13326-018-0179-8","article-title":"Clinical natural language processing in languages other than English: opportunities and challenges","volume":"9","author":"N\u00e9v\u00e9ol","year":"2018","journal-title":"J Biomed Semant"},{"key":"2024071907530894100_ocae159-B31","author":"Patel","year":"2018"},{"key":"2024071907530894100_ocae159-B32","author":"Anaby-Tavor","year":"2020"},{"key":"2024071907530894100_ocae159-B33","author":"Schick"},{"key":"2024071907530894100_ocae159-B34","first-page":"671","author":"Whitehouse"},{"key":"2024071907530894100_ocae159-B35","doi-asserted-by":"crossref","first-page":"104478","DOI":"10.1016\/j.jbi.2023.104478","article-title":"Annotated dataset creation through large language models for non-english medical NLP","volume":"145","author":"Frei","year":"2023","journal-title":"J Biomed Inform"},{"key":"2024071907530894100_ocae159-B36","doi-asserted-by":"crossref","first-page":"100212","DOI":"10.1016\/j.simpa.2021.100212","article-title":"GERNERMED: An open German medical NER model","volume":"11","author":"Frei","year":"2022","journal-title":"Software Impacts"},{"key":"2024071907530894100_ocae159-B37","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/j.eng.2021.03.023","article-title":"Progress in machine translation","volume":"18","author":"Wang","year":"2022","journal-title":"Engineering"},{"key":"2024071907530894100_ocae159-B38","author":"Gaschi","year":"2023"},{"key":"2024071907530894100_ocae159-B39","doi-asserted-by":"crossref","first-page":"104513","DOI":"10.1016\/j.jbi.2023.104513","article-title":"GERNERMED++: Semantic annotation in German medical NLP through transfer-learning, translation and word alignment","volume":"147","author":"Frei","year":"2023","journal-title":"J Biomed Inform"},{"key":"2024071907530894100_ocae159-B40","author":"Achiam"},{"key":"2024071907530894100_ocae159-B41","author":"Papineni","year":"2002"},{"key":"2024071907530894100_ocae159-B42","author":"Popovi\u0107","year":"2015"},{"issue":"1","key":"2024071907530894100_ocae159-B43","doi-asserted-by":"crossref","first-page":"160035","DOI":"10.1038\/sdata.2016.35","article-title":"MIMIC-III, a freely accessible critical care database","volume":"3","author":"Johnson","year":"2016","journal-title":"Sci Data"},{"issue":"3","key":"2024071907530894100_ocae159-B44","doi-asserted-by":"crossref","first-page":"253","DOI":"10.22158\/sll.v3n3p253","article-title":"An updated evaluation of Google translate accuracy","volume":"3","author":"Aiken","year":"2019","journal-title":"Stud Linguist Literature"},{"key":"2024071907530894100_ocae159-B45","author":"Jiao"},{"issue":"10","key":"2024071907530894100_ocae159-B46","doi-asserted-by":"crossref","first-page":"574","DOI":"10.3390\/info14100574","article-title":"Translation performance from the user\u2019s perspective of large language models and neural machine translation systems","volume":"14","author":"Son","year":"2023","journal-title":"Inform"},{"key":"2024071907530894100_ocae159-B47","doi-asserted-by":"crossref","DOI":"10.1093\/jamia\/ocae029","article-title":"BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights","author":"Remy","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"2024071907530894100_ocae159-B48","first-page":"565","author":"Liu"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/8\/1725\/58591226\/ocae159.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/8\/1725\/58591226\/ocae159.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T03:53:34Z","timestamp":1721361214000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/31\/8\/1725\/7697361"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,27]]},"references-count":48,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2024,6,27]]},"published-print":{"date-parts":[[2024,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocae159","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.03.14.24304289","asserted-by":"object"}]},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,8]]},"published":{"date-parts":[[2024,6,27]]}}}