{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,4]],"date-time":"2025-10-04T00:33:35Z","timestamp":1759538015638,"version":"build-2065373602"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686295","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,2]],"date-time":"2025-10-02T00:00:00Z","timestamp":1759363200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,2]]},"abstract":"<jats:p>Clinical research often requires integrating data from diverse sources, which differ not only in structure but also in semantics and language. Traditional extract-transform-load (ETL) pipelines struggle to handle semantic variability and lack built-in support for multilingual or ontology-driven harmonisation. This fragmentation limits the interoperability and reuse of clinical datasets in large-scale analyses. In this paper, we propose an integrated framework that combines an embedding-based concept mapping engine with an automated ETL pipeline using Apache Airflow. The mapping engine uses transformer-based embeddings to align clinical terms with standard concepts, producing outputs in White Rabbit and Usagi-compatible formats to ensure backward interoperability. 
We validated the system using multilingual real-world datasets, demonstrating its ability to handle heterogeneous inputs and maintain end-to-end reproducibility.<\/jats:p>","DOI":"10.3233\/shti251524","type":"book-chapter","created":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T09:57:48Z","timestamp":1759485468000},"source":"Crossref","is-referenced-by-count":0,"title":["A Semantic-Driven for Cohort Data Harmonisation into OMOP CDM Schema"],"prefix":"10.3233","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-3983-7926","authenticated-orcid":false,"given":"Raquel","family":"Paradinha","sequence":"first","affiliation":[{"name":"IEETA \/ DETI, LASI, University of Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-4526-2249","authenticated-orcid":false,"given":"Vicente","family":"Barros","sequence":"additional","affiliation":[{"name":"IEETA \/ DETI, LASI, University of Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0729-2264","authenticated-orcid":false,"given":"Jo\u00e3o Rafael","family":"Almeida","sequence":"additional","affiliation":[{"name":"IEETA \/ DETI, LASI, University of Aveiro, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6672-6176","authenticated-orcid":false,"given":"Jos\u00e9 Lu\u00eds","family":"Oliveira","sequence":"additional","affiliation":[{"name":"IEETA \/ DETI, LASI, University of Aveiro, Portugal"}]}],"member":"7437","container-title":["Studies in Health Technology and Informatics","Good Evaluation - Better Digital 
Health"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/SHTI251524","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T09:57:48Z","timestamp":1759485468000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/SHTI251524"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,2]]},"ISBN":["9781643686295"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/shti251524","relation":{},"ISSN":["0926-9630","1879-8365"],"issn-type":[{"value":"0926-9630","type":"print"},{"value":"1879-8365","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,2]]}}}