{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T04:30:23Z","timestamp":1776054623648,"version":"3.50.1"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T00:00:00Z","timestamp":1684108800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T00:00:00Z","timestamp":1684108800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["01ZZ1802B"],"award-info":[{"award-number":["01ZZ1802B"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Herzzentrum G\u00f6ttingen"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Secondary use of routine medical data is key to large-scale clinical and health services research. In a maximum care hospital, the volume of data generated exceeds the limits of big data on a daily basis. This so-called \u201creal world data\u201d are essential to complement knowledge and results from clinical trials. Furthermore, big data may help in establishing precision medicine. However, manual data extraction and annotation workflows to transfer routine data into research data would be complex and inefficient. Generally, best practices for managing research data focus on data output rather than the entire data journey from primary sources to analysis. To eventually make routinely collected data usable and available for research, many hurdles have to be overcome. In this work, we present the implementation of an automated framework for timely processing of clinical care data including free texts and genetic data (non-structured data) and centralized storage as Findable, Accessible, Interoperable, Reusable (FAIR) research data in a maximum care university hospital.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Methods<\/jats:title>\n                <jats:p>We identify data processing workflows necessary to operate a medical research data service unit in a maximum care hospital. We decompose structurally equal tasks into elementary sub-processes and propose a framework for general data processing. We base our processes on open-source software-components and, where necessary, custom-built generic tools.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>We demonstrate the application of our proposed framework in practice by describing its use in our Medical Data Integration Center (MeDIC). Our microservices-based and fully open-source data processing automation framework incorporates a complete recording of data management and manipulation activities. The prototype implementation also includes a metadata schema for data provenance and a process validation concept. All requirements of a MeDIC are orchestrated within the proposed framework: Data input from many heterogeneous sources, pseudonymization and harmonization, integration in a data warehouse and finally possibilities for extraction or aggregation of data for research purposes according to data protection requirements.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>Though the framework is not a panacea for bringing routine-based research data into compliance with FAIR principles, it provides a much-needed possibility to process data in a fully automated, traceable, and reproducible manner.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12911-023-02195-3","type":"journal-article","created":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T14:39:02Z","timestamp":1684161542000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["FAIRness through automation: development of an automated medical data integration infrastructure for FAIR health data in a maximum care university hospital"],"prefix":"10.1186","volume":"23","author":[{"given":"Marcel","family":"Parciak","sequence":"first","affiliation":[]},{"given":"Markus","family":"Suhr","sequence":"additional","affiliation":[]},{"given":"Christian","family":"Schmidt","sequence":"additional","affiliation":[]},{"given":"Caroline","family":"B\u00f6nisch","sequence":"additional","affiliation":[]},{"given":"Benjamin","family":"L\u00f6hnhardt","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2166-846X","authenticated-orcid":false,"given":"Dorothea","family":"Keszty\u00fcs","sequence":"additional","affiliation":[]},{"given":"Tibor","family":"Keszty\u00fcs","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,15]]},"reference":[{"issue":"1","key":"2195_CR1","doi-asserted-by":"publisher","first-page":"28","DOI":"10.15265\/IY-2017-008","volume":"26","author":"FJ Martin-Sanchez","year":"2017","unstructured":"Martin-Sanchez FJ, Aguiar-Pulido V, Lopez-Campos GH, Peek N, Sacchi L. Secondary Use and Analysis of Big Data Collected for Patient Care. Yearb Med Inform. 2017;26(1):28\u201337.","journal-title":"Yearb Med Inform"},{"key":"2195_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/sdata.2016.18","volume":"3","author":"MD Wilkinson","year":"2016","unstructured":"Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, et al. Comment: The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:1\u20139.","journal-title":"Sci Data."},{"key":"2195_CR3","first-page":"230","volume-title":"Provenance and Annotation of Data and Processes IPAW 2016 Lecture Notes in Computer Science","author":"Y Cao","year":"2016","unstructured":"Cao Y, Jones C, Cuevas-Vicentt\u00edn V, Jones MB, Lud\u00e4scher B, McPhillips T, et al. DataONE: A Data Federation with Provenance Support. In: Mattoso M, Glavic B, editors., et al., Provenance and Annotation of Data and Processes IPAW 2016 Lecture Notes in Computer Science. Springer Cham; 2016. p. 230\u20134."},{"issue":"6","key":"2195_CR4","doi-asserted-by":"publisher","first-page":"816","DOI":"10.1038\/ng.3864","volume":"49","author":"L Ohno-Machado","year":"2017","unstructured":"Ohno-Machado L, Sansone SA, Alter G, Fore I, Grethe J, Xu H, et al. Finding useful data across multiple biomedical data repositories using DataMed. Nat Genet. 2017;49(6):816\u20139.","journal-title":"Nat Genet"},{"issue":"2","key":"2195_CR5","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1089\/bio.2017.0110","volume":"16","author":"P Holub","year":"2018","unstructured":"Holub P, Kohlmayer F, Prasser F, Mayrhofer MT, Schl\u00fcnder I, Martin GM, et al. Enhancing Reuse of Data and Biological Material in Medical Research: From FAIR to FAIR-Health. Biopreserv Biobank. 2018;16(2):97\u2013105.","journal-title":"Biopreserv Biobank"},{"issue":"01","key":"2195_CR6","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1055\/s-0038-1641210","volume":"27","author":"P Knaup","year":"2018","unstructured":"Knaup P, Deserno T, Prokosch H-U, Sax U. Implementation of a National Framework to Promote Health Data Sharing. Yearb Med Inform. 2018;27(01):302\u20134.","journal-title":"Yearb Med Inform"},{"issue":"Open 1","key":"2195_CR7","first-page":"66","volume":"57","author":"B Haarbrandt","year":"2018","unstructured":"Haarbrandt B, Schreiweis B, Rey S, Sax U, Scheithauer S, Rienhoff O, et al. HiGHmed - An Open Platform Approach to Enhance Care and Research across Institutional Boundaries. Methods Inf Med. 2018;57(Open 1):66\u201381.","journal-title":"Methods Inf Med."},{"key":"2195_CR8","volume-title":"CIDR 2015 - 7th Biennial Conference on Innovative Data Systems Research","author":"I Terrizzano","year":"2015","unstructured":"Terrizzano I, Schwarz P, Roth M, Colino JE. Data wrangling: The challenging journey from the wild to the lake. In: CIDR 2015 - 7th Biennial Conference on Innovative Data Systems Research. 2015."},{"key":"2195_CR9","first-page":"1199","volume":"2019","author":"E Aghajani","year":"2019","unstructured":"Aghajani E, Nagy C, Vega-Marquez OL, Linares-Vasquez M, Moreno L, Bavota G, et al. Software Documentation Issues Unveiled. Proc - Int Conf Softw Eng. 2019;2019:1199\u2013210.","journal-title":"Proc - Int Conf Softw Eng."},{"key":"2195_CR10","first-page":"298","volume":"264","author":"M Parciak","year":"2019","unstructured":"Parciak M, Bauer C, Bender T, Lodahl R, Schreiweis B, Tute E, et al. Provenance solutions for medical research in heterogeneous IT-infrastructure: An implementation roadmap. Stud Health Technol Inform. 2019;264:298\u2013302.","journal-title":"Stud Health Technol Inform"},{"key":"2195_CR11","first-page":"262","volume":"228","author":"CR Bauer","year":"2017","unstructured":"Bauer CR, Umbach N, Baum B, Buckow K, Franke T, Gr\u00fctz R, et al. Architecture of a biomedical informatics research data management pipeline. Stud Health Technol Inform. 2017;228:262\u20136.","journal-title":"Stud Health Technol Inform"},{"issue":"6","key":"2195_CR12","first-page":"E21","volume":"59","author":"AA Sinaci","year":"2020","unstructured":"Sinaci AA, N\u00fa\u00f1ez-Benjumea FJ, Gencturk M, Jauer ML, Deserno T, Chronaki C, et al. From Raw Data to FAIR Data: The FAIRification Workflow for Health Research. Methods Inf Med. 2020;59(6):E21-32.","journal-title":"Methods Inf Med"},{"key":"2195_CR13","first-page":"392","volume":"270","author":"M L\u00f6be","year":"2020","unstructured":"L\u00f6be M, Matthies F, St\u00e4ubert S, Meineke FA, Winter A. Problems in fairifying medical datasets. Stud Health Technol Inform. 2020;270:392\u20136.","journal-title":"Stud Health Technol Inform"},{"issue":"1","key":"2195_CR14","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1055\/s-0040-1712510","volume":"59","author":"K Bhatia","year":"2020","unstructured":"Bhatia K, Tanch J, Chen ES, Sarkar IN. Applying FAIR Principles to Improve Data Searchability of Emergency Department Datasets: A Case Study for HCUP-SEDD. Methods Inf Med. 2020;59(1):48\u201356.","journal-title":"Methods Inf Med"},{"key":"2195_CR15","first-page":"102","volume":"2849","author":"C B\u00f6nisch","year":"2019","unstructured":"B\u00f6nisch C, Sargeant A, Wulff A, Parciak M, Bauer CR, Sax U. FAIRness of openEHR archetypes and templates. CEUR Workshop Proc. 2019;2849:102\u201311.","journal-title":"CEUR Workshop Proc"},{"key":"2195_CR16","doi-asserted-by":"crossref","unstructured":"Pereira A, Lopes RP, Oliveira JL. SCALEUS-FD: A FAIR Data Tool for Biomedical Applications. Biomed Res Int. 2020;2020.","DOI":"10.1155\/2020\/3041498"},{"issue":"August","key":"2195_CR17","doi-asserted-by":"publisher","first-page":"100834","DOI":"10.1016\/j.dcn.2020.100834","volume":"45","author":"JJ Zondergeld","year":"2020","unstructured":"Zondergeld JJ, Scholten RHH, Vreede BMI, Hessels RS, Pijl AG, Buizer-Voskamp JE, et al. FAIR, safe and high-quality data: The data infrastructure and accessibility of the YOUth cohort study. Dev Cogn Neurosci. 2020;45(August):100834.","journal-title":"Dev Cogn Neurosci"},{"key":"2195_CR18","unstructured":"U.S. Food and Drug Administration FDA. Real-World Evidence. 2022. https:\/\/www.fda.gov\/science-research\/science-and-research-special-topics\/real-world-evidence. Accessed 02 Aug 2022."},{"key":"2195_CR19","unstructured":"imi innovative medicines initiative. EMIF European Medical Information Framework. 2018. https:\/\/www.imi.europa.eu\/projects-results\/project-factsheets\/emif. Accessed 02 Aug 2022."},{"key":"2195_CR20","first-page":"188","volume":"2018","author":"A Trifan","year":"2018","unstructured":"Trifan A, Oliveira JL. A FAIR Marketplace for Biomedical Data Custodians and Clinical Researchers. Proc - IEEE Symp Comput Med Syst. 2018;2018:188\u201393.","journal-title":"Proc - IEEE Symp Comput Med Syst."},{"key":"2195_CR21","unstructured":"World Wide Web Consortium W3C. JSON-LD 1.1 A JSON-based Serialization for Linked Data. 2020. https:\/\/www.w3.org\/TR\/json-ld\/. Accessed 02 Aug 2022."},{"key":"2195_CR22","unstructured":"Internet Engineering Task Force (IETF). The JavaScript Object Notation (JSON) Data Interchange Format. 2017. https:\/\/www.rfc-editor.org\/rfc\/rfc8259. Accessed 02 Aug 2022."},{"key":"2195_CR23","doi-asserted-by":"crossref","unstructured":"Duerst M, Suignard M. Internationalized Resource Identifiers (IRIs). 2005. https:\/\/www.rfc-editor.org\/rfc\/rfc3987.html. Accessed 02 Aug 2022.","DOI":"10.17487\/rfc3987"},{"key":"2195_CR24","unstructured":"Schema.org. Welcome to Schema.org. 2021. https:\/\/schema.org\/. Accessed 02 Aug 2022."},{"key":"2195_CR25","unstructured":"Apache CouchDB. CouchDB relax. Seamless multi-master sync, that scales from Big Data to Mobile, with an Intuitive HTTP\/JSON API and designed for Reliability. 2021. https:\/\/couchdb.apache.org\/. Accessed 02 Aug 2022."},{"key":"2195_CR26","unstructured":"Automatic Mode Labs. ActiveWorkflow. Turn complex requirements to workflows without leaving the comfort of your technology stack. 2021. https:\/\/www.activeworkflow.org\/. Accessed 02 Aug 2022."},{"key":"2195_CR27","unstructured":"docker. Developers Love Docker. Businesses Trust It. Build safer, share wider, run faster: New updates to our product subscriptions. 2021. https:\/\/www.docker.com\/. Accessed 02 Aug 2022."},{"key":"2195_CR28","unstructured":"Celery.org. Celery - Distributed Task Queue. 2021. https:\/\/docs.celeryproject.org\/en\/stable\/. Accessed 02 Aug 2022."},{"key":"2195_CR29","unstructured":"Schmitt O, Siemon A, Schwardmann U, Hellkamp M. GWDG Object Storage and Search Solution for Research \u2013 Common Data Storage Architecture (CDSTAR). Gesellschaft f\u00fcr wissenschaftliche Datenverarbeitung mbH G\u00f6ttingen (GWDG), editor. G\u00f6ttingen: Gesellschaft f\u00fcr wissenschaftliche Datenverarbeitung mbH G\u00f6ttingen. 2014."},{"key":"2195_CR30","unstructured":"Armbrust M, Ghodsi A, Xin R, Zaharia M, Berkeley U. Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. 11th Annual Conference on Innovative Data Systems Research (CIDR \u201921). 2021. https:\/\/www.cidrdb.org\/cidr2021\/papers\/cidr2021_paper17.pdf. Accessed 12 May 2023."},{"key":"2195_CR31","unstructured":"Amundsen. Overview Amundsen. 2022. https:\/\/www.amundsen.io\/amundsen\/. Accessed 19 Jan 2023."},{"key":"2195_CR32","unstructured":"EGERIA. Open metadata and governance for enterprises - automatically capturing, managing and exchanging metadata between tools and platforms, no matter the vendor. 2022. https:\/\/egeria-project.org\/. Accessed 19 Jan 2023."},{"key":"2195_CR33","unstructured":"OpenLineage. OpenLineage. An open framework for data lineage collection and analysis. 2022. https:\/\/openlineage.io\/. Accessed 02 Aug 2022."},{"key":"2195_CR34","unstructured":"LF AI & Data Foundation. Marquez. Collect, aggregate, and visualize a data ecosystem\u2019s metadata. 2022. https:\/\/marquezproject.github.io\/marquez\/. Accessed 02 Aug 2022."},{"key":"2195_CR35","unstructured":"Wittenburg P. Common Patterns in Revolutionary Infrastructures and Data. 2018. p. 1\u201313. Available from: https:\/\/b2share.eudat.eu\/records\/4e8ac36c0dd343da81fd9e83e72805a0"},{"issue":"1\u20132","key":"2195_CR36","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1162\/dint_a_00050","volume":"2","author":"H van Vlijmen","year":"2020","unstructured":"van Vlijmen H, Mons A, Waalkens A, Franke W, Baak A, Ruiter G, et al. The need of industry to go fair. Data Intell. 2020;2(1\u20132):276\u201384.","journal-title":"Data Intell"},{"key":"2195_CR37","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1016\/j.ijmedinf.2016.07.009","volume":"94","author":"MJ Denney","year":"2016","unstructured":"Denney MJ, Long DM, Armistead MG, Anderson JL, Conway BN. Validating the extract, transform, load process used to populate a large clinical research database. Int J Med Inform. 2016;94:271\u20134.","journal-title":"Int J Med Inform"},{"issue":"7","key":"2195_CR38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.2196\/15918","volume":"8","author":"H Spengler","year":"2020","unstructured":"Spengler H, Lang C, Mahapatra T, Gatz I, Kuhn KA, Prasser F. Enabling agile clinical and translational data warehousing: Platform development and evaluation. JMIR Med Informatics. 2020;8(7):1\u201318.","journal-title":"JMIR Med Informatics."}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-023-02195-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-023-02195-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-023-02195-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T14:42:22Z","timestamp":1684161742000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-023-02195-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,15]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["2195"],"URL":"https:\/\/doi.org\/10.1186\/s12911-023-02195-3","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,15]]},"assertion":[{"value":"10 August 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 May 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 May 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"94"}}