{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T07:19:30Z","timestamp":1769152770467,"version":"3.49.0"},"reference-count":28,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2024,1,26]],"date-time":"2024-01-26T00:00:00Z","timestamp":1706227200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Volkswagen Stiftung"},{"name":"DZHK and DFG SFB 1002 Modulary Units in Heart Failure"},{"name":"Else Kr\u00f6ner-Fresenius Foundation (EKFS)"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Although other methods exist to store and manage data in modern information technology, the standard solution is file systems. Therefore, keeping well-organized file structures and file system layouts can be key to a sustainable research data management infrastructure. However, file structures alone lack several important capabilities for FAIR data management: the two most significant being insufficient visualization of data and inadequate possibilities for searching and obtaining an overview. Research data management systems (RDMSs) can fill this gap, but many do not support the simultaneous use of the file system and RDMS. This simultaneous use can have many benefits, but keeping data in RDMS in synchrony with the file structure is challenging. Here, we present concepts that allow for keeping file structures and semantic data models (in RDMS) synchronous. Furthermore, we propose a specification in yaml format that allows for a structured and extensible declaration and implementation of a mapping between the file system and data models used in semantic research data management. Implementing these concepts will facilitate the re-use of specifications for multiple use cases. Furthermore, the specification can serve as a machine-readable and, at the same time, human-readable documentation of specific file system structures. We demonstrate our work using the Open Source RDMS LinkAhead (previously named \u201cCaosDB\u201d).<\/jats:p>","DOI":"10.3390\/data9020024","type":"journal-article","created":{"date-parts":[[2024,1,26]],"date-time":"2024-01-26T08:56:01Z","timestamp":1706259361000},"page":"24","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Mapping Hierarchical File Structures to Semantic Data Models for Efficient Data Integration into Research Data Management Systems"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5549-578X","authenticated-orcid":false,"given":"Henrik","family":"tom W\u00f6rden","sequence":"first","affiliation":[{"name":"Indiscale GmbH, 37083 G\u00f6ttingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6856-2910","authenticated-orcid":false,"given":"Florian","family":"Spreckelsen","sequence":"additional","affiliation":[{"name":"Indiscale GmbH, 37083 G\u00f6ttingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7214-8125","authenticated-orcid":false,"given":"Stefan","family":"Luther","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Dynamics and Self-Organization, 37077 G\u00f6ttingen, Germany"},{"name":"Institute for the Dynamics of Complex Systems, Georg-August-Universit\u00e4t, 37077 G\u00f6ttingen, Germany"},{"name":"German Center for Cardiovascular Research (DZHK), Partner Site G\u00f6ttingen, 37075 G\u00f6ttingen, Germany"},{"name":"Institute of Pharmacology and Toxicology, University Medical Center G\u00f6ttingen, 37075 G\u00f6ttingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3058-1435","authenticated-orcid":false,"given":"Ulrich","family":"Parlitz","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Dynamics and Self-Organization, 37077 G\u00f6ttingen, Germany"},{"name":"Institute for the Dynamics of Complex Systems, Georg-August-Universit\u00e4t, 37077 G\u00f6ttingen, Germany"},{"name":"German Center for Cardiovascular Research (DZHK), Partner Site G\u00f6ttingen, 37075 G\u00f6ttingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4124-9649","authenticated-orcid":false,"given":"Alexander","family":"Schlemmer","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Dynamics and Self-Organization, 37077 G\u00f6ttingen, Germany"},{"name":"German Center for Cardiovascular Research (DZHK), Partner Site G\u00f6ttingen, 37075 G\u00f6ttingen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,1,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"160018","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR Guiding Principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci. Data"},{"key":"ref_2","unstructured":"Deutsche Forschungsgemeinschaft (2023, August 10). Guidelines for Safeguarding Good Research Practice. Code of Conduct. Available online: https:\/\/zenodo.org\/records\/6472827."},{"key":"ref_3","unstructured":"Ferguson, L.M., Bertelmann, R., Bruch, C., Messerschmidt, R., Pampel, H., Schrader, A.C., Schultze-Motel, P., and Weisweiler, N.L. (2022). Good (Digital) Research Practice and Open Science Support and Best Practices for Implementing the DFG Code of Conduct \u201cGuidelines for Safeguarding Good Research Practice\u201d, Helmholtz Open Science Office. Helmholtz Open Science Briefing. Version 2.0."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1145\/1107499.1107503","article-title":"Scientific data management in the coming decade","volume":"34","author":"Gray","year":"2005","journal-title":"Acm Sigmod Rec."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Blomqvist, E., Maynard, D., Gangemi, A., Hoekstra, R., Hitzler, P., and Hartig, O. (2017). The Semantic Web, Springer International Publishing.","DOI":"10.1007\/978-3-319-58068-5"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Vaisman, A., and Zim\u00e1nyi, E. (2014). Data Warehouse Systems, Springer.","DOI":"10.1007\/978-3-642-54655-6"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"638","DOI":"10.1093\/bioinformatics\/btv606","article-title":"openBIS ELN-LIMS: An open-source database for academic laboratories","volume":"32","author":"Barillari","year":"2016","journal-title":"Bioinformatics"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"292","DOI":"10.12688\/f1000research.52157.3","article-title":"eLabFTW as an Open Science tool to improve the quality and translation of preclinical research","volume":"10","author":"Hewera","year":"2021","journal-title":"F1000Research"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Suhr, M., Lehmann, C., Bauer, C.R., Bender, T., Knopp, C., Freckmann, L., \u00d6st Hansen, B., Henke, C., Aschenbrandt, G., and K\u00fchlborn, L.K. (2020). Menoci: Lightweight extensible web portal enhancing data management for biomedical research projects. BMC Bioinform., 21.","DOI":"10.1186\/s12859-020-03928-1"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Bauch, A., Adamczyk, I., Buczek, P., Elmer, F.J., Enimanev, K., Glyzewski, P., Kohler, M., Pylak, T., Quandt, A., and Ramakrishnan, C. (2011). openBIS: A flexible framework for managing and analyzing complex data in biology research. BMC Bioinform., 12.","DOI":"10.1186\/1471-2105-12-468"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Dudchenko, A., Ringwald, F., Czernilofsky, F., Dietrich, S., Knaup, P., and Ganzinger, M. (2022). Large-File Raw Data Synchronization for openBIS Research Repositories. Challenges of Trustable AI and Added-Value on Health, IOS Press.","DOI":"10.3233\/SHTI220486"},{"key":"ref_12","unstructured":"McBride, B. (2004). Handbook on Ontologies, Springer."},{"key":"ref_13","unstructured":"(2012). OWL 2 Web Ontology Language Document Overview, World Wide Web Consortium. [2nd ed.]."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1620585.1620589","article-title":"Semantics and Complexity of SPARQL","volume":"34","author":"Arenas","year":"2009","journal-title":"ACM Trans. Database Syst."},{"key":"ref_15","unstructured":"Bizer, C., Heath, T., Ayers, D., and Raimond, Y. (2007, January 3\u20137). Interlinking Open Data on the Web. Proceedings of the 4th European Semantic Web Conference, Innsbruck, Austria."},{"key":"ref_16","unstructured":"Bizer, C., Heath, T., and Berners-Lee, T. (2011). Semantic Services, Interoperability and Web Applications: Emerging Concepts, IGI Global."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"De Smedt, K., Koureas, D., and Wittenburg, P. (2020). FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units. Publications, 8.","DOI":"10.20944\/preprints202003.0073.v1"},{"key":"ref_18","first-page":"75","article-title":"A Survey of Extract\u2013Transform\u2013Load Technology","volume":"5","author":"Vassiliadis","year":"2009","journal-title":"Int. J. Data Warehous. Min. (IJDWM)"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Fitschen, T., Schlemmer, A., Hornung, D., tom W\u00f6rden, H., Parlitz, U., and Luther, S. (2019). CaosDB\u2014Research Data Management for Complex, Changing, and Automated Research Workflows. Data, 4.","DOI":"10.3390\/data4020083"},{"key":"ref_20","unstructured":"Hornung, D., Spreckelsen, F., and Wei\u00df, T. (2024, January 02). Agile Research Data Management with Open Source: CaosDB. Available online: https:\/\/www.inggrid.org\/article\/id\/3866\/."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Spreckelsen, F., R\u00fcchardt, B., Lebert, J., Luther, S., Parlitz, U., and Schlemmer, A. (2020). Guidelines for a Standardized Filesystem Layout for Scientific Data. Data, 5.","DOI":"10.20944\/preprints202004.0035.v1"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"160044","DOI":"10.1038\/sdata.2016.44","article-title":"The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments","volume":"3","author":"Gorgolewski","year":"2016","journal-title":"Sci. Data"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"920","DOI":"10.1007\/s003300101100","article-title":"Introduction to the DICOM standard","volume":"12","author":"Mildenberger","year":"2002","journal-title":"Eur. Radiol."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Koranne, S. (2011). Handbook of Open Source Tools, Springer.","DOI":"10.1007\/978-1-4419-7719-9"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Folk, M., Heber, G., Koziol, Q., Pourmal, E., and Robinson, D. (2011, January 25). An overview of the HDF5 technology suite and its applications. Proceedings of the EDBT\/ICDT 2011 Workshop on Array Databases, Uppsala, Sweden.","DOI":"10.1145\/1966895.1966900"},{"key":"ref_26","unstructured":"Schlemmer, A. (2021). Mapping Data Files to Semantic Data Models Using the CaosDB Crawler, Zenodo."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Pezoa, F., Reutter, J.L., Suarez, F., Ugarte, M., and Vrgo\u010d, D. (2016, January 11\u201315). Foundations of JSON schema. Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, Canada.","DOI":"10.1145\/2872427.2883029"},{"key":"ref_28","unstructured":"Bray, T. (2023, August 10). Available online: https:\/\/datatracker.ietf.org\/doc\/rfc7159\/."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/9\/2\/24\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:49:54Z","timestamp":1760104194000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/9\/2\/24"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,26]]},"references-count":28,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,2]]}},"alternative-id":["data9020024"],"URL":"https:\/\/doi.org\/10.3390\/data9020024","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,26]]}}}