{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T02:46:34Z","timestamp":1776393994853,"version":"3.51.2"},"reference-count":31,"publisher":"China Science Publishing & Media Ltd.","issue":"4","license":[{"start":{"date-parts":[[2021,5,12]],"date-time":"2021-05-12T00:00:00Z","timestamp":1620777600000},"content-version":"vor","delay-in-days":131,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,25]]},"abstract":"<jats:p>In recent years, implementations enabling Distributed Analytics (DA) have gained considerable attention due to their ability to perform complex analysis tasks on decentralised data by bringing the analysis to the data. These concepts propose privacy-enhancing alternatives to data centralisation approaches, which have restricted applicability in case of sensitive data due to ethical, legal or social aspects. Nevertheless, the immanent problem of DA-enabling architectures is the black-box-alike behaviour of the highly distributed components originating from the lack of semantically enriched descriptions, particularly the absence of basic metadata for data sets or analysis tasks. To approach the mentioned problems, we propose a metadata schema for DA infrastructures, which provides a vocabulary to enrich the involved entities with descriptive semantics. We initially perform a requirement analysis with domain experts to reveal necessary metadata items, which represents the foundation of our schema. Afterwards, we transform the obtained domain expert knowledge into user stories and derive the most significant semantic content. In the final step, we enable machine-readability via RDF(S) and SHACL serialisations. 
We deploy our schema in a proof-of-concept monitoring dashboard to validate its contribution to the transparency of DA architectures. Additionally, we evaluate the schema's compliance with the FAIR principles. The evaluation shows that the schema succeeds in increasing transparency while being compliant with most of the FAIR principles. Because a common metadata model is critical for enhancing the compatibility between multiple DA infrastructures, our work lowers data access and analysis barriers. It represents an initial and infrastructure-independent foundation for the FAIRification of DA and the underlying scientific data management.<\/jats:p>","DOI":"10.1162\/dint_a_00100","type":"journal-article","created":{"date-parts":[[2021,5,12]],"date-time":"2021-05-12T16:13:16Z","timestamp":1620835996000},"page":"528-547","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":10,"title":["DAMS: A Distributed Analytics Metadata Schema"],"prefix":"10.3724","volume":"3","author":[{"given":"Sascha","family":"Welten","sequence":"first","affiliation":[{"name":"Chair Informatik 5, RWTH Aachen University, 52056 Aachen, Germany"}]},{"given":"Laurenz","family":"Neumann","sequence":"additional","affiliation":[{"name":"Chair Informatik 5, RWTH Aachen University, 52056 Aachen, Germany"}]},{"given":"Yeliz Ucer","family":"Yediel","sequence":"additional","affiliation":[{"name":"Fraunhofer Institute for Applied Information Techniques (FIT), 53757 Sankt Augustin, Germany"}]},{"given":"Luiz Olavo Bonino","family":"da Silva Santos","sequence":"additional","affiliation":[{"name":"Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 7500AE Enschede, The Netherlands"},{"name":"Department of Human Genetics, Leiden University Medical Centre, Leiden 2333 ZA, The Netherlands"}]},{"given":"Stefan","family":"Decker","sequence":"additional","affiliation":[{"name":"Chair Informatik 5, RWTH 
Aachen University, 52056 Aachen, Germany"},{"name":"Fraunhofer Institute for Applied Information Techniques (FIT), 53757 Sankt Augustin, Germany"}]},{"given":"Oya","family":"Beyan","sequence":"additional","affiliation":[{"name":"Fraunhofer Institute for Applied Information Techniques (FIT), 53757 Sankt Augustin, Germany"},{"name":"Institute of Medical Information, Faculty of Medicine & University Hospital Cologne, University of Cologne, 50674 Cologne, Germany"}]}],"member":"2026","published-online":{"date-parts":[[2021,10,25]]},"reference":[{"issue":"13","key":"2021102517001126400_ref1","doi-asserted-by":"crossref","first-page":"1351","DOI":"10.1001\/jama.2013.393","article-title":"The inevitable application of big data to health care","volume":"309","author":"Murdoch","year":"2013","journal-title":"JAMA"},{"key":"2021102517001126400_ref2","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/j.ijmedinf.2018.03.013","article-title":"Concurrence of big data analytics and healthcare: A systematic review","volume":"114","author":"Mehta","year":"2018","journal-title":"International Journal of Medical Informatics"},{"issue":"1\u20132","key":"2021102517001126400_ref3","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1162\/dint_a_00032","article-title":"Distributed analytics on sensitive medical data: The personal health train","volume":"2","author":"Beyan","year":"2020","journal-title":"Data Intelligence"},{"key":"2021102517001126400_ref4","volume-title":"General Data Protection Regulation (GDPR)\u2014Official Legal Text","author":"GDPR","year":"2021"},{"issue":"3","key":"2021102517001126400_ref5","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1377\/hlthaff.16.3.146","article-title":"The politics of the Health Insurance Portability and Accountability 
Act","volume":"16","author":"Atchinson","year":"1997","journal-title":"Health Affairs (Project Hope)"},{"key":"2021102517001126400_ref6","volume-title":"Data protection","author":"DPA","year":"2021"},{"key":"2021102517001126400_ref7","first-page":"373","article-title":"A privacy-preserving infrastructure for analyzing personal health data in a vertically partitioned scenario","volume":"264","author":"Sun","year":"2019","journal-title":"MedInfo"},{"key":"2021102517001126400_ref8","doi-asserted-by":"crossref","DOI":"10.1038\/s41597-019-0241-0","article-title":"Distributed radiomics as a signature validation study using the Personal Health Train infrastructure","volume":"6","author":"Shi","year":"2019","journal-title":"Scientific Data"},{"key":"2021102517001126400_ref9","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1016\/j.radonc.2019.11.019","article-title":"Distributed learning on 20000+ lung cancer patients\u2014The Personal Health Train","volume":"144","author":"Deist","year":"2020","journal-title":"Radiotherapy and Oncology"},{"issue":"2","key":"2021102517001126400_ref10","doi-asserted-by":"crossref","first-page":"344","DOI":"10.1016\/j.ijrobp.2017.04.021","article-title":"Developing and validating a survival prediction model for nsclc patients through distributed learning across 3 countries","volume":"99","author":"Jochems","year":"2017","journal-title":"International Journal of Radiation Oncology, Biology, Physics"},{"issue":"3","key":"2021102517001126400_ref11","doi-asserted-by":"crossref","first-page":"459","DOI":"10.1016\/j.radonc.2016.10.002","article-title":"Distributed learning: Developing a predictive model based on data from multiple hospitals without data leaving the hospital\u2014A real life 
proof of concept","volume":"121","author":"Jochems","year":"2016","journal-title":"Radiotherapy and Oncology"},{"issue":"8","key":"2021102517001126400_ref12","first-page":"945","article-title":"Distributed deep learning networks among institutions for medical imaging","volume":"25","author":"Chang","year":"2018","journal-title":"JAMIA"},{"key":"2021102517001126400_ref13","first-page":"969","article-title":"Collaborative filtering as a case-study for model parallelism on bulk synchronous systems","volume-title":"Conference on Information and Knowledge Management (CIKM)","author":"Das","year":"2017"},{"key":"2021102517001126400_ref14","volume-title":"Communication-efficient learning of deep networks from decentralized data","author":"McMahan","year":"2017"},{"key":"2021102517001126400_ref15","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1007\/978-3-030-11723-8_9","article-title":"Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation","volume-title":"Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries","author":"Sheller","year":"2019"},{"key":"2021102517001126400_ref16","volume-title":"Experiments on parallel training of deep neural network using model averaging","author":"Su","year":"2015"},{"key":"2021102517001126400_ref17","first-page":"1463","article-title":"Communication-efficient distributed deep metric learning with hybrid synchronization","volume-title":"International Conference on Information and Knowledge Management (CIKM)","author":"Su","year":"2018"},{"key":"2021102517001126400_ref18","doi-asserted-by":"crossref","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR Guiding 
Principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Scientific Data"},{"issue":"1\u20132","key":"2021102517001126400_ref19","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1162\/dint_a_00031","article-title":"Making fair easy with fair tools: From creolization to convergence","volume":"2","author":"Thompson","year":"2020","journal-title":"Data Intelligence"},{"key":"2021102517001126400_ref20","volume-title":"FAIR principles by GO-FAIR","year":"2021"},{"key":"2021102517001126400_ref21","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/j.cageo.2019.07.005","article-title":"The bonares metadata schema for geospatial soil-agricultural research data\u2014merging inspire and datacite metadata schemes","volume":"132","author":"Specka","year":"2019","journal-title":"Computers & Geosciences"},{"key":"2021102517001126400_ref22","first-page":"3428","article-title":"Making metadata fit for next generation language technology platforms: The metadata schema of the european language grid","volume-title":"Language Resources and Evaluation Conference","author":"Labropoulou","year":"2020"},{"key":"2021102517001126400_ref23","doi-asserted-by":"crossref","DOI":"10.1038\/s41597-020-00771-0","article-title":"Plasma-MDS, a metadata schema for plasma science with examples from plasma technology","volume":"7","author":"Franke","year":"2020","journal-title":"Scientific Data"},{"issue":"1\u20132","key":"2021102517001126400_ref24","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1162\/dint_a_00028","article-title":"A generic workflow for the data fairification process","volume":"2","author":"Jacobsen","year":"2020","journal-title":"Data 
Intelligence"},{"issue":"1","key":"2021102517001126400_ref25","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1109\/5254.747904","article-title":"Building a chemical ontology using methontology and the ontology design environment","volume":"14","author":"Lopez","year":"1999","journal-title":"IEEE Intelligent Systems and Their Applications"},{"key":"2021102517001126400_ref26","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/j.websem.2015.01.001","article-title":"The data mining optimization ontology","volume":"32","author":"Keet","year":"2015","journal-title":"Journal of Web Semantics"},{"key":"2021102517001126400_ref27","first-page":"33","article-title":"Methontology: From ontological art towards ontological engineering","volume-title":"AAAI Conference on Artificial Intelligence","author":"Fern\u00e1ndez-L\u00f3pez","year":"1997"},{"key":"2021102517001126400_ref28","first-page":"205","article-title":"The use and effectiveness of user stories in practice","volume-title":"International Working Conference on Requirements Engineering: Foundation for Software Quality","author":"Lucassen","year":"2016"},{"key":"2021102517001126400_ref29","volume-title":"User stories applied: For agile software development","author":"Cohn","year":"2004"},{"key":"2021102517001126400_ref30","first-page":"500","article-title":"An ontology based personalized privacy preservation","volume-title":"International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management","author":"Can","year":"2019"},{"key":"2021102517001126400_ref31","volume-title":"Data Catalog Vocabulary (DCAT)","author":"Maali","year":"2021"}],"container-title":["Data 
Intelligence"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/dint\/article-pdf\/3\/4\/528\/1968588\/dint_a_00100.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/dint\/article-pdf\/3\/4\/528\/1968588\/dint_a_00100.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T07:41:31Z","timestamp":1741938091000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.sciengine.com\/doi\/10.1162\/dint_a_00100"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":31,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,10,25]]}},"URL":"https:\/\/doi.org\/10.1162\/dint_a_00100","relation":{},"ISSN":["2641-435X"],"issn-type":[{"value":"2641-435X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}