{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T17:38:37Z","timestamp":1740159517647,"version":"3.37.3"},"reference-count":17,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T00:00:00Z","timestamp":1686700800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T00:00:00Z","timestamp":1686700800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100009534","name":"Universit\u00e4t Stuttgart","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100009534","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Datenbank Spektrum"],"published-print":{"date-parts":[[2023,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>As a\u00a0result of the paradigm shift away from rather rigid data warehouses to general-purpose data lakes, fully flexible self-service analytics is made possible. However, this also increases the complexity for domain experts who perform these analyses, since comprehensive data preparation tasks have to be implemented for each data access. For this reason, we developed BARENTS, a\u00a0toolset that enables domain experts to specify data preparation tasks as ontology rules, which are then applied to the data involved. Although our evaluation of BARENTS showed that it is a\u00a0valuable contribution to self-service analytics, a\u00a0major drawback is that domain experts do not receive any semantic support when specifying the rules. 
In this paper, we therefore address how a\u00a0recommender approach can provide additional support to domain experts by identifying supplementary datasets that might be relevant for their analyses or additional data processing steps to improve data refinement. This recommender operates on the set of data preparation rules specified in BARENTS\u2014i.e., the accumulated knowledge of all domain experts is factored into the data preparation for each new analysis. Evaluation results indicate that such a\u00a0recommender approach further contributes to the practicality of BARENTS and thus represents a\u00a0step towards effective and efficient self-service analytics in data lakes.<\/jats:p>","DOI":"10.1007\/s13222-023-00443-4","type":"journal-article","created":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T12:01:36Z","timestamp":1686744096000},"page":"123-132","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A\u00a0Recommender Approach to Enable Effective and Efficient Self-Service Analytics in Data Lakes"],"prefix":"10.1007","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3795-7909","authenticated-orcid":false,"given":"Christoph","family":"Stach","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rebecca","family":"Eichler","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Simone","family":"Schmidt","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,6,14]]},"reference":[{"issue":"2","key":"443_CR1","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1145\/2229156.2229157","volume":"3","author":"W van der Aalst","year":"2012","unstructured":"van der Aalst W (2012) Process mining: overview and opportunities. 
ACM Trans Manage Inf Syst 3(2):7","journal-title":"ACM Trans Manage Inf Syst"},{"issue":"3","key":"443_CR2","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1145\/3388870","volume":"38","author":"A Alserafi","year":"2020","unstructured":"Alserafi A, Abell\u00f3 A, Romero O et al (2020) Keeping the data lake in form: proximity mining for pre-filtering schema matching. ACM Trans Inf Syst 38(3):26","journal-title":"ACM Trans Inf Syst"},{"key":"443_CR3","first-page":"61","volume-title":"BIS\u201920","author":"M Behringer","year":"2020","unstructured":"Behringer M, Hirmer P, Fritz M et al (2020) Empowering domain experts to preprocess massive distributed datasets. In: BIS\u201920, pp 61\u201375"},{"key":"443_CR4","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1007\/978-3-030-67024-5_14","volume-title":"Metalearning: applications to automated machine learning and data mining","author":"P Brazdil","year":"2022","unstructured":"Brazdil P, van Rijn JN, Soares C et al (2022) Automating data science. In: Metalearning: applications to automated machine learning and data mining. Springer, Cham, pp 269\u2013282"},{"key":"443_CR5","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1007\/s10796-020-10010-x","volume":"23","author":"C Diamantini","year":"2021","unstructured":"Diamantini C, Lo Giudice P, Potena D et al (2021) An approach to extracting topic-guided views from the sources of a\u00a0data lake. Inform Syst Front 23:243\u2013262","journal-title":"Inform Syst Front"},{"key":"443_CR6","first-page":"73","volume-title":"DaWaK\u201920","author":"R Eichler","year":"2020","unstructured":"Eichler R, Giebler C, Gr\u00f6ger C et al (2020) HANDLE - A generic metadata model for data lakes. 
In: DaWaK\u201920, pp 73\u201388"},{"key":"443_CR7","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1038\/s41597-022-01347-w","volume":"9","author":"N Gao","year":"2022","unstructured":"Gao N, Marschall M, Burry J et al (2022) Understanding occupants\u2019 behaviour, engagement, emotion, and comfort indoors with heterogeneous sensors and wearables. Sci Data 9:261","journal-title":"Sci Data"},{"key":"443_CR8","first-page":"57","volume-title":"EDOC\u201920","author":"C Giebler","year":"2020","unstructured":"Giebler C, Gr\u00f6ger C, Hoos E et al (2020) A zone reference model for enterprise-grade data lake management. In: EDOC\u201920, pp 57\u201366"},{"key":"443_CR9","first-page":"795","volume-title":"SIGMOD\u201916","author":"A Halevy","year":"2016","unstructured":"Halevy A, Korn F, Noy NF et al (2016) Goods: organizing Google\u2019s datasets. In: SIGMOD\u201916, pp 795\u2013806"},{"key":"443_CR10","first-page":"1082","volume-title":"MIPRO\u201922","author":"T Hlupi\u0107","year":"2022","unstructured":"Hlupi\u0107 T, Ore\u0161\u010danin D, Ru\u017eak D et al (2022) An overview of current data lake architecture models. In: MIPRO\u201922, pp 1082\u20131087"},{"key":"443_CR11","volume-title":"Data lake architecture: designing the data lake and avoiding the garbage dump","author":"B Inmon","year":"2016","unstructured":"Inmon B (2016) Data lake architecture: designing the data lake and avoiding the garbage dump. Technics Publications, Basking Ridge"},{"key":"443_CR12","volume-title":"Building the data warehouse","author":"WH Inmon","year":"2005","unstructured":"Inmon WH (2005) Building the data warehouse. John Wiley & Sons, Indianapolis"},{"key":"443_CR13","first-page":"553","volume-title":"SOFSEM\u201921","author":"I Megdiche","year":"2021","unstructured":"Megdiche I, Ravat F, Zhao Y (2021) Metadata management on data processing in data lakes. 
In: SOFSEM\u201921, pp 553\u2013562"},{"key":"443_CR14","first-page":"46","volume-title":"ECIS\u201920","author":"S Michalczyk","year":"2020","unstructured":"Michalczyk S, Nadj M, Azarfar D et al (2020) A state-of-the-art overview and future research avenues of self-service business intelligence and analytics. In: ECIS\u201920, p 46"},{"key":"443_CR15","volume-title":"Architecting data lakes","author":"B Sharma","year":"2018","unstructured":"Sharma B (2018) Architecting data lakes. O\u2019Reilly Media, Sebastopol"},{"issue":"2","key":"443_CR16","doi-asserted-by":"publisher","first-page":"71","DOI":"10.3390\/fi15020071","volume":"15","author":"C Stach","year":"2023","unstructured":"Stach C (2023) Data is the new oil\u2013sort of: a view on why this comparison is misleading and its implications for modern data administration. Future Internet 15(2):71","journal-title":"Future Internet"},{"key":"443_CR17","first-page":"187","volume-title":"iiWAS\u201921","author":"C Stach","year":"2021","unstructured":"Stach C, Br\u00e4cker J, Eichler R et al (2021) Demand-driven data provisioning in data lakes: BARENTS \u2014 A tailorable data preparation zone. 
In: iiWAS\u201921, pp 187\u2013198"}],"container-title":["Datenbank-Spektrum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-023-00443-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13222-023-00443-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-023-00443-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,1]],"date-time":"2023-08-01T07:39:02Z","timestamp":1690875542000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13222-023-00443-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,14]]},"references-count":17,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,7]]}},"alternative-id":["443"],"URL":"https:\/\/doi.org\/10.1007\/s13222-023-00443-4","relation":{},"ISSN":["1618-2162","1610-1995"],"issn-type":[{"type":"print","value":"1618-2162"},{"type":"electronic","value":"1610-1995"}],"subject":[],"published":{"date-parts":[[2023,6,14]]},"assertion":[{"value":"1 March 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 May 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 June 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}