{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T17:38:37Z","timestamp":1740159517664,"version":"3.37.3"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2023,6,26]],"date-time":"2023-06-26T00:00:00Z","timestamp":1687737600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,6,26]],"date-time":"2023-06-26T00:00:00Z","timestamp":1687737600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Karlsruher Institut f\u00fcr Technologie (KIT)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Datenbank Spektrum"],"published-print":{"date-parts":[[2023,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Data catalogs represent a\u00a0promising solution for semantically classifying and organizing data sources and enriching raw data with metadata. However, recent research has shown that data catalogs are difficult to implement due to the complexity of the data landscape or issues with data governance. Moreover, data catalogs struggle to enable business analysts to find the data they need for their use cases. Against this backdrop, we develop a\u00a0self-service system that automatically extracts metadata from a\u00a0data lake and enables business analysts to explore the metadata through an easy-to-use interface. Specifically, instead of implementing the data catalog top-down, our system derives metadata from user queries bottom-up. Hereby, we conduct 15 interviews with business analysts to derive the underlying requirements of the system and evaluate its features with a\u00a0focus group. 
Our findings illustrate that participants especially value the possibility to reuse queries from other users and appreciated the support in query validation as data preparation is a\u00a0complex and time-consuming endeavour.<\/jats:p>","DOI":"10.1007\/s13222-023-00448-z","type":"journal-article","created":{"date-parts":[[2023,6,26]],"date-time":"2023-06-26T20:18:56Z","timestamp":1687810736000},"page":"97-105","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Metadata Extraction from User Queries for Self-Service Data Lake Exploration"],"prefix":"10.1007","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2608-2679","authenticated-orcid":false,"given":"Jonas","family":"Gunklach","sequence":"first","affiliation":[]},{"given":"Sven","family":"Michalczyk","sequence":"additional","affiliation":[]},{"given":"Mario","family":"Nadj","sequence":"additional","affiliation":[]},{"given":"Alexander","family":"Maedche","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,6,26]]},"reference":[{"key":"448_CR1","unstructured":"(2023) JSqlParser (4.5 Stable or 4.6 Snapshot). JSQLParser"},{"issue":"1","key":"448_CR2","doi-asserted-by":"publisher","first-page":"107","DOI":"10.2307\/3250961","volume":"25","author":"M Alavi","year":"2001","unstructured":"Alavi M, Leidner DE (2001) Review: knowledge management and knowledge management systems: conceptual foundations and research issues. MISQ 25(1):107. https:\/\/doi.org\/10.2307\/3250961 (https:\/\/arxiv.org\/abs\/3250961)","journal-title":"MISQ"},{"issue":"2","key":"448_CR3","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1007\/s12599-016-0424-6","volume":"58","author":"P Alpar","year":"2016","unstructured":"Alpar P, Schulz M (2016) Self-service business intelligence. Bus Inf Syst Eng 58(2):151\u2013155. 
https:\/\/doi.org\/10.1007\/s12599-016-0424-6","journal-title":"Bus Inf Syst Eng"},{"key":"448_CR4","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1016\/j.is.2018.08.004","volume":"86","author":"K Drushku","year":"2019","unstructured":"Drushku K, Aligon J, Labroche N et al (2019) Interest-based recommendations for business intelligence users. Inf Syst 86:79\u201393. https:\/\/doi.org\/10.1016\/j.is.2018.08.004","journal-title":"Inf Syst"},{"key":"448_CR5","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1007\/978-3-030-87101-7_15","volume-title":"Database and expert systems applications \u2013 DEXA 2021 workshops","author":"L Ehrlinger","year":"2021","unstructured":"Ehrlinger L, Schrott J, Melichar M et al (2021) Data catalogs: a\u00a0systematic literature review and guidelines to implementation. In: Database and expert systems applications \u2013 DEXA 2021 workshops, S 148\u2013158 https:\/\/doi.org\/10.1007\/978-3-030-87101-7_15"},{"key":"448_CR6","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1007\/978-3-030-59065-9_7","volume-title":"Big data Analytics and knowledge discovery","author":"R Eichler","year":"2020","unstructured":"Eichler R, Giebler C, Gr\u00f6ger C et al (2020) HANDLE \u2013 A\u00a0generic metadata model for data lakes. In: Big data Analytics and knowledge discovery, S 73\u201388 https:\/\/doi.org\/10.1007\/978-3-030-59065-9_7"},{"issue":"1","key":"448_CR7","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1007\/s13222-018-0273-1","volume":"18","author":"C Gr\u00f6ger","year":"2018","unstructured":"Gr\u00f6ger C (2018) Building an industry 4.0 Analytics platform: practical challenges, approaches and future research directions. Datenbank Spektrum 18(1):5\u201314. 
https:\/\/doi.org\/10.1007\/s13222-018-0273-1","journal-title":"Datenbank Spektrum"},{"key":"448_CR8","doi-asserted-by":"publisher","DOI":"10.18420\/BTW2019-26","author":"C Gr\u00f6ger","year":"2019","unstructured":"Gr\u00f6ger C, Hoos E (2019) Ganzheitliches metadatenmanagement im data lake: Anforderungen, IT-werkzeuge und herausforderungen in der praxis. BTW. https:\/\/doi.org\/10.18420\/BTW2019-26","journal-title":"BTW"},{"issue":"1","key":"448_CR9","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1287\/mnsc.1070.0748","volume":"54","author":"S Haefliger","year":"2008","unstructured":"Haefliger S, Von Krogh G, Spaeth S (2008) Code reuse in open source software. Manage Sci 54(1):180\u2013193","journal-title":"Manage Sci"},{"issue":"3","key":"448_CR10","first-page":"5","volume":"39","author":"AY Halevy","year":"2016","unstructured":"Halevy AY, Korn F, Noy NF et al (2016) Managing Google\u2019s data lake: An overview of the Goods system. IEEE Data Eng Bull 39(3):5\u201314","journal-title":"IEEE Data Eng Bull"},{"key":"448_CR11","volume-title":"DAMA-DMBOK: data management body of knowledge","author":"D International","year":"2017","unstructured":"International D (2017) DAMA-DMBOK: data management body of knowledge. Technics Publications, LLC"},{"key":"448_CR12","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-22351-8_47","author":"N Khoussainova","year":"2011","unstructured":"Khoussainova N, Kwon Y, Liao WT et al (2011) Session-based browsing for more effective query reuse. Sci Stat Database Manag. https:\/\/doi.org\/10.1007\/978-3-642-22351-8_47","journal-title":"Sci Stat Database Manag"},{"key":"448_CR13","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1109\/CBI49978.2020.00029","volume-title":"2020 IEEE 22nd Conference on Business Informatics (CBI)","author":"C Labadie","year":"2020","unstructured":"Labadie C, Legner C, Eurich M et al (2020) FAIR enough? Enhancing the usage of enterprise data with data catalogs. 
In: 2020 IEEE 22nd Conference on Business Informatics (CBI), S 201\u2013210 https:\/\/doi.org\/10.1109\/CBI49978.2020.00029"},{"issue":"3","key":"448_CR14","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1007\/s13222-017-0272-7","volume":"17","author":"C Mathis","year":"2017","unstructured":"Mathis C (2017) Data lakes. Datenbank Spektrum 17(3):289\u2013293. https:\/\/doi.org\/10.1007\/s13222-017-0272-7","journal-title":"Datenbank Spektrum"},{"key":"448_CR15","volume-title":"A\u00a0state-of-the-Art overview and future research avenues of self-service business intelligence & analytics","author":"S Michalczyk","year":"2020","unstructured":"Michalczyk S, Nadj M, Azarfar D et al (2020) A\u00a0state-of-the-Art overview and future research avenues of self-service business intelligence & analytics"},{"key":"448_CR16","doi-asserted-by":"publisher","first-page":"308","DOI":"10.1109\/ENABL.2003.1231428","volume-title":"WET ICE 2003. Proceedings. Twelfth IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises","author":"F Paetsch","year":"2003","unstructured":"Paetsch F, Eberlein A, Maurer F (2003) Requirements engineering and agile software development. In: WET ICE 2003. Proceedings. Twelfth IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, S 308\u2013313 https:\/\/doi.org\/10.1109\/ENABL.2003.1231428"},{"key":"448_CR17","volume-title":"Wirtschaftsinformatik und angewandte Informatik","author":"J Passlick","year":"2017","unstructured":"Passlick J, Lebek B, Breitner MH (2017) A\u00a0self-service supporting business intelligence and big data analytics architecture. 
In: Wirtschaftsinformatik und angewandte Informatik"},{"key":"448_CR18","doi-asserted-by":"publisher","first-page":"140","DOI":"10.1007\/978-3-030-52829-4_8","volume-title":"Towards Interoperable research infrastructures for environmental and earth sciences","author":"E Quimbert","year":"2020","unstructured":"Quimbert E, Jeffery K, Martens C et al (2020) Data cataloguing. In: Towards Interoperable research infrastructures for environmental and earth sciences. Springer, Berlin Heidelberg, S 140\u2013161"},{"key":"448_CR19","doi-asserted-by":"publisher","DOI":"10.7250\/csimq.2016-9.04","author":"C Quix","year":"2016","unstructured":"Quix C, Hai R, Vatov I (2016) Metadata extraction and management in data lakes with GEMMS. CSIMQ. https:\/\/doi.org\/10.7250\/csimq.2016-9.04","journal-title":"CSIMQ"},{"key":"448_CR20","doi-asserted-by":"publisher","first-page":"304","DOI":"10.1007\/978-3-030-27615-7_23","volume-title":"Database and expert systems applications","author":"F Ravat","year":"2019","unstructured":"Ravat F, Zhao Y (2019) Data lakes: trends and perspectives. In: Database and expert systems applications, S 304\u2013313 https:\/\/doi.org\/10.1007\/978-3-030-27615-7_23"},{"issue":"4","key":"448_CR21","doi-asserted-by":"publisher","first-page":"557","DOI":"10.1109\/32.799955","volume":"25","author":"C Seaman","year":"1999","unstructured":"Seaman C (1999) Qualitative methods in empirical studies of software engineering. IEEE Trans Softw Eng 25(4):557\u2013572. https:\/\/doi.org\/10.1109\/32.799955","journal-title":"IEEE Trans Softw Eng"},{"key":"448_CR22","first-page":"1","volume-title":"Proceedings of the first international workshop on ontology-supported business intelligence","author":"M Spahn","year":"2008","unstructured":"Spahn M, Kleb J, Grimm S et al (2008) Supporting business intelligence by providing ontology-based end-user information self-service. 
In: Proceedings of the first international workshop on ontology-supported business intelligence, S 1\u201312"},{"key":"448_CR23","volume-title":"Knowledge transfer-based recommendations to enable self-service business intelligence","author":"S Sulaiman","year":"2019","unstructured":"Sulaiman S (2019) Knowledge transfer-based recommendations to enable self-service business intelligence. Shaker, Aachen"},{"key":"448_CR24","volume-title":"Multikonferenz Wirtschaftsinformatik","author":"S Sulaiman","year":"2018","unstructured":"Sulaiman S, G\u00f3mez JM (2018) Recommendation-based business intelligence architecture to empower self service business users. In: Multikonferenz Wirtschaftsinformatik"},{"key":"448_CR25","unstructured":"Zaidi E, De Simoni G, Edjlali R et al (2017) Data catalogs are the new black in data management and analytics. https:\/\/www.gartner.com\/en\/documents\/3837968, last accessed on 2023-01-10"}],"container-title":["Datenbank-Spektrum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-023-00448-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13222-023-00448-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-023-00448-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,1]],"date-time":"2023-08-01T07:39:25Z","timestamp":1690875565000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13222-023-00448-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,26]]},"references-count":25,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,7]]}},"alternative-id":["448"],"URL":"https:\/\/doi.org\/10.1007\/s13222-023-00448-z","relation":{},"ISSN":["1618-2162","1610-1995"],"issn-type":[{"type":"print","value":"1618-2162"},{"type":"electronic","value":"1610-1995"}],"subject":[],"published":{"date-parts":[[2023,6,26]]},"assertion":[{"value":"1 March 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 May 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 June 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflicts of interest to declare that are relevant to the article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}