{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T19:27:04Z","timestamp":1763580424883,"version":"3.44.0"},"reference-count":16,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:p>While existing data management solutions try to keep up with novel data formats and features, a myriad of valuable functionality is often only accessible via programming language libraries. Particularly for machine learning tasks, there is a wealth of pre-trained models and easy-to-use libraries that allow a wide audience to harness state-of-the-art machine learning. We propose the demonstration of a highly modularized data processor for semi-structured data that can be extended by means of plain Python scripts. Next to commonly supported user-defined functions, the deep decomposition allows augmenting the core engine with additional index structures, customized import and export routines, and custom aggregation functions. For several use cases, we detail how user-defined modules can be quickly realized and invite the audience to write and apply custom code, to tailor provided code snippets that we bring along to own preferences to solve data analytics tasks involving sentiment analysis of Twitter tweets.<\/jats:p>","DOI":"10.14778\/3611540.3611610","type":"journal-article","created":{"date-parts":[[2023,9,15]],"date-time":"2023-09-15T11:32:37Z","timestamp":1694777557000},"page":"4018-4021","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["To UDFs and Beyond: Demonstration of a Fully Decomposed Data Processor for General Data Wrangling Tasks"],"prefix":"10.14778","volume":"16","author":[{"given":"Nico","family":"Sch\u00e4fer","sequence":"first","affiliation":[{"name":"RPTU Kaiserslautern-Landau"}]},{"given":"Damjan","family":"Gjurovski","sequence":"additional","affiliation":[{"name":"RPTU Kaiserslautern-Landau"}]},{"given":"Angjela","family":"Davitkova","sequence":"additional","affiliation":[{"name":"RPTU Kaiserslautern-Landau"}]},{"given":"Sebastian","family":"Michel","sequence":"additional","affiliation":[{"name":"RPTU Kaiserslautern-Landau"}]}],"member":"320","published-online":{"date-parts":[[2023,8]]},"reference":[{"unstructured":"M. Boehm I. Antonov S. Baunsgaard M. Dokter R. Ginth\u00f6r K. Innerebner F. Klezin S. N. Lindstaedt A. Phani B. Rath B. Reinwald S. Siddiqui and S. B. Wrede. 2020. SystemDS: A Declarative Machine Learning System for the End-to-End Data Science Lifecycle. In CIDR. www.cidrdb.org.","key":"e_1_2_1_1_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_2_1","DOI":"10.14778\/2824032.2824045"},{"volume-title":"Compiling PL\/SQL Away. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12--15, 2020, Online Proceedings. www.cidrdb.org.","author":"Duta C.","unstructured":"C. Duta, D. Hirn, and T. Grust. 2020. Compiling PL\/SQL Away. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12--15, 2020, Online Proceedings. www.cidrdb.org.","key":"e_1_2_1_3_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_4_1","DOI":"10.14778\/1687553.1687567"},{"doi-asserted-by":"publisher","key":"e_1_2_1_5_1","DOI":"10.14778\/3457390.3457402"},{"doi-asserted-by":"publisher","key":"e_1_2_1_6_1","DOI":"10.14778\/2367502.2367510"},{"doi-asserted-by":"crossref","unstructured":"D. Hirn and T. Grust. 2021. One WITH RECURSIVE is Worth Many GOTOs. In SIGMOD. ACM 723--735.","key":"e_1_2_1_7_1","DOI":"10.1145\/3448016.3457272"},{"doi-asserted-by":"crossref","unstructured":"A. Joulin E. Grave P. Bojanowski and T. Mikolov. 2016. Bag of Tricks for Efficient Text Classification. arXiv preprint arXiv:1607.01759 (2016).","key":"e_1_2_1_8_1","DOI":"10.18653\/v1\/E17-2068"},{"unstructured":"L. Passing M. Then N. C. Hubig H. Lang M. Schreier S. G\u00fcnnemann A. Kemper and T. Neumann. 2017. SQL- and Operator-centric Data Analytics in Relational Main-Memory Databases. In EDBT. OpenProceedings.org 84--95.","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","volume-title":"JODA: A Vertically Scalable, Lightweight JSON Processor for Big Data Transformations","author":"Sch\u00e4fer N.","year":"2020","unstructured":"N. Sch\u00e4fer and S. Michel. 2020. JODA: A Vertically Scalable, Lightweight JSON Processor for Big Data Transformations. In ICDE. IEEE, 1726--1729."},{"key":"e_1_2_1_11_1","volume-title":"BETZE: Benchmarking Data Exploration Tools with (Almost) Zero Effort","author":"Sch\u00e4fer N.","year":"2022","unstructured":"N. Sch\u00e4fer and S. Michel. 2022. BETZE: Benchmarking Data Exploration Tools with (Almost) Zero Effort. In ICDE. IEEE, 2385--2398."},{"key":"e_1_2_1_12_1","first-page":"1","article-title":"Freedom for the SQL-Lambda","volume":"6","author":"Sch\u00fcle M. E.","year":"2020","unstructured":"M. E. Sch\u00fcle, J. Huber, A. Kemper, and T. Neumann. 2020. Freedom for the SQL-Lambda: Just-in-Time-Compiling User-Injected Functions in PostgreSQL. In SSDBM. ACM, 6:1--6:12.","journal-title":"Just-in-Time-Compiling User-Injected Functions in PostgreSQL. In SSDBM. ACM"},{"unstructured":"M. E. Sch\u00fcle L. Scalerandi A. Kemper and T. Neumann. 2023. Blue Elephants Inspecting Pandas: Inspection and Execution of Machine Learning Pipelines in SQL. In EDBT. OpenProceedings.org 40--52.","key":"e_1_2_1_13_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_14_1","DOI":"10.14778\/3510397.3510408"},{"doi-asserted-by":"publisher","key":"e_1_2_1_15_1","DOI":"10.1145\/3448016.3457244"},{"key":"e_1_2_1_16_1","volume-title":"Transformers: State-of-the-Art Natural Language Processing","author":"Wolf T.","year":"2020","unstructured":"T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. Association for Computational Linguistics, 38--45."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3611540.3611610","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T22:34:46Z","timestamp":1757543686000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3611540.3611610"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8]]},"references-count":16,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["10.14778\/3611540.3611610"],"URL":"https:\/\/doi.org\/10.14778\/3611540.3611610","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2023,8]]},"assertion":[{"value":"2023-08-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}