{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:12:21Z","timestamp":1750219941639,"version":"3.41.0"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T00:00:00Z","timestamp":1686614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,6,13]]},"abstract":"<jats:p>Spark big data processing platform is heavily used in today's IT services for various critical applications such as machine learning tasks for service recommendations or massive volumes of raw sales data analysis. Spark is designed to deliver high performance by enabling a high degree of parallelism while processing various heavy-weight queries that require homogeneous operations on large data. However, it has been observed that workloads made of small and short-running queries coming from various sources are becoming dominant in practice. Unfortunately, the current Spark architecture is unfit to process workloads made of a large number of small queries optimally due to excessive I\/Os with small computations. We present a technique, called QaaD, that addresses this problem fundamentally by applying i) transparent conversion of workloads made of small queries into one with large queries and ii) dynamic partition size adjustment for runtime overhead minimization. For this, we introduce a new abstraction, microRDD, to support our design of query merging, the embedding of queries as part of data, and an opportunistic sharing of common input data among queries. Comprehensive evaluation using real-world data shows that QaaD is able to deliver 10.6x to 36.6x speed-up against standard Spark executions for small query workloads.<\/jats:p>","DOI":"10.1145\/3589279","type":"journal-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T20:26:45Z","timestamp":1687292805000},"page":"1-26","source":"Crossref","is-referenced-by-count":2,"title":["QaaD (Query-as-a-Data): Scalable Execution of Massive Number of Small Queries in Spark"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-9355-2140","authenticated-orcid":false,"given":"Yeonsu","family":"Park","sequence":"first","affiliation":[{"name":"POSTECH, Pohang, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8204-6816","authenticated-orcid":false,"given":"Byungchul","family":"Tak","sequence":"additional","affiliation":[{"name":"Kyungpook National University, Daegu, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9206-9563","authenticated-orcid":false,"given":"Wook-Shin","family":"Han","sequence":"additional","affiliation":[{"name":"POSTECH, Pohang, Republic of Korea"}]}],"member":"320","published-online":{"date-parts":[[2023,6,20]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2022. Amazon Seller Central. https:\/\/sellercentral.amazon.com. Accessed: 2022-09--25."},{"key":"e_1_2_2_2_1","unstructured":"2022. Brazilian E-commerce Public Dataset by Olist. https:\/\/www.kaggle.com\/datasets\/olistbr\/brazilian-ecommerce. Accessed: 2022-09--25."},{"key":"e_1_2_2_3_1","unstructured":"2022. Online Auctions Dataset. https:\/\/www.kaggle.com\/datasets\/onlineauctions\/online-auctions-dataset. Accessed: 2022-09--25."},{"key":"e_1_2_2_4_1","unstructured":"2023. Adaptive Query Execution. https:\/\/docs.databricks.com\/optimizations\/aqe.html. Accessed: 2023-01--15."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236195"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-86534-4_3"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData50022.2020.9378196"},{"key":"e_1_2_2_8_1","volume-title":"Proceedings of KDD cup and workshop","volume":"2007","author":"Bennett James","year":"2007","unstructured":"James Bennett, Stan Lanning, et al. 2007. The netflix prize. In Proceedings of KDD cup and workshop, Vol. 2007. Citeseer, 35."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3314045"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352121"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367519"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2017.8057206"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3185768.3186359"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536225"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2018.00158"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192971"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2017.8258257"},{"volume-title":"AdaptDB: adaptive partitioning for distributed joins. Ph. D. Dissertation","author":"Yi Lu.","key":"e_1_2_2_18_1","unstructured":"Yi Lu. 2017. AdaptDB: adaptive partitioning for distributed joins. Ph. D. Dissertation. Massachusetts Institute of Technology."},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3135974.3135984"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-019-00580-x"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376726"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSAA.2016.49"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476388"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526158"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1080\/1206212X.2020.1732081"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452799"},{"key":"e_1_2_2_27_1","volume-title":"Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for in-Memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (San Jose, CA) (NSDI'12). USENIX Association, USA."},{"key":"e_1_2_2_28_1","volume-title":"2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 10)","author":"Zaharia Matei","year":"2010","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 10)."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1855741.1855744"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2017.100"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2019.05.077"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526164"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3457390.3457392"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589279","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3589279","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:54Z","timestamp":1750182534000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589279"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,13]]},"references-count":33,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,13]]}},"alternative-id":["10.1145\/3589279"],"URL":"https:\/\/doi.org\/10.1145\/3589279","relation":{},"ISSN":["2836-6573"],"issn-type":[{"type":"electronic","value":"2836-6573"}],"subject":[],"published":{"date-parts":[[2023,6,13]]}}}