{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T14:22:38Z","timestamp":1756995758027},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"5","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,1]]},"abstract":"<jats:p>A package query returns a package---a multiset of tuples---that maximizes or minimizes a linear objective function subject to linear constraints, thereby enabling in-database decision support. Prior work has established the equivalence of package queries to Integer Linear Programs (ILPs) and developed the SketchRefine algorithm for package query processing. While this algorithm was an important first step toward supporting prescriptive analytics scalably inside a relational database, it struggles when the data size grows beyond a few hundred million tuples or when the constraints become very tight. In this paper, we present Progressive Shading, a novel algorithm for processing package queries that can scale efficiently to billions of tuples and gracefully handle tight constraints. Progressive Shading solves a sequence of optimization problems over a hierarchy of relations, each resulting from an ever-finer partitioning of the original tuples into homogeneous groups until the original relation is obtained. This strategy avoids the premature discarding of high-quality tuples that can occur with SketchRefine. Our novel partitioning scheme, Dynamic Low Variance, can handle very large relations with multiple attributes and can dynamically adapt to both concentrated and spread-out sets of attribute values, provably outperforming traditional partitioning schemes such as kd-tree. We further optimize our system by replacing our off-the-shelf optimization software with customized ILP and LP solvers, called Dual Reducer and Parallel Dual Simplex respectively, that are highly accurate and orders of magnitude faster.<\/jats:p>","DOI":"10.14778\/3641204.3641222","type":"journal-article","created":{"date-parts":[[2024,5,2]],"date-time":"2024-05-02T22:05:43Z","timestamp":1714687543000},"page":"1146-1158","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization"],"prefix":"10.14778","volume":"17","author":[{"given":"Anh L.","family":"Mai","sequence":"first","affiliation":[{"name":"New York University Abu Dhabi"}]},{"given":"Pengyu","family":"Wang","sequence":"additional","affiliation":[{"name":"New York University Abu Dhabi"}]},{"given":"Azza","family":"Abouzied","sequence":"additional","affiliation":[{"name":"New York University Abu Dhabi"}]},{"given":"Matteo","family":"Brucato","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]},{"given":"Peter J.","family":"Haas","sequence":"additional","affiliation":[{"name":"University of Massachusetts Amherst"}]},{"given":"Alexandra","family":"Meliou","sequence":"additional","affiliation":[{"name":"University of Massachusetts Amherst"}]}],"member":"320","published-online":{"date-parts":[[2024,5,2]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","unstructured":"Abdurro'uf and et al. 2021. The Seventeenth Data Release of the Sloan Digital Sky Surveys: Complete Release of MaNGA MaStar and APOGEE-2 Data. arXiv:2112.02026. 10.3847\/1538-4365\/ac4414","DOI":"10.3847\/1538-4365\/ac4414"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1098\/rsif.2015.0073"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1287\/ijoc.12.1.45.11902"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-017-0483-4"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592980.3595313"},{"key":"e_1_2_1_7_1","first-page":"46","article-title":"OpenMP: an industry standard API for shared-memory programming. Computational Science & Engineering","volume":"5","author":"Dagum Leonardo","year":"1998","unstructured":"Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. Computational Science & Engineering, IEEE 5, 1 (1998), 46--55.","journal-title":"IEEE"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00288933"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","unstructured":"Matteo Fischetti and Andrea Lodi. 2011. Heuristics in Mixed Integer Programming. 10.1002\/9780470400531.eorms0376","DOI":"10.1002\/9780470400531.eorms0376"},{"key":"e_1_2_1_10_1","unstructured":"Gurobi Optimization LLC. 2022. Gurobi Optimizer Reference Manual. https:\/\/www.gurobi.com"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2015.01.049"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.2307\/2346830"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551833"},{"key":"e_1_2_1_14_1","volume-title":"Introduction to Operations Research","author":"Hillier Frederick S.","unstructured":"Frederick S. Hillier. 1967. Introduction to Operations Research. San Francisco, Holden-Day."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12532-017-0130-5"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2593666"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/2794367.2794378"},{"key":"e_1_2_1_18_1","volume-title":"Rousseeuw","author":"Kaufman Leonard","year":"1990","unstructured":"Leonard Kaufman and Peter J. Rousseeuw. 1990. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1287\/ijoc.2019.0944"},{"key":"e_1_2_1_20_1","unstructured":"C. C. N. Kuhn G. Calbert I. Garanovich and T. Weir. 2023. Integer linear programming supporting portfolio design. arXiv:2303.14364 [math.OC]"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/icc.2019.8762075"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","unstructured":"Anh Mai Matteo Brucato Azza Abouzied Peter J. Haas and Alexandra Meliou. 2023. Scaling Package Queries to a Billion Tuples via Hierarchical Partitioning and Customized Optimization. https:\/\/github.com\/alm818\/PackageQuery. arXiv:2307.02860 [cs.DB] 10.48550\/arXiv.2307.02860","DOI":"10.48550\/arXiv.2307.02860"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5772\/12909"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","unstructured":"Vinod Nair Sergey Bartunov Felix Gimeno Ingrid von Glehn Pawel Lichocki Ivan Lobov Brendan O'Donoghue Nicolas Sonnerat Christian Tjandraatmadja Pengming Wang Ravichandra Addanki Tharindi Hapuarachchi Thomas Keck James Keeling Pushmeet Kohli Ira Ktena Yujia Li Oriol Vinyals and Yori Zwols. 2020. Solving Mixed Integer Programs Using Neural Networks. 10.48550\/ARXIV.2012.13349","DOI":"10.48550\/ARXIV.2012.13349"},{"key":"e_1_2_1_25_1","volume-title":"Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP. CoRR abs\/2110.11927","author":"Narayanan Deepak","year":"2021","unstructured":"Deepak Narayanan, Fiodar Kazhamiaka, Firas Abuzaid, Peter Kraft, Akshay Agrawal, Srikanth Kandula, Stephen P. Boyd, and Matei Zaharia. 2021. Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP. CoRR abs\/2110.11927 (2021). arXiv:2110.11927 https:\/\/arxiv.org\/abs\/2110.11927"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/5992.814654"},{"key":"e_1_2_1_27_1","volume-title":"Vesper: Measuring Time-to-Interactivity for Web Pages. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18)","author":"Netravali Ravi","year":"2018","unstructured":"Ravi Netravali, Vikram Nathan, James Mickens, and Hari Balakrishnan. 2018. Vesper: Measuring Time-to-Interactivity for Web Pages. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). USENIX Association, Renton, WA, 217--231. https:\/\/www.usenix.org\/conference\/nsdi18\/presentation\/netravali-vesper"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4613-3434-7_10"},{"key":"e_1_2_1_29_1","volume-title":"A comparative study of divisive hierarchical clustering algorithms. CoRR abs\/1506.08977","author":"Roux Maurice","year":"2015","unstructured":"Maurice Roux. 2015. A comparative study of divisive hierarchical clustering algorithms. CoRR abs\/1506.08977 (2015). arXiv:1506.08977 http:\/\/arxiv.org\/abs\/1506.08977"},{"key":"e_1_2_1_30_1","volume-title":"Visualization and ILOG CPLEX","author":"Sander Georg","unstructured":"Georg Sander and Adrian Vasiliu. 2005. Visualization and ILOG CPLEX. In Graph Drawing, J\u00e1nos Pach (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 510--511."},{"key":"e_1_2_1_31_1","unstructured":"TPCH [n.d.]. TPC-H Decision Support Benchmark. https:\/\/www.tpc.org\/tpch\/."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-04565-7"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2949689.2949693"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3641204.3641222","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,2]],"date-time":"2024-05-02T22:08:24Z","timestamp":1714687704000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3641204.3641222"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1]]},"references-count":32,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,1]]}},"alternative-id":["10.14778\/3641204.3641222"],"URL":"https:\/\/doi.org\/10.14778\/3641204.3641222","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,1]]},"assertion":[{"value":"2024-05-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}