{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,24]],"date-time":"2025-08-24T01:52:36Z","timestamp":1756000356440,"version":"3.44.0"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T00:00:00Z","timestamp":1715299200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006374","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1762268,IIS-1956096"],"award-info":[{"award-number":["IIS-1762268,IIS-1956096"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,5,10]]},"abstract":"<jats:p>\n            We consider the problem of finding the minimal-size factorization of the provenance of self-join-free conjunctive queries, i.e.,we want to find a formula that minimizes the number of variable repetitions. This problem is equivalent to solving the fundamental Boolean formula factorization problem for the restricted setting of the provenance formulas of self-join free queries. While general Boolean formula minimization is \u03a3\n            <jats:sup>p<\/jats:sup>\n            <jats:sub>2<\/jats:sub>\n            -complete, we show that the problem is NP-Complete in our case. Additionally, we identify a large class of queries that can be solved in PTIME, expanding beyond the previously known tractable cases of read-once formulas and hierarchical queries.\n          <\/jats:p>\n          <jats:p>We describe connections between factorizations, Variable Elimination Orders (VEOs), and minimal query plans. We leverage these insights to create an Integer Linear Program (ILP) that can solve the minimal factorization problem exactly. We also propose a Max-Flow Min-Cut (MFMC) based algorithm that gives an efficient approximate solution. Importantly, we show that both the Linear Programming (LP) relaxation of our ILP, and our MFMC-based algorithm are always correct for all currently known PTIME cases. Thus, we present two unified algorithms (ILP and MFMC) that can both recover all known PTIME cases in PTIME, yet also solve NP-Complete cases either exactly (ILP) or approximately (MFMC), as desired.<\/jats:p>","DOI":"10.1145\/3651605","type":"journal-article","created":{"date-parts":[[2024,5,14]],"date-time":"2024-05-14T08:32:13Z","timestamp":1715675533000},"page":"1-24","source":"Crossref","is-referenced-by-count":2,"title":["Minimally Factorizing the Provenance of Self-join Free Conjunctive Queries"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0221-6836","authenticated-orcid":false,"given":"Neha","family":"Makhija","sequence":"first","affiliation":[{"name":"Northeastern University, Boston, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9614-0504","authenticated-orcid":false,"given":"Wolfgang","family":"Gatterbauer","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, MA, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,5,14]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","unstructured":"Mahmoud Abo Khamis Hung Q. Ngo and Dan Suciu. 2017. What Do Shannon-Type Inequalities Submodular Width and Disjunctive Datalog Have to Do with One Another?. In PODS. 429--444. https:\/\/doi.org\/10.1145\/3034786.3056105","DOI":"10.1145\/3034786.3056105"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1137\/060664537"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2389241.2389249"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2402.322389"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.4230\/LIPIcs.ICALP.2023.113"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1137\/0210059"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcss.2010.06.011"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","unstructured":"Peter Buneman Sanjeev Khanna and Wang-Chiew Tan. 2002. On Propagation of Deletions and Annotations Through Views. In PODS. 150--158. https:\/\/doi.org\/10.1145\/543613.543633","DOI":"10.1145\/543613.543633"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/SBCCI.2013.6644862"},{"key":"e_1_2_2_10_1","volume-title":"HITSNDIFFS: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property. In ICDE. https:\/\/arxiv.org\/pdf\/2401.00013","author":"Chen Zixuan","year":"2024","unstructured":"Zixuan Chen, Subhodeep Mitra, R Ravi, and Wolfgang Gatterbauer. 2024. HITSNDIFFS: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property. In ICDE. https:\/\/arxiv.org\/pdf\/2401.00013"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1561\/9781601982339"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.disc.2005.12.033"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0166--218X(01)00344--4"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1017\/cbo9780511852008.003"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/357775.357777"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/b978-012088469--8.50076-0"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-006-0004--3"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/1177299"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/319732.319740"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004--3702(99)00059--4"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","unstructured":"Rina Dechter. 2003. Constraint Processing. Morgan Kaufmann. https:\/\/doi.org\/10.1016\/b978--1--55860--890-0.x5000--2","DOI":"10.1016\/b978--1--55860--890-0.x5000--2"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","unstructured":"Maarten Van den Heuvel Peter Ivanov Wolfgang Gatterbauer Floris Geerts and Martin Theobald. 2019. Anytime Approximation in Probabilistic Databases via Scaled Dissociations. In SIGMOD. 1295--1312. https:\/\/doi.org\/10.1145\/3299869.3319900","DOI":"10.1145\/3299869.3319900"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-013-0310--5"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850592"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","unstructured":"Cibele Freire Wolfgang Gatterbauer Neil Immerman and Alexandra Meliou. 2020. New Results for the Complexity of Resilience for Binary Conjunctive Queries with Self-Joins. In PODS. 271--284. https:\/\/doi.org\/10.1145\/3375395.3387647","DOI":"10.1145\/3375395.3387647"},{"volume-title":"Computers and intractability","author":"Garey Michael R","key":"e_1_2_2_26_1","unstructured":"Michael R Garey and David S Johnson. 1979. Computers and intractability. Vol. 174. W. H. Freeman & Co. https:\/\/dl.acm.org\/doi\/10.5555\/578533"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2532641"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-016-0434--5"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ic.2008.03.002"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1017\/cbo9780511852008.011"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dam.2005.09.016"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dam.2008.02.011"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","unstructured":"Todd J. Green Grigoris Karvounarakis and Val Tannen. 2007. Provenance semirings. In PODS. 31--40. https:\/\/doi.org\/10.1145\/1265530.1265535","DOI":"10.1145\/1265530.1265535"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3034786.3056125"},{"key":"e_1_2_2_35_1","unstructured":"LLC Gurobi Optimization. 2021a. Gurobi Optimizer Reference Manual. http:\/\/www.gurobi.com"},{"key":"e_1_2_2_36_1","unstructured":"LLC Gurobi Optimization. 2021b. Mixed-Integer Programming (MIP) -- A Primer on the Basics. https:\/\/www.gurobi.com\/resource\/mip-basics\/"},{"key":"e_1_2_2_37_1","first-page":"183","article-title":"Repetition-free Boolean functions","volume":"32","author":"Gurvich V.A.","year":"1977","unstructured":"V.A. Gurvich. 1977. Repetition-free Boolean functions. Uspekhi Mat. Nauk, Vol. 32 (1977), 183--184. http:\/\/mi.mathnet.ru\/umn3055 (in Russian).","journal-title":"Uspekhi Mat. Nauk"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/0004--3702(93)90062-G"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/sfcs.1997.646147"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS52979.2021.00050"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1017\/cbo9780511977152.002"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1988.tb01721.x"},{"key":"e_1_2_2_43_1","unstructured":"Neha Makhija and Wolfgang Gatterbauer. 2021. Minimally Factorizing the Provenance of Self-join Free Conjunctive Queries. (2021). https:\/\/arxiv.org\/abs\/2105.14307 (Full technical report with Online Appendix)."},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626715"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2010.5647772"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402755.3402803"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dam.2005.02.007"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","unstructured":"Dan Olteanu and Jiewen Huang. 2008. Using OBDDs for Efficient Query Evaluation on Probabilistic Databases. In SUM. 326--340. https:\/\/doi.org\/10.1007\/978--3--540--87993-0_26","DOI":"10.1007\/978--3--540--87993-0_26"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3003665.3003667"},{"key":"e_1_2_2_50_1","volume-title":"On Factorisation of Provenance Polynomials. In 3rd Workshop on the Theory and Practice of Provenance (TaPP'11)","author":"Olteanu Dan","year":"2011","unstructured":"Dan Olteanu and Jakub Z\u00e1 vodn\u00fd. 2011. On Factorisation of Provenance Polynomials. In 3rd Workshop on the Theory and Practice of Provenance (TaPP'11). USENIX. https:\/\/www.usenix.org\/conference\/tapp11\/factorisation-provenance-polynomials"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2274576.2274607"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2656335"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1016\/0196--6774(86)90023--4"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","unstructured":"Sudeepa Roy Vittorio Perduca and Val Tannen. 2011. Faster query answering in probabilistic databases using read-once functions. In ICDT. 232--243. https:\/\/doi.org\/10.1145\/1938551.1938582","DOI":"10.1145\/1938551.1938582"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1137\/1030065"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920975"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","unstructured":"Dan Suciu Dan Olteanu Christopher R\u00e9 and Christoph Koch. 2011. Probabilistic Databases. Morgan & Claypool. https:\/\/doi.org\/10.2200\/s00362ed1v01y201105dtm016","DOI":"10.2200\/s00362ed1v01y201105dtm016"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1137\/0213035"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/sfcs.1998.743506"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","unstructured":"Moshe Y. Vardi. 1982. The Complexity of Relational Query Languages (Extended Abstract). In STOC. 137--146. https:\/\/doi.org\/10.1145\/800070.802186","DOI":"10.1145\/800070.802186"},{"volume-title":"Approximation algorithms","author":"Vazirani Vijay V","key":"e_1_2_2_61_1","unstructured":"Vijay V Vazirani. 2001. Approximation algorithms. Vol. 1. Springer. https:\/\/dl.acm.org\/doi\/10.5555\/500776"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1017\/9781316888568"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3651605","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3651605","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T21:39:03Z","timestamp":1755898743000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3651605"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,10]]},"references-count":62,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,5,10]]}},"alternative-id":["10.1145\/3651605"],"URL":"https:\/\/doi.org\/10.1145\/3651605","relation":{},"ISSN":["2836-6573"],"issn-type":[{"type":"electronic","value":"2836-6573"}],"subject":[],"published":{"date-parts":[[2024,5,10]]}}}