{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T10:57:52Z","timestamp":1761649072284},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p>Addressing the increasing demand for data exchange has led to the development of data markets that facilitate transactional interactions between data buyers and data sellers. Still, cost-effective and distribution-aware query answering is a substantial challenge in these environments. In this paper, while distinguishing between different types of data markets, we take the initial steps towards addressing this challenge. In particular, we envision a unified query answering framework and discuss its functionalities. Our framework enables integrating data from different sources in a data market into a dataset that meets user-provided schema and distribution requirements cost-effectively. In order to facilitate consumers' query answering, our system discovers data views in the form of join-paths on relevant data sources, defines a get-next operation to query views, and estimates the cost of get-next on each view. The query answering engine then selects the next views to sample sequentially to collect the output data. 
Depending on the system's knowledge of the underlying data sources, the view selection problem can be modeled as an instance of a multi-armed bandit or coupon collector's problem.<\/jats:p>","DOI":"10.14778\/3551793.3551858","type":"journal-article","created":{"date-parts":[[2022,9,29]],"date-time":"2022-09-29T22:25:03Z","timestamp":1664490303000},"page":"3137-3144","source":"Crossref","is-referenced-by-count":13,"title":["Towards distribution-aware query answering in data markets"],"prefix":"10.14778","volume":"15","author":[{"given":"Abolfazl","family":"Asudeh","sequence":"first","affiliation":[{"name":"University of Illinois Chicago"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fatemeh","family":"Nargesian","sequence":"additional","affiliation":[{"name":"University of Rochester"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,9,29]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Maximizing Gain over Flexible Attributes in Peer to Peer Marketplaces","author":"Asudeh Abolfazl","unstructured":"Abolfazl Asudeh, Azade Nazi, Nick Koudas, and Gautam Das. 2019. Maximizing Gain over Flexible Attributes in Peer to Peer Marketplaces. In PAKDD. Springer, 327--345."},{"key":"e_1_2_1_2_1","volume-title":"RRR: Rank-regret representative. In SIGMOD. ACM.","author":"Asudeh Abolfazl","year":"2019","unstructured":"Abolfazl Asudeh, Azade Nazi, Nan Zhang, Gautam Das, and HV Jagadish. 2019. RRR: Rank-regret representative. In SIGMOD. ACM."},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Alex Bogatu, Alvaro A. A. Fernandes, Norman W. Paton, and Nikolaos Konstantinou. 2020. 
Dataset Discovery in Data Lakes. In ICDE. 709--720.","DOI":"10.1109\/ICDE48307.2020.00067"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2001.914855"},{"key":"e_1_2_1_5_1","volume-title":"Google Dataset Search: Building a search engine for datasets in an open Web ecosystem","author":"Brickley Dan","year":"2019","unstructured":"Dan Brickley, Matthew Burgess, and Natasha F. Noy. 2019. Google Dataset Search: Building a search engine for datasets in an open Web ecosystem. In WWW. 1365--1375."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687750"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476346"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3357377.3357378"},{"key":"e_1_2_1_9_1","unstructured":"Lingjiao Chen, Paraschos Koutris, and Arun Kumar. [n.d.]. Towards Model-based Pricing for Machine Learning in a Data Marketplace. In SIGMOD, Peter A. Boncz, Stefan Manegold, Anastasia Ailamaki, Amol Deshpande, and Tim Kraska (Eds.). 1535--1552."},{"key":"e_1_2_1_10_1","volume-title":"Collaborative Consumption or the Rise of the Two-Sided Consumer","author":"Ertz Myriam","year":"2016","unstructured":"Myriam Ertz, Fabien Durif, and Manon Arcand. 2016. Collaborative Consumption or the Rise of the Two-Sided Consumer. 
Journal of Business & Management, Forthcoming (2016)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1158\/1055-9965.EPI-20-1038"},{"key":"e_1_2_1_12_1","volume-title":"Aurum: A Data Discovery System. In ICDE.","author":"Fernandez Raul Castro","year":"2018","unstructured":"Raul Castro Fernandez, Ziawasch Abedjan, Famien Koko, Gina Yuan, Samuel Madden, and Michael Stonebraker. 2018. Aurum: A Data Discovery System. In ICDE."},{"key":"e_1_2_1_13_1","volume-title":"Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery","author":"Fernandez Raul Castro","year":"2018","unstructured":"Raul Castro Fernandez, Essam Mansour, Abdulhakim Ali Qahtan, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, and Nan Tang. 2018. Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery. In ICDE. 989--1000."},{"key":"e_1_2_1_14_1","volume-title":"Lazo: A Cardinality-Based Method for Coupled Estimation of Jaccard Similarity and Containment. In ICDE. 1190--1201.","author":"Fernandez Raul Castro","year":"2019","unstructured":"Raul Castro Fernandez, Jisoo Min, Demitri Nava, and Samuel Madden. 2019. Lazo: A Cardinality-Based Method for Coupled Estimation of Jaccard Similarity and Containment. In ICDE. 
1190--1201."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407800"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3372716.3372726"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/2775993.2776000"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2770870"},{"key":"e_1_2_1_19_1","volume-title":"Wander Join: Online Aggregation via Random Walks. In SIGMOD. 615--629.","author":"Li Feifei","year":"2016","unstructured":"Feifei Li, Bin Wu, Ke Yi, and Zhuoyue Zhao. 2016. Wander Join: Online Aggregation via Random Walks. In SIGMOD. 615--629."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421431"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3467861.3467872"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2806881"},{"key":"e_1_2_1_23_1","first-page":"2747","article-title":"Demonstration of Dealer: An End-to-End Model Marketplace with Differential Privacy","volume":"14","author":"Liu Jinfei","year":"2021","unstructured":"Jinfei Liu, Qiongqiong Lin, Jiayao Zhang, Kui Ren, Jian Lou, Junxu Liu, Li Xiong, Jian Pei, and Jimeng Sun. 2021. Demonstration of Dealer: An End-to-End Model Marketplace with Differential Privacy. PVLDB 14, 12 (2021), 2747--2750.","journal-title":"PVLDB"},{"key":"e_1_2_1_24_1","volume-title":"Randomized algorithms","author":"Motwani Rajeev","unstructured":"Rajeev Motwani and Prabhakar Raghavan. 1995. 
Randomized algorithms. Cambridge University Press."},{"key":"e_1_2_1_25_1","volume-title":"Deep Learning for Entity Matching: A Design Space Exploration","author":"Mudgal Sidharth","unstructured":"Sidharth Mudgal, Han Li, Theodoros Rekatsinas, AnHai Doan, Youngchoon Park, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, and Vijay Raghavendra. 2018. Deep Learning for Entity Matching: A Design Space Exploration. In SIGMOD, Gautam Das, Christopher M. Jermaine, and Philip A. Bernstein (Eds.). ACM, 19--34."},{"key":"e_1_2_1_26_1","volume-title":"Tailoring Data Source Distributions for Fairness-aware Data Integration. PVLDB 14, 11","author":"Nargesian Fatemeh","year":"2021","unstructured":"Fatemeh Nargesian, Abolfazl Asudeh, and HV Jagadish. 2021. Tailoring Data Source Distributions for Fairness-aware Data Integration. PVLDB 14, 11 (2021)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476299"},{"key":"e_1_2_1_28_1","volume-title":"Dataset Evolver: An Interactive Feature Engineering Notebook","author":"Nargesian Fatemeh","year":"2018","unstructured":"Fatemeh Nargesian, Udayan Khurana, Tejaswini Pedapati, Horst Samulowitz, and Deepak S. Turaga. 2018. Dataset Evolver: An Interactive Feature Engineering Notebook. In AAAI. 
8212--8213."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192973"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476364"},{"key":"e_1_2_1_31_1","volume-title":"RONIN: Data Lake Exploration","author":"Ouellette Paul","year":"2021","unstructured":"Paul Ouellette, Aidan Sciortino, Fatemeh Nargesian, Bahar Ghadiri Bashardoost, Erkang Zhu, Ken Q. Pu, and Ren\u00e9e J. Miller. 2021. RONIN: Data Lake Exploration. PVLDB (2021)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/2336664.2336665"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/2336664.2336665"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Li Qian, Michael J. Cafarella, and H. V. Jagadish. 2012. Sample-driven Schema Mapping. In SIGMOD (Scottsdale, Arizona, USA). 73--84.","DOI":"10.1145\/2213836.2213846"},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"A\u00e9cio S. R. Santos, Aline Bessa, Fernando Chirigati, Christopher Musco, and Juliana Freire. 2021. Correlation Sketches for Approximate Join-Correlation Queries. In SIGMOD, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). 1531--1544.","DOI":"10.1145\/3448016.3458456"},{"key":"e_1_2_1_36_1","unstructured":"C. Shapiro and H.R. Varian. 1998. Versioning: the smart way to sell information. 
Harvard Business Review (1998), 106--114."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1561\/2200000068"},{"key":"e_1_2_1_38_1","volume-title":"Versioning Information Goods","author":"Varian Hal","year":"2000","unstructured":"Hal Varian. 2000. Versioning Information Goods. Internet Publishing and Beyond: The Economics of Digital Information and Intellectual Property (2000)."},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Zhuoyue Zhao, Robert Christensen, Feifei Li, Xiao Hu, and Ke Yi. 2018. Random Sampling over Joins Revisited. In SIGMOD. 1525--1539.","DOI":"10.1145\/3183713.3183739"},{"key":"e_1_2_1_40_1","volume-title":"JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes","author":"Zhu Erkang","year":"2019","unstructured":"Erkang Zhu, Dong Deng, Fatemeh Nargesian, and Ren\u00e9e J. Miller. 2019. JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes. In SIGMOD. 847--864."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300065"},{"key":"e_1_2_1_42_1","first-page":"1185","article-title":"LSH Ensemble: Internet-Scale Domain Search","volume":"9","author":"Zhu Erkang","year":"2016","unstructured":"Erkang Zhu, Fatemeh Nargesian, Ken Q. Pu, and Ren\u00e9e J. Miller. 2016. LSH Ensemble: Internet-Scale Domain Search. 
PVLDB 9, 12 (2016), 1185--1196.","journal-title":"PVLDB"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3551793.3551858","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:52:24Z","timestamp":1672224744000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3551793.3551858"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":42,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["10.14778\/3551793.3551858"],"URL":"https:\/\/doi.org\/10.14778\/3551793.3551858","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,7]]}}}