{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T07:10:05Z","timestamp":1771917005140,"version":"3.50.1"},"reference-count":17,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2021,7]]},"abstract":"<jats:p>\n            In order to conduct analytical tasks, data scientists often need to find relevant data from an avalanche of sources (e.g., data lakes, large organizational databases). This effort is typically made in an ad hoc, non-systematic manner, which makes it a daunting endeavour. Current data discovery systems typically require the users to find relevant tables manually, usually by issuing multiple queries (e.g., using SQL). However, expressing such queries is nontrivial, as it requires knowledge of the underlying structure (schema) of the data organization in advance. This issue is further exacerbated when data resides in data lakes, where there is no predefined schema that data must conform to. On the other hand, data scientists can often come up with a few\n            <jats:italic>example records<\/jats:italic>\n            of interest quickly. Motivated by this observation, we developed DICE---a human-in-the-loop system for\n            <jats:italic>&lt;u&gt;D&lt;\/u&gt;ata d&lt;u&gt;I&lt;\/u&gt;s&lt;u&gt;C&lt;\/u&gt;overy by &lt;u&gt;E&lt;\/u&gt;xample---that<\/jats:italic>\n            takes user-provided example records as input and returns more records that satisfy the user intent.\n            <jats:italic>DICE's<\/jats:italic>\n            key idea is to synthesize a SQL query that captures the user intent, specified via examples. 
To this end,\n            <jats:italic>DICE<\/jats:italic>\n            follows a three-step process: (1)\n            <jats:italic>DICE<\/jats:italic>\n            first discovers a few candidate queries by finding join paths across tables within the data lake. (2) Then\n            <jats:italic>DICE<\/jats:italic>\n            consults with the user for validation by presenting a few records to them, and, thus, eliminating spurious queries. (3) Based on the user feedback,\n            <jats:italic>DICE<\/jats:italic>\n            refines the search and repeats the process until the user is satisfied with the results. We will demonstrate how\n            <jats:italic>DICE<\/jats:italic>\n            can help in data discovery through an interactive, example-based interaction.\n          <\/jats:p>","DOI":"10.14778\/3476311.3476353","type":"journal-article","created":{"date-parts":[[2021,10,28]],"date-time":"2021-10-28T22:48:43Z","timestamp":1635461323000},"page":"2819-2822","source":"Crossref","is-referenced-by-count":18,"title":["DICE"],"prefix":"10.14778","volume":"14","author":[{"given":"El Kindi","family":"Rezig","sequence":"first","affiliation":[{"name":"MIT CSAIL"}]},{"given":"Anshul","family":"Bhandari","sequence":"additional","affiliation":[{"name":"NIT Hamirpur"}]},{"given":"Anna","family":"Fariha","sequence":"additional","affiliation":[{"name":"University of Massachusetts Amherst"}]},{"given":"Benjamin","family":"Price","sequence":"additional","affiliation":[{"name":"MIT Lincoln Laboratory"}]},{"given":"Allan","family":"Vanterpool","sequence":"additional","affiliation":[{"name":"United States Air Force"}]},{"given":"Vijay","family":"Gadepally","sequence":"additional","affiliation":[{"name":"MIT Lincoln Laboratory"}]},{"given":"Michael","family":"Stonebraker","sequence":"additional","affiliation":[{"name":"MIT CSAIL"}]}],"member":"320","published-online":{"date-parts":[[2021,10,28]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Apache Lucene. 2021. 
https:\/\/lucene.apache.org. Accessed: 03\/2021."},{"key":"e_1_2_1_2_1","unstructured":"AZLyrics. 2021. https:\/\/azlyrics.com Accessed: 03\/2021."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389776"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2818637"},{"key":"e_1_2_1_5_1","volume-title":"Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, and Nan Tang.","author":"Deng Dong","year":"2017"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2599168"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376442"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342266"},{"key":"e_1_2_1_9_1","volume-title":"Aurum: A Data Discovery System. In ICDE. 1001--1012.","author":"Fernandez Raul Castro","year":"2018"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/2831360.2831369"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2749452"},{"key":"e_1_2_1_12_1","volume-title":"Dagger: A Data (not code) Debugger. In CIDR.","author":"Rezig El Kindi","year":"2020"},{"key":"e_1_2_1_13_1","volume-title":"Poly\/DMAH@VLDB","author":"Rezig El Kindi"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2593664"},{"key":"e_1_2_1_15_1","unstructured":"The Music Brainz Encyclopedia. 2021. 
https:\/\/musicbrainz.org Accessed:03\/2021."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3140587.3062365"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2013.6693082"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3476311.3476353","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:32:45Z","timestamp":1672227165000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3476311.3476353"}},"subtitle":["data discovery by example"],"short-title":[],"issued":{"date-parts":[[2021,7]]},"references-count":17,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2021,7]]}},"alternative-id":["10.14778\/3476311.3476353"],"URL":"https:\/\/doi.org\/10.14778\/3476311.3476353","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2021,7]]}}}