{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T16:33:47Z","timestamp":1781714027157,"version":"3.54.5"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2019,8]]},"abstract":"<jats:p>The ubiquity of data lakes has created fascinating new challenges for data management research. In this tutorial, we review the state-of-the-art in data management for data lakes. We consider how data lakes are introducing new problems including dataset discovery and how they are changing the requirements for classic problems including data extraction, data cleaning, data integration, data versioning, and metadata management.<\/jats:p>","DOI":"10.14778\/3352063.3352116","type":"journal-article","created":{"date-parts":[[2019,9,18]],"date-time":"2019-09-18T18:36:11Z","timestamp":1568831771000},"page":"1986-1989","source":"Crossref","is-referenced-by-count":215,"title":["Data lake management"],"prefix":"10.14778","volume":"12","author":[{"given":"Fatemeh","family":"Nargesian","sequence":"first","affiliation":[{"name":"University of Toronto"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Erkang","family":"Zhu","sequence":"additional","affiliation":[{"name":"University of Toronto"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ren\u00e9e J.","family":"Miller","sequence":"additional","affiliation":[{"name":"Northeastern University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ken Q.","family":"Pu","sequence":"additional","affiliation":[{"name":"UOIT"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Patricia C.","family":"Arocena","sequence":"additional","affiliation":[{"name":"TD Bank 
Group"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2019,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536336.2536343"},{"key":"e_1_2_1_2_1","volume-title":"CIDR","author":"Bhardwaj A. P.","year":"2015","unstructured":"A. P. Bhardwaj, S. Bhattacherjee, A. Chavan, A. Deshpande, A. J. Elmore, S. Madden, and A. G. Parameswaran. DataHub: Collaborative data science & dataset version management at scale. In CIDR, 2015."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824035"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209900.3209911"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453916"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687750"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132847.3132882"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376702"},{"key":"e_1_2_1_9_1","volume-title":"CIDR","author":"Deng D.","year":"2017","unstructured":"D. Deng, R. C. Fernandez, Z. Abedjan, S. Wang, M. Stonebraker, A. K. Elmagarmid, I. F. Ilyas, S. Madden, M. Ouzzani, and N. Tang. The data civilizer system. In CIDR, 2017."},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-031-01853-4","volume-title":"Big Data Integration. Synthesis Lectures on Data Management","author":"Dong X. L.","year":"2015","unstructured":"X. L. Dong and D. Srivastava. Big Data Integration. Synthesis Lectures on Data Management. 2015."},
{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/BDC.2015.30"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-02463-4_12"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/1085304.1085309"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2899391"},{"key":"e_1_2_1_15_1","first-page":"1001","volume-title":"ICDE","author":"Fernandez R. C.","year":"2018","unstructured":"R. C. Fernandez, Z. Abedjan, F. Koko, G. Yuan, S. Madden, and M. Stonebraker. Aurum: A data discovery system. In ICDE, pages 1001--1012, 2018."},{"key":"e_1_2_1_16_1","first-page":"989","volume-title":"ICDE","author":"Fernandez R. C.","year":"2018","unstructured":"R. C. Fernandez, E. Mansour, A. A. Qahtan, A. K. Elmagarmid, I. F. Ilyas, S. Madden, M. Ouzzani, M. Stonebraker, and N. Tang. Seeping semantics: Linking datasets using word embeddings for data discovery. In ICDE, pages 989--1000, 2018."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1938551.1938556"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183746"},{"key":"e_1_2_1_19_1","first-page":"1313","volume-title":"AAAI","author":"Gatterbauer W.","year":"2006","unstructured":"W. Gatterbauer and P. Bohunsky. Table extraction using spatial reasoning on the CSS2 visual box model. In AAAI, pages 1313--1318, 2006."},
{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2899389"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903730"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/4229.4233"},{"key":"e_1_2_1_23_1","volume-title":"CIDR","author":"Hellerstein J. M.","year":"2017","unstructured":"J. M. Hellerstein, V. Sreekanti, J. E. Gonzalez, J. Dalton, A. Dey, S. Nag, K. Ramachandran, S. Arora, A. Bhattacharyya, S. Das, M. Donsky, G. Fierro, C. She, C. Steinbach, V. Subramanian, and E. Sun. Ground: A data context service. In CIDR, 2017."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209900.3209902"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2015.7363784"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2017.140"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872518.2889386"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3240491"},{"issue":"2","key":"e_1_2_1_29_1","first-page":"59","article-title":"Making open data transparent: Data discovery on open data","volume":"41","author":"Miller R. J.","year":"2018","unstructured":"R. J. Miller, F. Nargesian, E. Zhu, C. Christodoulakis, K. Q. Pu, and P. Andritsos. Making open data transparent: Data discovery on open data. IEEE Data Eng. Bull., 41(2):59--70, 2018.","journal-title":"IEEE Data Eng. Bull."},
{"key":"e_1_2_1_30_1","volume-title":"Optimizing organizations for navigating data lakes","author":"Nargesian F.","year":"2018","unstructured":"F. Nargesian, K. Q. Pu, E. Zhu, B. G. Bashardoost, and R. J. Miller. Optimizing organizations for navigating data lakes, 2018. arXiv:1812.07024."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192973"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/2336664.2336665"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213846"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213962"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2593664"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/eScience.2018.00040"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2452376.2452479"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824117"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/1952376.1952378"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213848"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2904442"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300065"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994534"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137788"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-27694-1_13"}],"container-title":["Proceedings of the VLDB
Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3352063.3352116","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,20]],"date-time":"2023-09-20T13:21:16Z","timestamp":1695216076000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3352063.3352116"}},"subtitle":["challenges and opportunities"],"short-title":[],"issued":{"date-parts":[[2019,8]]},"references-count":45,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2019,8]]}},"alternative-id":["10.14778\/3352063.3352116"],"URL":"https:\/\/doi.org\/10.14778\/3352063.3352116","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2019,8]]}}}