{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T05:18:45Z","timestamp":1769750325023,"version":"3.49.0"},"reference-count":78,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2018,8]]},"abstract":"<jats:p>\n            Open data plays a major role in supporting both governmental and organizational transparency. Many organizations are adopting\n            <jats:italic>Open Data Principles<\/jats:italic>\n            promising to make their open data complete, primary, and timely. These properties make this data tremendously valuable to data scientists. However, scientists generally do not have\n            <jats:italic>a priori<\/jats:italic>\n            knowledge about what data is available (its schema or content). Nevertheless, they want to be able to use open data and integrate it with other public or private data they are studying. Traditionally, data integration is done using a framework called\n            <jats:italic>query discovery<\/jats:italic>\n            where the main task is to discover a query (or transformation) that translates data from one form into another. The goal is to find the right operators to join, nest, group, link, and twist data into a desired form. We introduce a new paradigm for thinking about integration where the focus is on\n            <jats:italic>data discovery,<\/jats:italic>\n            but highly efficient internet-scale discovery that is driven by data analysis needs. We describe a research agenda and recent progress in developing scalable data-analysis or query-aware data discovery algorithms that provide high recall and accuracy over massive data repositories.\n          <\/jats:p>","DOI":"10.14778\/3229863.3240491","type":"journal-article","created":{"date-parts":[[2018,9,10]],"date-time":"2018-09-10T12:12:28Z","timestamp":1536581548000},"page":"2130-2139","source":"Crossref","is-referenced-by-count":70,"title":["Open data integration"],"prefix":"10.14778","volume":"11","author":[{"given":"Ren\u00e9e J.","family":"Miller","sequence":"first","affiliation":[{"name":"Northeastern University"}]}],"member":"320","published-online":{"date-parts":[[2018,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807267"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327494"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850586"},{"issue":"2","key":"e_1_2_1_4_1","first-page":"47","article-title":"Benchmarking data curation systems","volume":"39","author":"Arocena P. C.","year":"2016","unstructured":"P. C. Arocena , B. Glavic , G. Mecca , R. J. Miller , P. Papotti , and D. Santoro . Benchmarking data curation systems . IEEE Data Eng. Bull. , 39 ( 2 ): 47 -- 62 , 2016 . P. C. Arocena, B. Glavic, G. Mecca, R. J. Miller, P. Papotti, and D. Santoro. Benchmarking data curation systems. IEEE Data Eng. Bull., 39(2):47--62, 2016.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2740908.2742845"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/27633.27634"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1060745.1060840"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3085504.3091116"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2011.5767856"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.4018\/jswis.2009081901"},{"key":"e_1_2_1_11_1","first-page":"158","volume-title":"VLDB","author":"Buneman P.","year":"1995","unstructured":"P. Buneman , S. B. Davidson , K. Hart , G. C. Overton , and L. Wong . A data transformation system for biological data sources . In VLDB , pages 158 -- 169 , 1995 . P. Buneman, S. B. Davidson, K. Hart, G. C. Overton, and L. Wong. A data transformation system for biological data sources. In VLDB, pages 158--169, 1995."},{"key":"e_1_2_1_12_1","first-page":"55","volume-title":"Semantics in Databases","author":"Buneman P.","year":"1995","unstructured":"P. Buneman , S. B. Davidson , and A. Kosky . Semantics of database transformations . In Semantics in Databases , pages 55 -- 91 , 1995 . P. Buneman, S. B. Davidson, and A. Kosky. Semantics of database transformations. In Semantics in Databases, pages 55--91, 1995."},{"issue":"3","key":"e_1_2_1_13_1","first-page":"60","article-title":"Extracting, linking and integrating data from public sources: A financial case study","volume":"34","author":"Burdick D.","year":"2011","unstructured":"D. Burdick , M. A. Hern\u00e1ndez , H. Ho , G. Koutrika , R. Krishnamurthy , L. Popa , I. Stanoi , S. Vaithyanathan , and S. R. Das . Extracting, linking and integrating data from public sources: A financial case study . IEEE Data Eng. Bull. , 34 ( 3 ): 60 -- 67 , 2011 . D. Burdick, M. A. Hern\u00e1ndez, H. Ho, G. Koutrika, R. Krishnamurthy, L. Popa, I. Stanoi, S. Vaithyanathan, and S. R. Das. Extracting, linking and integrating data from public sources: A financial case study. IEEE Data Eng. Bull., 34(3):60--67, 2011.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453916"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687750"},{"key":"e_1_2_1_16_1","volume-title":"Entity Resolution, and Duplicate Detection. Data-Centric Systems and Applications","author":"Christen P.","year":"2012","unstructured":"P. Christen . Data Matching - Concepts and Techniques for Record Linkage , Entity Resolution, and Duplicate Detection. Data-Centric Systems and Applications . Springer , 2012 . P. Christen. Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Data-Centric Systems and Applications. Springer, 2012."},{"key":"e_1_2_1_17_1","first-page":"1","article-title":"media type for javascript object notation (JSON)","volume":"4627","author":"Crockford D.","year":"2006","unstructured":"D. Crockford . The application\/json media type for javascript object notation (JSON) . Request for Comment , 4627 : 1 -- 10 , 2006 . D. Crockford. The application\/json media type for javascript object notation (JSON). Request for Comment, 4627:1--10, 2006.","journal-title":"Request for Comment"},{"key":"e_1_2_1_18_1","unstructured":"CrowdFlower. 2017 Data Scientist Report. http:\/\/visit.crowdflower.com\/rs\/416-ZBE-142\/images\/CrowdFlower_DataScienceReport_2016.pdf Date accessed: July 15 2019.  CrowdFlower. 2017 Data Scientist Report. http:\/\/visit.crowdflower.com\/rs\/416-ZBE-142\/images\/CrowdFlower_DataScienceReport_2016.pdf Date accessed: July 15 2019."},{"key":"e_1_2_1_19_1","volume-title":"CIDR","author":"Deng D.","year":"2017","unstructured":"D. Deng , R. C. Fernandez , Z. Abedjan , S. Wang , M. Stonebraker , A. K. Elmagarmid , I. F. Ilyas , S. Madden , M. Ouzzani , and N. Tang . The data civilizer system . In CIDR , 2017 . D. Deng, R. C. Fernandez, Z. Abedjan, S. Wang, M. Stonebraker, A. K. Elmagarmid, I. F. Ilyas, S. Madden, M. Ouzzani, and N. Tang. The data civilizer system. In CIDR, 2017."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544886"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989337"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-02463-4_12"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/645505.656440"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/1085304.1085309"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1107499.1107502"},{"key":"e_1_2_1_26_1","first-page":"276","volume-title":"VLDB","author":"Haas L. M.","year":"1997","unstructured":"L. M. Haas , D. Kossmann , E. L. Wimmers , and J. Yang . Optimizing queries across diverse data sources . In VLDB , pages 276 -- 285 , 1997 . L. M. Haas, D. Kossmann, E. L. Wimmers, and J. Yang. Optimizing queries across diverse data sources. In VLDB, pages 276--285, 1997."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903730"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646084"},{"key":"e_1_2_1_29_1","unstructured":"O. Hassanzadeh A. Kementsietsidis L. Lim R. J. Miller and M. Wang. LinkedCT: A linked data space for clinical trials. CoRR abs\/0908.0567 2009.  O. Hassanzadeh A. Kementsietsidis L. Lim R. J. Miller and M. Wang. LinkedCT: A linked data space for clinical trials. CoRR abs\/0908.0567 2009."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-25010-6_16"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536336.2536345"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872784"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/4229.4233"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2452376.2452440"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1137\/0215061"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/276698.276876"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2068"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2015.7363784"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872783"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1519103.1519110"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2017.140"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196959.3196991"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137657"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872518.2889386"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2015.05.001"},{"key":"e_1_2_1_46_1","first-page":"251","volume-title":"VLDB","author":"Levy A. Y.","year":"1996","unstructured":"A. Y. Levy , A. Rajaraman , and J. J. Ordille . Querying heterogeneous information sources using source descriptions . In VLDB , pages 251 -- 262 , 1996 . A. Y. Levy, A. Rajaraman, and J. J. Ordille. Querying heterogeneous information sources using source descriptions. In VLDB, pages 251--262, 1996."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497434"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137833"},{"key":"e_1_2_1_49_1","first-page":"2677","volume-title":"IJCAI","author":"Ling X.","year":"2013","unstructured":"X. Ling , A. Y. Halevy , F. Wu , and C. Yu . Synthesizing union tables from the web . In IJCAI , pages 2677 -- 2683 , 2013 . X. Ling, A. Y. Halevy, F. Wu, and C. Yu. Synthesizing union tables from the web. In IJCAI, pages 2677--2683, 2013."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/2947618.2947620"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402755.3402790"},{"key":"e_1_2_1_52_1","volume-title":"Schema mapping and data exchange tools: Time for the golden age. it - Information Technology, 54(3):105--113","author":"Mecca G.","year":"2012","unstructured":"G. Mecca and P. Papotti . Schema mapping and data exchange tools: Time for the golden age. it - Information Technology, 54(3):105--113 , 2012 . G. Mecca and P. Papotti. Schema mapping and data exchange tools: Time for the golden age. it - Information Technology, 54(3):105--113, 2012."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276322"},{"key":"e_1_2_1_54_1","first-page":"77","volume-title":"VLDB","author":"Miller R. J.","year":"2000","unstructured":"R. J. Miller , L. M. Haas , and M. A. Hern\u00e1ndez . Schema mapping as query discovery . In VLDB , pages 77 -- 88 , 2000 . R. J. Miller, L. M. Haas, and M. A. Hern\u00e1ndez. Schema mapping as query discovery. In VLDB, pages 77--88, 2000."},{"key":"e_1_2_1_55_1","first-page":"120","volume-title":"VLDB","author":"Miller R. J.","year":"1993","unstructured":"R. J. Miller , Y. E. Ioannidis , and R. Ramakrishnan . The use of information capacity in schema integration and translation . In VLDB , pages 120 -- 133 , 1993 . R. J. Miller, Y. E. Ioannidis, and R. Ramakrishnan. The use of information capacity in schema integration and translation. In VLDB, pages 120--133, 1993."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4379(94)90024-8"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687649"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192973"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.14778\/2336664.2336665"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-16518-4_1"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/2797115.2797118"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3186549.3186559"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213962"},{"key":"e_1_2_1_64_1","first-page":"1","article-title":"Common format and MIME type for comma-separated values (CSV) files","volume":"4180","author":"Shafranovich Y.","year":"2005","unstructured":"Y. Shafranovich . Common format and MIME type for comma-separated values (CSV) files . Request for Comment , 4180 : 1 -- 8 , 2005 . Y. Shafranovich. Common format and MIME type for comma-separated values (CSV) files. Request for Comment, 4180:1--8, 2005.","journal-title":"Request for Comment"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1007\/11687238_8"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242667"},{"key":"e_1_2_1_67_1","volume-title":"https:\/\/opengovdata.io\/","author":"Tauberer J.","year":"2014","unstructured":"J. Tauberer . Open Government Data ( The Book). https:\/\/opengovdata.io\/ , 2014 . Second Edition. Date accessed: July 15, 2018. J. Tauberer. Open Government Data (The Book). https:\/\/opengovdata.io\/, 2014. Second Edition. Date accessed: July 15, 2018."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/2452376.2452479"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3105959"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247531"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113311"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213847"},{"issue":"2","key":"e_1_2_1_73_1","first-page":"47","article-title":"Explaining data integration","volume":"41","author":"Wang X.","year":"2018","unstructured":"X. Wang , L. M. Haas , and A. Meliou . Explaining data integration . IEEE Data Eng. Bull. , 41 ( 2 ): 47 -- 58 , 2018 . X. Wang, L. M. Haas, and A. Meliou. Explaining data integration. IEEE Data Eng. Bull., 41(2):47--58, 2018.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.111"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213848"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115409"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994534"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137788"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3229863.3240491","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:16:55Z","timestamp":1672222615000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3229863.3240491"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8]]},"references-count":78,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2018,8]]}},"alternative-id":["10.14778\/3229863.3240491"],"URL":"https:\/\/doi.org\/10.14778\/3229863.3240491","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2018,8]]}}}