{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T05:28:30Z","timestamp":1768109310355,"version":"3.49.0"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,2,18]],"date-time":"2015-02-18T00:00:00Z","timestamp":1424217600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMOD Rec."],"published-print":{"date-parts":[[2015,2,18]]},"abstract":"<jats:p>While much work has focused on efficient processing of Big Data, little work considers how to understand them. In this paper, we describe Helix, a system for guided exploration of Big Data. Helix provides a unified view of sources, ranging from spreadsheets and XML files with no schema, all the way to RDF graphs and relational data with well-defined schemas. Helix users explore these heterogeneous data sources through a combination of keyword searches and navigation of linked web pages that include information about the schemas, as well as data and semantic links within and across sources. At a technical level, the paper describes the research challenges involved in developing Helix, along with a set of real-world usage scenarios and the lessons learned.<\/jats:p>","DOI":"10.1145\/2737817.2737829","type":"journal-article","created":{"date-parts":[[2015,2,18]],"date-time":"2015-02-18T13:24:05Z","timestamp":1424265845000},"page":"43-54","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Exploring Big Data with Helix"],"prefix":"10.1145","volume":"43","author":[{"given":"Jason","family":"Ellis","sequence":"first","affiliation":[{"name":"IBM Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Achille","family":"Fokoue","sequence":"additional","affiliation":[{"name":"IBM Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Oktie","family":"Hassanzadeh","sequence":"additional","affiliation":[{"name":"IBM Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anastasios","family":"Kementsietsidis","sequence":"additional","affiliation":[{"name":"IBM Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kavitha","family":"Srinivas","sequence":"additional","affiliation":[{"name":"IBM Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael J.","family":"Ward","sequence":"additional","affiliation":[{"name":"IBM Research"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,2,18]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1951365.1951432"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463718"},{"key":"e_1_2_1_3_1","first-page":"21","volume-title":"SEQUENCES","author":"Broder A. Z.","year":"1997","unstructured":"A. Z. Broder . On The Resemblance and Containment of Documents . In SEQUENCES , pages 21 -- 29 . IEEE Computer Society , 1997 . A. Z. Broder. On The Resemblance and Containment of Documents. In SEQUENCES, pages 21--29. IEEE Computer Society, 1997."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/509907.509965"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.127"},{"key":"e_1_2_1_6_1","volume-title":"ISWC Posters&Demos","author":"Cohen Marcelo","year":"2010","unstructured":"Marcelo Cohen and Daniel Schwabe . RExplorator - Supporting Reusable Explorations of Semantic Web Linked Data . In ISWC Posters&Demos , 2010 . Marcelo Cohen and Daniel Schwabe. RExplorator - Supporting Reusable Explorations of Semantic Web Linked Data. In ISWC Posters&Demos, 2010."},{"key":"e_1_2_1_7_1","first-page":"2009","article-title":"de Araujo and D. Schwabe. Explorator: A Tool for Exploring RDF Data through Direct Manipulation","year":"2009","unstructured":"S.F.C . de Araujo and D. Schwabe. Explorator: A Tool for Exploring RDF Data through Direct Manipulation . In LDOW 2009 , 2009 . S.F.C. de Araujo and D. Schwabe. Explorator: A Tool for Exploring RDF Data through Direct Manipulation. In LDOW2009, 2009.","journal-title":"LDOW"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247487"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35176-1_4"},{"key":"e_1_2_1_10_1","first-page":"436","volume-title":"Proceedings of the Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats","author":"Goldman R.","year":"1999","unstructured":"R. Goldman and J. Widom . Approximate dataguides . In Proceedings of the Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats , pages 436 -- 445 , 1999 . R. Goldman and J. Widom. Approximate dataguides. In Proceedings of the Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats, pages 436--445, 1999."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807286"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142351.1142352"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687771"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963192.1963295"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-25008-8_8"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536336.2536345"},{"key":"e_1_2_1_17_1","volume-title":"WebDB","author":"Yeganeh S. Hassas","year":"2011","unstructured":"S. Hassas Yeganeh , O. Hassanzadeh , and R. J. Miller . Linking Semistructured Data on the Web . In WebDB , 2011 . S. Hassas Yeganeh, O. Hassanzadeh, and R. J. Miller. Linking Semistructured Data on the Web. In WebDB, 2011."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35173-0_10"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1121949.1121979"},{"key":"e_1_2_1_20_1","first-page":"79","volume-title":"Hierarchial Data. In ICDE","author":"Nestorov S.","year":"1997","unstructured":"S. Nestorov , J. D. Ullman , J. L. Wiener , and S. S. Chawathe . Representative Objects: Concise Representations of Semistructured , Hierarchial Data. In ICDE , pages 79 -- 90 , 1997 . S. Nestorov, J. D. Ullman, J. L. Wiener, and S. S. Chawathe. Representative Objects: Concise Representations of Semistructured, Hierarchial Data. In ICDE, pages 79--90, 1997."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s007780100057"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367880"},{"key":"e_1_2_1_23_1","first-page":"691","volume-title":"VLDB","author":"Sismanis Y.","year":"2006","unstructured":"Y. Sismanis , P. Brown , P. J. Haas , and B. Reinwald . GORDIAN: Efficient and Scalable Discovery of Composite Keys . In VLDB , pages 691 -- 702 , 2006 . Y. Sismanis, P. Brown, P. J. Haas, and B. Reinwald. GORDIAN: Efficient and Scalable Discovery of Composite Keys. In VLDB, pages 691--702, 2006."},{"key":"e_1_2_1_24_1","unstructured":"SPARQL Query Language for RDF. http:\/\/www.w3.org\/TR\/rdf-sparql-query\/.  SPARQL Query Language for RDF. http:\/\/www.w3.org\/TR\/rdf-sparql-query\/."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453941"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.119"},{"key":"e_1_2_1_27_1","first-page":"663","volume-title":"VLDB","author":"Vaz Salles M. A.","year":"2007","unstructured":"M. A. Vaz Salles , J.-P. Dittrich , S. K. Karakashian , O. R. Girard , and L. Blunschi . iTrails: Pay-as-you-go Information Integration in Dataspaces . In VLDB , pages 663 -- 674 . 2007 . M. A. Vaz Salles, J.-P. Dittrich, S. K. Karakashian, O. R. Girard, and L. Blunschi. iTrails: Pay-as-you-go Information Integration in Dataspaces. In VLDB, pages 663--674. 2007."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1240866.1241100"}],"container-title":["ACM SIGMOD Record"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2737817.2737829","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2737817.2737829","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:48:11Z","timestamp":1750225691000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2737817.2737829"}},"subtitle":["Finding Needles in a Big Haystack"],"short-title":[],"issued":{"date-parts":[[2015,2,18]]},"references-count":28,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,2,18]]}},"alternative-id":["10.1145\/2737817.2737829"],"URL":"https:\/\/doi.org\/10.1145\/2737817.2737829","relation":{},"ISSN":["0163-5808"],"issn-type":[{"value":"0163-5808","type":"print"}],"subject":[],"published":{"date-parts":[[2015,2,18]]},"assertion":[{"value":"2015-02-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}