{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,28]],"date-time":"2025-12-28T19:49:04Z","timestamp":1766951344512},"reference-count":13,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2015,8]]},"abstract":"<jats:p>\n            While there have been many solutions proposed for storing and analyzing large volumes of data, all of these solutions have limited support for\n            <jats:italic>collaborative data analytics<\/jats:italic>\n            , especially given that many individuals and teams are simultaneously analyzing, modifying and exchanging datasets, employing a number of heterogeneous tools or languages for data analysis, and writing scripts to clean, preprocess, or query data. We demonstrate DataHub, a unified platform with the ability to load, store, query, collaboratively analyze, interactively visualize, interface with external applications, and share datasets. 
We will demonstrate the following aspects of the DataHub platform: (a)\n            <jats:italic>flexible data storage, sharing, and native versioning capabilities:<\/jats:italic>\n            multiple conference attendees can concurrently update the database and browse the different versions and inspect conflicts; (b)\n            <jats:italic>an app ecosystem that hosts apps for various data-processing activities:<\/jats:italic>\n            conference attendees will be able to effortlessly ingest, query, and visualize data using our existing apps; (c)\n            <jats:italic>thrift-based data serialization permits data analysis in any combination of<\/jats:italic>\n            20+\n            <jats:italic>languages, with DataHub as the common data store:<\/jats:italic>\n            conference attendees will be able to analyze datasets in R, Python, and Matlab, while the inputs and the results are still stored in DataHub. In particular, conference attendees will be able to use the\n            <jats:italic>DataHub notebook<\/jats:italic>\n            ---an IPython-based notebook for analyzing data and storing the results of data analysis.\n          <\/jats:p>","DOI":"10.14778\/2824032.2824100","type":"journal-article","created":{"date-parts":[[2015,9,16]],"date-time":"2015-09-16T12:18:17Z","timestamp":1442405897000},"page":"1916-1919","source":"Crossref","is-referenced-by-count":41,"title":["Collaborative data analytics with DataHub"],"prefix":"10.14778","volume":"8","author":[{"given":"Anant","family":"Bhardwaj","sequence":"first","affiliation":[{"name":"MIT"}]},{"given":"Amol","family":"Deshpande","sequence":"additional","affiliation":[{"name":"U. Maryland (UMD)"}]},{"given":"Aaron J.","family":"Elmore","sequence":"additional","affiliation":[{"name":"U. 
Chicago"}]},{"given":"David","family":"Karger","sequence":"additional","affiliation":[{"name":"MIT"}]},{"given":"Sam","family":"Madden","sequence":"additional","affiliation":[{"name":"MIT"}]},{"given":"Aditya","family":"Parameswaran","sequence":"additional","affiliation":[{"name":"U. Illinois (UIUC)"}]},{"given":"Harihar","family":"Subramanyam","sequence":"additional","affiliation":[{"name":"MIT"}]},{"given":"Eugene","family":"Wu","sequence":"additional","affiliation":[{"name":"Columbia"}]},{"given":"Rebecca","family":"Zhang","sequence":"additional","affiliation":[{"name":"MIT"}]}],"member":"320","published-online":{"date-parts":[[2015,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Fec presidential campaign finance (http:\/\/fec.gov\/disclosurep\/pnational.do). {Online; accessed 3-March-2014}."},{"key":"e_1_2_1_2_1","first-page":"207","volume-title":"UIST","author":"Abouzied A.","year":"2012","unstructured":"A. Abouzied, J. M. Hellerstein, and A. Silberschatz. Dataplay: interactive tweaking and example-driven correction of graphical database queries. In UIST, pages 207--218, 2012. 10.1145\/2380116.2380144"},{"key":"e_1_2_1_3_1","volume-title":"CIDR","author":"Bhardwaj A. P.","year":"2015","unstructured":"A. P. Bhardwaj, S. Bhattacherjee, A. Chavan, A. Deshpande, A. J. Elmore, S. Madden, and A. G. Parameswaran. Datahub: Collaborative data science & dataset version management at scale. 
In CIDR, 2015."},{"key":"e_1_2_1_4_1","volume-title":"VLDB","author":"Bhattacherjee S.","year":"2015","unstructured":"S. Bhattacherjee, A. Chavan, S. Huang, A. Deshpande, and A. Parameswaran. Principles of dataset versioning: Exploring the recreation\/storage tradeoff. In VLDB, 2015 (To appear). 10.14778\/2824032.2824035"},{"key":"e_1_2_1_5_1","volume-title":"TAPP","author":"Chavan A.","year":"2015","unstructured":"A. Chavan, S. Huang, A. Deshpande, A. Elmore, S. Madden, and A. Parameswaran. Towards a unified query language for provenance and versioning. In TAPP, 2015 (To appear)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000006"},{"key":"e_1_2_1_7_1","first-page":"1061","volume-title":"SIGMOD","author":"Gonzalez H.","year":"2010","unstructured":"H. Gonzalez, A. Y. Halevy, C. S. Jensen, A. Langen, J. Madhavan, R. Shapley, W. Shen, and J. Goldberg-Kidon. Google fusion tables: web-centered data management and collaboration. In SIGMOD, pages 1061--1066, 2010. 10.1145\/1807167.1807286"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2240236.2240260"},{"key":"e_1_2_1_9_1","first-page":"3363","volume-title":"CHI","author":"Kandel S.","year":"2011","unstructured":"S. Kandel, A. Paepcke, J. M. Hellerstein, and J. Heer. 
Wrangler: interactive visual specification of data transformation scripts. In CHI, pages 3363--3372, 2011. 10.1145\/1978942.1979444"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1400214.1400234"},{"key":"e_1_2_1_11_1","unstructured":"http:\/\/ipython.org. IPython (retrieved June 1 2014)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"http:\/\/thrift.apache.org. Apache Thrift (retrieved June 1 2014).","DOI":"10.31273\/eirj.v1i1.67"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733035"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2824032.2824100","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:11:48Z","timestamp":1672222308000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2824032.2824100"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,8]]},"references-count":13,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2015,8]]}},"alternative-id":["10.14778\/2824032.2824100"],"URL":"https:\/\/doi.org\/10.14778\/2824032.2824100","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2015,8]]}}}