{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T06:01:49Z","timestamp":1770444109207,"version":"3.49.0"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2015,11]]},"abstract":"<jats:p>\n            Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a difficult and time consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result programmers spend countless hours collecting evidence (\n            <jats:italic>e.g.<\/jats:italic>\n            , from log files) and performing trial and error debugging. To aid this effort, we built\n            <jats:italic>Titian<\/jats:italic>\n            , a library that enables\n            <jats:italic>data provenance<\/jats:italic>\n            ---tracking data through transformations---in Apache Spark. Data scientists using the Titian Spark extension will be able to quickly identify the input data at the root cause of a potential bug or outlier result. Titian is built directly into the Spark platform and offers data provenance support at interactive speeds---orders-of-magnitude faster than alternative solutions---while minimally impacting Spark job performance; observed overheads for capturing data lineage rarely exceed 30% above the baseline job execution time.\n          <\/jats:p>","DOI":"10.14778\/2850583.2850595","type":"journal-article","created":{"date-parts":[[2016,2,1]],"date-time":"2016-02-01T14:10:31Z","timestamp":1454335831000},"page":"216-227","source":"Crossref","is-referenced-by-count":97,"title":["Titian"],"prefix":"10.14778","volume":"9","author":[{"given":"Matteo","family":"Interlandi","sequence":"first","affiliation":[{"name":"University of California, Los Angeles"}]},{"given":"Kshitij","family":"Shah","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles"}]},{"given":"Sai Deep","family":"Tetali","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles"}]},{"given":"Muhammad Ali","family":"Gulzar","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles"}]},{"given":"Seunghyun","family":"Yoo","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles"}]},{"given":"Miryung","family":"Kim","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles"}]},{"given":"Todd","family":"Millstein","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles"}]},{"given":"Tyson","family":"Condie","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles"}]}],"member":"320","published-online":{"date-parts":[[2015,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Hadoop. http:\/\/hadoop.apache.org.  Hadoop. http:\/\/hadoop.apache.org."},{"key":"e_1_2_1_2_1","unstructured":"Mllib. http:\/\/spark.apache.org\/mllib.  Mllib. http:\/\/spark.apache.org\/mllib."},{"key":"e_1_2_1_3_1","unstructured":"Spark. http:\/\/spark.apache.org.  Spark. http:\/\/spark.apache.org."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723711"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/2095686.2095693"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1739041.1739078"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497516"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2011.5767921"},{"key":"e_1_2_1_10_1","volume-title":"Better bitmap performance with Roaring bitmaps. ArXiv e-prints","author":"Chambi S.","year":"2014","unstructured":"S. Chambi , D. Lemire , O. Kaser , and R. Godin . Better bitmap performance with Roaring bitmaps. ArXiv e-prints , 2014 . S. Chambi, D. Lemire, O. Kaser, and R. Godin. Better bitmap performance with Roaring bitmaps. ArXiv e-prints, 2014."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-002-0083-8"},{"key":"e_1_2_1_12_1","volume-title":"Arthur: Rich post-facto debugging for production analytics applications","author":"Dave A.","year":"2013","unstructured":"A. Dave , M. Zaharia , S. Shenker , and I. Stoica . Arthur: Rich post-facto debugging for production analytics applications , 2013 . A. Dave, M. Zaharia, S. Shenker, and I. Stoica. Arthur: Rich post-facto debugging for production analytics applications, 2013."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/945445.945450"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.15"},{"key":"e_1_2_1_15_1","first-page":"599","volume-title":"OSDI","author":"Gonzalez J. E.","year":"2014","unstructured":"J. E. Gonzalez , R. S. Xin , A. Dave , : Graph processing in a distributed dataflow framework . In OSDI , pages 599 -- 613 , 2014 . J. E. Gonzalez, R. S. Xin, A. Dave, et al., Graphx: Graph processing in a distributed dataflow framework. In OSDI, pages 599--613, 2014."},{"key":"e_1_2_1_16_1","unstructured":"M. A. Gulzar M. Interlandi S. Yoo S. D. Tetali T. Condie T. Millstein and M. Kim. Bigdebug: Debugging primitives for interactive big data processing in spark. Under Submission.  M. A. Gulzar M. Interlandi S. Yoo S. D. Tetali T. Condie T. Millstein and M. Kim. Bigdebug: Debugging primitives for interactive big data processing in spark. Under Submission."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376716"},{"key":"e_1_2_1_18_1","first-page":"273","volume-title":"CIDR","author":"Ikeda R.","year":"2011","unstructured":"R. Ikeda , H. Park , and J. Widom . Provenance for generalized map and reduce workflows . In CIDR , pages 273 -- 283 , 2011 . R. Ikeda, H. Park, and J. Widom. Provenance for generalized map and reduce workflows. In CIDR, pages 273--283, 2011."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113269"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807269"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523619"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89965-5_4"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402755.3402758"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376726"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835958"},{"key":"e_1_2_1_27_1","volume-title":"NSDI","author":"Zaharia M.","year":"2012","unstructured":"M. Zaharia , M. Chowdhury , T. Das , Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing . In NSDI , 2012 . M. Zaharia, M. Chowdhury, T. Das, et al. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043556.2043584"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807234"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2850583.2850595","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:17:13Z","timestamp":1672222633000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2850583.2850595"}},"subtitle":["data provenance support in Spark"],"short-title":[],"issued":{"date-parts":[[2015,11]]},"references-count":29,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2015,11]]}},"alternative-id":["10.14778\/2850583.2850595"],"URL":"https:\/\/doi.org\/10.14778\/2850583.2850595","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2015,11]]}}}