{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T10:34:55Z","timestamp":1756895695891},"reference-count":7,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,8]]},"abstract":"<jats:p>\n            Successful data-driven science requires a complex combination of data engineering pipelines and data modelling techniques. Robust and defensible results can only be achieved when each step in the pipeline that is designed to clean, transform and alter data in preparation for data modelling can be justified, and its effect on the data explained. The\n            <jats:bold>DPDS<\/jats:bold>\n            toolkit presented in this paper is designed to make such justification and explanation process an integral part of data science practice, adding value while remaining as un-intrusive as possible to the analyst. Catering to the broad community of python\/pandas data engineers,\n            <jats:bold>DPDS<\/jats:bold>\n            implements an observer pattern that is able to capture the fine-grained provenance associated with each individual element of a dataframe, across multiple transformation steps. The resulting provenance graph is stored in Neo4j and queried through a UI, with the goal of helping engineers and analysts to justify and explain their choice of data operations, from raw data to model training, by highlighting the details of the changes through each transformation.\n          <\/jats:p>","DOI":"10.14778\/3554821.3554857","type":"journal-article","created":{"date-parts":[[2022,9,29]],"date-time":"2022-09-29T22:28:39Z","timestamp":1664490519000},"page":"3614-3617","source":"Crossref","is-referenced-by-count":6,"title":["DPDS"],"prefix":"10.14778","volume":"15","author":[{"given":"Adriane","family":"Chapman","sequence":"first","affiliation":[{"name":"University of Southampton, UK"}]},{"given":"Luca","family":"Lauro","sequence":"additional","affiliation":[{"name":"Universit\u00e0 Roma Tre, Italy"}]},{"given":"Paolo","family":"Missier","sequence":"additional","affiliation":[{"name":"Newcastle University, UK"}]},{"given":"Riccardo","family":"Torlone","sequence":"additional","affiliation":[{"name":"Universit\u00e0 Roma Tre, Italy"}]}],"member":"320","published-online":{"date-parts":[[2022,9,29]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proc. of Neural Information Processing Systems. 11301--11311","author":"Alaa Ahmed M","unstructured":"Ahmed M Alaa and Mihaela van der Schaar. 2019. Demystifying Black-box Models with Symbolic Metamodels . In Proc. of Neural Information Processing Systems. 11301--11311 . Ahmed M Alaa and Mihaela van der Schaar. 2019. Demystifying Black-box Models with Symbolic Metamodels. In Proc. of Neural Information Processing Systems. 11301--11311."},{"key":"e_1_2_1_2_1","volume-title":"ProPublica","author":"Angwin Julia","year":"2016","unstructured":"Julia Angwin , Jeff Larson , Surya Mattu , and Lauren Kirchner . 2016 . Machine bias . ProPublica , May 23 (2016), 139--159. Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine bias. ProPublica, May 23 (2016), 139--159."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3436905.3436911"},{"key":"e_1_2_1_4_1","first-page":"2349","article-title":"Orange: Data Mining Toolbox in Python","volume":"14","author":"Janez Dem\u0161ar","year":"2013","unstructured":"Janez Dem\u0161ar et al. 2013 . Orange: Data Mining Toolbox in Python . The Journal of Machine Learning Research 14 , 1 (2013), 2349 -- 2353 . Janez Dem\u0161ar et al. 2013. Orange: Data Mining Toolbox in Python. The Journal of Machine Learning Research 14, 1 (2013), 2349--2353.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407807"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3554821.3554857","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:31:42Z","timestamp":1672227102000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3554821.3554857"}},"subtitle":["assisting data science with data provenance"],"short-title":[],"issued":{"date-parts":[[2022,8]]},"references-count":7,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2022,8]]}},"alternative-id":["10.14778\/3554821.3554857"],"URL":"https:\/\/doi.org\/10.14778\/3554821.3554857","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,8]]}}}