{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T22:59:24Z","timestamp":1775084364164,"version":"3.50.1"},"reference-count":28,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2018,3,1]],"date-time":"2018-03-01T00:00:00Z","timestamp":1519862400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DEB-1237491"],"award-info":[{"award-number":["DEB-1237491"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DBI-1459519"],"award-info":[{"award-number":["DBI-1459519"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Science Foundation","award":["SSI-1450277"],"award-info":[{"award-number":["SSI-1450277"]}]},{"DOI":"10.13039\/100007229","name":"Harvard University","doi-asserted-by":"publisher","award":["Bullard Fellowship"],"award-info":[{"award-number":["Bullard Fellowship"]}],"id":[{"id":"10.13039\/100007229","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>Data provenance is the history of an item of data from the point of its creation to its present state. It can support science by improving understanding of and confidence in data. RDataTracker is an R package that collects data provenance from R scripts (https:\/\/github.com\/End-to-end-provenance\/RDataTracker). In addition to details on inputs, outputs, and the computing environment collected by most provenance tools, RDataTracker also records a detailed execution trace and intermediate data values. 
It does this using R\u2019s powerful introspection functions and by parsing R statements prior to sending them to the interpreter so it knows what provenance to collect. The provenance is stored in a specialized graph structure called a Data Derivation Graph, which makes it possible to determine exactly how an output value is computed or how an input value is used. In this paper, we provide details about the provenance RDataTracker collects and the mechanisms used to collect it. We also speculate about how this rich source of information could be used by other tools to help an R programmer gain a deeper understanding of the software used and to support reproducibility.<\/jats:p>","DOI":"10.3390\/informatics5010012","type":"journal-article","created":{"date-parts":[[2018,3,1]],"date-time":"2018-03-01T12:15:44Z","timestamp":1519906544000},"page":"12","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Using Introspection to Collect Provenance in R"],"prefix":"10.3390","volume":"5","author":[{"given":"Barbara","family":"Lerner","sequence":"first","affiliation":[{"name":"Computer Science Department, Mount Holyoke College, South Hadley, MA 01075, USA"}]},{"given":"Emery","family":"Boose","sequence":"additional","affiliation":[{"name":"Harvard Forest, Harvard University, Petersham, MA 01366, USA"}]},{"given":"Luis","family":"Perez","sequence":"additional","affiliation":[{"name":"Harvard College, Cambridge, MA 02138, USA"}]}],"member":"1968","published-online":{"date-parts":[[2018,3,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"170114","DOI":"10.1038\/sdata.2017.114","article-title":"If these data could talk","volume":"4","author":"Pasquier","year":"2017","journal-title":"Sci. 
Data"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10723-006-9055-3","article-title":"The requirements of recording and using provenance in e-Science experiments","volume":"5","author":"Miles","year":"2005","journal-title":"J. Grid Comput."},{"key":"ref_3","unstructured":"Lerner, B.S., and Boose, E.R. (2014, January 12\u201313). RDataTracker: Collecting provenance in an interactive scripting environment. Proceedings of the 6th USENIX Workshop on the Theory and Practice of Provenance, Cologne, Germany."},{"key":"ref_4","unstructured":"Lerner, B.S., and Boose, E.R. (2014, January 9\u201313). RDataTracker and DDG Explorer\u2014Capture, visualization and querying of provenance from R scripts. Proceedings of the 5th International Provenance and Annotation Workshop, Cologne, Germany."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Altintas, I., Barney, O., and Jaeger-Frank, E. (2006, January 3\u20135). Provenance Collection Support in the Kepler Scientific Workflow System. Proceedings of the 1st International Provenance and Annotation Workshop, Chicago, IL, USA.","DOI":"10.1007\/11890850_14"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1002\/cpe.1231","article-title":"Mining Taverna\u2019s semantic web of provenance","volume":"20","author":"Zhao","year":"2008","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1111\/j.1467-8659.2010.01830.x","article-title":"Using VisTrails and Provenance for Teaching Scientific Visualization","volume":"30","author":"Silva","year":"2011","journal-title":"Comput. Graph. Forum"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Baumer, B., Cetinkaya-Rundel, M., Bray, A., Loi, L., and Horton, N.J. (2014). R Markdown: Integrating a Reproducible Analysis Tool into Introductory Statistics. Technol. Innov. Stat. 
Educ., 8, Available online: https:\/\/escholarship.org\/uc\/item\/90b2f5xh.","DOI":"10.5070\/T581020118"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Acu\u00f1a, R., Lacroix, Z., and Bazzi, R.A. (July, January 27). Instrumentation and Trace Analysis for Ad-hoc Python Workflows in Cloud Environments. Proceedings of the 2015 IEEE 8th International Conference on Cloud Computing, New York, NY, USA.","DOI":"10.1109\/CLOUD.2015.25"},{"key":"ref_10","unstructured":"Guo, P.J., and Seltzer, M. (2012, January 14\u201315). BURRITO: Wrapping Your Lab Notebook in Computational Infrastructure. Proceedings of the 4th USENIX Workshop on the Theory and Practice of Provenance, Boston, MA, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Miao, H., Chavan, A., and Deshpande, A. (2017, January 14). ProvDB: Lifecycle Management of Collaborative Analysis Workflows. Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, Chicago, IL, USA.","DOI":"10.1145\/3077257.3077267"},{"key":"ref_12","unstructured":"Hellerstein, J.M., Sreekanti, V., Gonzalez, J.E., Dalton, J., Dey, A., Nag, S., Ramachandran, K., Arora, S., Bhattacharyya, A., and Das, S. (2017, January 8\u201311). Ground: A Data Context Service. Proceedings of the Conference on Innovative Data Systems Research \u201917, Chaminade, CA, USA."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"298","DOI":"10.2218\/ijdc.v10i1.370","article-title":"YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts","volume":"10","author":"McPhillips","year":"2015","journal-title":"Int. J. Digit. Curation"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1137\/0909049","article-title":"Auditing of data analyses","volume":"9","author":"Becker","year":"1988","journal-title":"SIAM J. Sci. Stat. Comput."},{"key":"ref_15","unstructured":"Slaughter, P., Jones, M.B., Jones, C., and Palmer, L. (2018, February 27). Recordr. 
Available online: https:\/\/github.com\/NCEAS\/recordr."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Liu, Z., and Pounds, S. (2014). An R package that automatically collects and archives details for reproducible computing. BMC Bioinform., 15.","DOI":"10.1186\/1471-2105-15-138"},{"key":"ref_17","unstructured":"Mattoso, M., and Glavic, B. (2016). Intermediate Notation for Provenance and Workflow Reproducibility. International Provenance and Annotation Workshop, Springer International Publishing. number 9672 in Lecture Notes in Computer Science."},{"key":"ref_18","unstructured":"Tariq, D., Ali, M., and Gehani, A. (2012, January 14\u201315). Towards Automated Collection of Application-level Data Provenance. Proceedings of the 4th USENIX Conference on Theory and Practice of Provenance, Boston, MA, USA."},{"key":"ref_19","unstructured":"Guo, P.J., and Engler, D. (2010, January 22). Towards Practical Incremental Recomputation for Scientists: An Implementation for the Python Language. Proceedings of the 2nd USENIX Workshop on the Theory and Practice of Provenance, San Jose, CA, USA."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Guo, P.J., and Engler, D. (2011, January 17\u201321). Using Automatic Persistent Memoization to Facilitate Data Analysis Scripting. Proceedings of the 2011 International Symposium on Software Testing and Analysis, Toronto, ON, Canada.","DOI":"10.1145\/2001420.2001455"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Silles, C.A., and Runnalls, A.R. (2010, January 15\u201316). Provenance-Awareness in R. Proceedings of the 3rd International Provenance and Annotation Workshop, Troy, NY, USA.","DOI":"10.1007\/978-3-642-17819-1_8"},{"key":"ref_22","unstructured":"Pasquier, T. (2017). CamFlow\/cytoscape.js-prov: Initial release. Zenodo."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Murta, L., Braganholo, V., Chirigati, F., Koop, D., and Freire, J. (2014, January 9\u201313). 
NoWorkflow: Capturing and Analyzing Provenance of Scripts. Proceedings of the 5th International Provenance and Annotation Workshop, Cologne, Germany.","DOI":"10.1007\/978-3-319-16462-5_6"},{"key":"ref_24","unstructured":"Pimentel, J.F.N., Braganholo, V., Murta, L., and Freire, J. (2015, January 8\u20139). Collecting and Analyzing Provenance on Interactive Notebooks: When IPython meets noWorkflow. Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance, Edinburgh, Scotland."},{"key":"ref_25","unstructured":"Mattoso, M., and Glavic, B. (2016). Fine-Grained Provenance Collection over Scripts Through Program Slicing. Proceedings of the 6th International Provenance and Annotation Workshop, McLean, VA, USA, 7\u20138 June 2016, Springer. Lecture Notes in Computer Science."},{"key":"ref_26","unstructured":"Wickham, H. (2014). Advanced R, Chapman and Hall\/CRC."},{"key":"ref_27","unstructured":"Grolemund, G., and Wickham, H. (2017). R for Data Science, O\u2019Reilly. Chapter 18: Pipes."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Pasquier, T., Lau, M.K., Han, X., Fong, E., Lerner, B.S., Boose, E., Crosas, M., Ellison, A., and Seltzer, M. (2018). Sharing and Preserving Computational Analyses for Posterity with encapsulator. IEEE Comput. Sci. 
Eng., under review.","DOI":"10.1109\/MCSE.2018.042781334"}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/5\/1\/12\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T14:57:02Z","timestamp":1760194622000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/5\/1\/12"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,3,1]]},"references-count":28,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2018,3]]}},"alternative-id":["informatics5010012"],"URL":"https:\/\/doi.org\/10.3390\/informatics5010012","relation":{},"ISSN":["2227-9709"],"issn-type":[{"value":"2227-9709","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,3,1]]}}}