{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,16]],"date-time":"2025-07-16T12:33:25Z","timestamp":1752669205177,"version":"3.41.0"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T00:00:00Z","timestamp":1619481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2021,9,30]]},"abstract":"<jats:p>Assessing and improving the quality of data are fundamental challenges in Big-Data applications. These challenges have given rise to numerous solutions targeting transformation, integration, and cleaning of data. However, while schema design, data cleaning, and data migration are nowadays reasonably well understood in isolation, not much attention has been given to the interplay between standalone tools in these areas. In this article, we focus on the problem of determining whether the available data-transforming procedures can be used together to bring about the desired quality characteristics of the data in business or analytics processes. 
For example, to help an organization avoid building a data-quality solution from scratch when facing a new analytics task, we ask whether the data quality can be improved by reusing the tools that are already available, and if so, which tools to apply, and in which order, all without presuming knowledge of the internals of the tools, which may be external or proprietary.<\/jats:p>\n          <jats:p>Toward addressing this problem, we conduct a formal study in which individual data cleaning, data migration, or other data-transforming tools are abstracted as black-box procedures with only some of the properties exposed, such as their applicability requirements, the parts of the data that the procedure modifies, and the conditions that the data satisfy once the procedure has been applied. As a proof of concept, we provide foundational results on sequential applications of procedures abstracted in this way, to achieve prespecified data-quality objectives, for the use case of relational data and for procedures described by standard relational constraints. 
We show that, while reasoning in this framework may be computationally infeasible in general, there exist well-behaved cases in which these foundational results can be applied in practice for achieving desired data-quality results on Big Data.<\/jats:p>","DOI":"10.1145\/3428154","type":"journal-article","created":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T14:14:25Z","timestamp":1619532865000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Ensuring Data Readiness for Quality Requirements with Help from Procedure Reuse"],"prefix":"10.1145","volume":"13","author":[{"given":"Rada","family":"Chirkova","sequence":"first","affiliation":[{"name":"North Carolina State University, Raleigh, NC, USA"}]},{"given":"Jon","family":"Doyle","sequence":"additional","affiliation":[{"name":"North Carolina State University, Raleigh, NC, USA"}]},{"given":"Juan","family":"Reutter","sequence":"additional","affiliation":[{"name":"Pontificia Universidad Cat\u00f3lica de Chile, Santiago, Chile"}]}],"member":"320","published-online":{"date-parts":[[2021,4,27]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/551350"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3143803"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3104031"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/2983200.2983204"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508028.2505985"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/885746"},{"key":"e_1_2_1_7_1","volume-title":"Proc. AAAI. 1728\u20131735","author":"Bajraktari Labinot","year":"2018","unstructured":"Labinot Bajraktari , Magdalena Ortiz , and Mantas Simkus . 2018 . Combining rules and ontologies into Clopen knowledge bases . In Proc. AAAI. 1728\u20131735 . Labinot Bajraktari, Magdalena Ortiz, and Mantas Simkus. 2018. 
Combining rules and ontologies into Clopen knowledge bases. In Proc. AAAI. 1728\u20131735."},{"key":"e_1_2_1_8_1","volume-title":"Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena.","author":"Barany Vince","year":"2014","unstructured":"Vince Barany , Balder ten Cate , Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. 2014 . Declarative statistical modeling with datalog. arXiv preprint arXiv:1412.2221. Vince Barany, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena. 2014. Declarative statistical modeling with datalog. arXiv preprint arXiv:1412.2221."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/2750423.2750432"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the Open Interoperability Workshop on Enterprise Modelling and Ontologies for Interoperability.","author":"Berardi Daniela","year":"2005","unstructured":"Daniela Berardi , Diego Calvanese , Giuseppe De Giacomo , Richard Hull , Maurizio Lenzerini , and Massimo Mecella . 2005 . Modeling data & processes for service specifications in Colombo . In Proceedings of the Open Interoperability Workshop on Enterprise Modelling and Ontologies for Interoperability. Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Richard Hull, Maurizio Lenzerini, and Massimo Mecella. 2005. Modeling data & processes for service specifications in Colombo. In Proceedings of the Open Interoperability Workshop on Enterprise Modelling and Ontologies for Interoperability."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 13th Italian Symposium on Advanced Database Systems (SEBD\u201905)","author":"Berardi Daniela","year":"2005","unstructured":"Daniela Berardi , Diego Calvanese , Giuseppe De Giacomo , Richard Hull , and Massimo Mecella . 2005 . Automatic composition of web services in Colombo . In Proceedings of the 13th Italian Symposium on Advanced Database Systems (SEBD\u201905) . 8\u201315. Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Richard Hull, and Massimo Mecella. 
2005. Automatic composition of web services in Colombo. In Proceedings of the 13th Italian Symposium on Advanced Database Systems (SEBD\u201905). 8\u201315."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218843005001201"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824096"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2737786"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/1793114.1793141"},{"volume-title":"Reasoning Web International Summer School","author":"Bienvenu Meghyn","key":"e_1_2_1_16_1","unstructured":"Meghyn Bienvenu and Magdalena Ortiz . 2015. Ontology-mediated query answering with data-tractable description logics . In Reasoning Web International Summer School . Springer , 218\u2013307. Meghyn Bienvenu and Magdalena Ortiz. 2015. Ontology-mediated query answering with data-tractable description logics. In Reasoning Web International Summer School. Springer, 218\u2013307."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/257572.257636"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2019.101478"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463664.2467796"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0960129520000067"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ic.2017.08.007"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1137\/0214049"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-47666-6_9"},{"volume-title":"Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management.","author":"Chirkova Rada","key":"e_1_2_1_24_1","unstructured":"Rada Chirkova , Jon Doyle , and Juan L. Reutter . 2018. The data readiness problem for relational databases . 
In Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management. Rada Chirkova, Jon Doyle, and Juan L. Reutter. 2018. The data readiness problem for relational databases. In Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management."},{"key":"e_1_2_1_25_1","volume-title":"Obtaining information about queries behind views and dependencies. The Computing Research Repository (CoRR) abstract abs\/1403.5199","author":"Chirkova Rada","year":"2014","unstructured":"Rada Chirkova and Ting Yu. 2014. Obtaining information about queries behind views and dependencies. The Computing Research Repository (CoRR) abstract abs\/1403.5199 ( 2014 ). http:\/\/arxiv.org\/abs\/1403.5199 Rada Chirkova and Ting Yu. 2014. Obtaining information about queries behind views and dependencies. The Computing Research Repository (CoRR) abstract abs\/1403.5199 (2014). http:\/\/arxiv.org\/abs\/1403.5199"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/78935.78937"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218843012500025"},{"key":"e_1_2_1_28_1","unstructured":"Giuseppe De Giacomo, Eugenia Ternovska, and Ray Reiter. 2019. Non-terminating processes in the situation calculus. Ann. Math. Artif. Intell. (2019) 1\u201318.  Giuseppe De Giacomo, Eugenia Ternovska, and Ray Reiter. 2019. Non-terminating processes in the situation calculus. Ann. Math. Artif. Intell. 
(2019) 1\u201318."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/2371234"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2694428.2694430"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3321487"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/648294.754678"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/548222"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/2401764"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2699442"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/1085304.1085309"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/2371176"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463664.2465221"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3412841.3441886"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/1201627"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142351.1142357"},{"key":"e_1_2_1_43_1","first-page":"59","article-title":"SampleClean: Fast and reliable analytics on dirty data","volume":"38","author":"Krishnan Sanjay","year":"2015","unstructured":"Sanjay Krishnan , Jiannan Wang , Michael J. Franklin , Ken Goldberg , Tim Kraska , Tova Milo , and Eugene Wu . 2015 . SampleClean: Fast and reliable analytics on dirty data . IEEE Data Eng. Bull. 38 , 3 (2015), 59 \u2013 75 . Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska, Tova Milo, and Eugene Wu. 2015. SampleClean: Fast and reliable analytics on dirty data. IEEE Data Eng. Bull. 38, 3 (2015), 59\u201375.","journal-title":"IEEE Data Eng. 
Bull."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/1965351"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1093\/logcom\/4.5.655"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(96)00044-6"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/2832581.2832684"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2736277.2741142"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3449118"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/2615731.2615759"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2806416.2806439"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750544"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(93)90109-O"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1016\/0743-1066(95)00049-P"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/322217.322221"},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of the 19th International Conference on Database Theory (ICDT\u201916)","author":"Savkovic Ognjen","year":"2016","unstructured":"Ognjen Savkovic , Elisa Marengo , and Werner Nutt . 2016 . Query stability in monotonic data-aware business processes . In Proceedings of the 19th International Conference on Database Theory (ICDT\u201916) . 16:1\u201316:18. Ognjen Savkovic, Elisa Marengo, and Werner Nutt. 2016. Query stability in monotonic data-aware business processes. In Proceedings of the 19th International Conference on Database Theory (ICDT\u201916). 16:1\u201316:18."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(02)00365-X"},{"key":"e_1_2_1_58_1","volume-title":"AMW'17","author":"Sequeda Juan F.","year":"2017","unstructured":"Juan F. Sequeda . 2017 . Ontology based data access: Where do the ontologies and mappings come from? In AMW'17 . Juan F. Sequeda. 2017. 
Ontology based data access: Where do the ontologies and mappings come from? In AMW'17."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/1514894.1514896"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/275487.275515"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3428154","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3428154","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:24:23Z","timestamp":1750195463000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3428154"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,27]]},"references-count":59,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,30]]}},"alternative-id":["10.1145\/3428154"],"URL":"https:\/\/doi.org\/10.1145\/3428154","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"type":"print","value":"1936-1955"},{"type":"electronic","value":"1936-1963"}],"subject":[],"published":{"date-parts":[[2021,4,27]]},"assertion":[{"value":"2020-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}