{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T20:15:03Z","timestamp":1765484103650,"version":"3.41.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2018,2,22]],"date-time":"2018-02-22T00:00:00Z","timestamp":1519257600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMOD Rec."],"published-print":{"date-parts":[[2018,2,22]]},"abstract":"<jats:p>We outline a call to action for promoting empiricism in data quality research. The action points result from an analysis of the landscape of data quality research. The landscape exhibits two dimensions of empiricism in data quality research relating to type of metrics and scope of method. Our study indicates the presence of a data continuum ranging from real to synthetic data, which has implications for how data quality methods are evaluated. 
The dimensions of empiricism and their inter-relationships provide a means of positioning data quality research, and help expose limitations, gaps and opportunities.<\/jats:p>","DOI":"10.1145\/3186549.3186559","type":"journal-article","created":{"date-parts":[[2018,2,23]],"date-time":"2018-02-23T16:40:01Z","timestamp":1519404001000},"page":"35-43","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["Data Quality"],"prefix":"10.1145","volume":"46","author":[{"given":"Shazia","family":"Sadiq","sequence":"first","affiliation":[{"name":"The University of Queensland, Queensland, Australia"}]},{"given":"Tamraparni","family":"Dasu","sequence":"additional","affiliation":[{"name":"AT&T Labs-Research, Bedminster, NJ, USA"}]},{"given":"Xin Luna","family":"Dong","sequence":"additional","affiliation":[{"name":"Amazon, Seattle, WA, USA"}]},{"given":"Juliana","family":"Freire","sequence":"additional","affiliation":[{"name":"New York University, NYC, NY, USA"}]},{"given":"Ihab F.","family":"Ilyas","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, Canada"}]},{"given":"Sebastian","family":"Link","sequence":"additional","affiliation":[{"name":"The University of Auckland, Auckland, New Zealand"}]},{"given":"Ren\u00e9e J.","family":"Miller","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Canada"}]},{"given":"Felix","family":"Naumann","sequence":"additional","affiliation":[{"name":"Hasso Plattner Institute, University of Potsdam, Potsdam, Germany"}]},{"given":"Xiaofang","family":"Zhou","sequence":"additional","affiliation":[{"name":"The University of Queensland, Queensland, Australia"}]},{"given":"Divesh","family":"Srivastava","sequence":"additional","affiliation":[{"name":"AT&T Labs-Research, Bedminster, NJ, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2018,2,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994518"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13222-013-0126-x"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2767109.2770014"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1287369.1287420"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850586"},{"key":"e_1_2_1_6_1","first-page":"47","article-title":"Benchmarking Data Curation Systems","volume":"39","author":"Arocena P.","year":"2016","unstructured":"Arocena , P. , Glavic , B. , Mecca , G. , Miller , R.J. , Papotti , P. , and Santoro , D. 2016 . Benchmarking Data Curation Systems . IEEE Data Eng. Bull. 39 , 2, 47 -- 62 . Arocena, P., Glavic, B., Mecca, G., Miller, R.J., Papotti, P., and Santoro, D. 2016. Benchmarking Data Curation Systems. IEEE Data Eng. Bull. 39, 2, 47--62.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2011.5767864"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-014-0760-0"},{"key":"e_1_2_1_9_1","volume-title":"Technology Conference on Performance Evaluation and Benchmarking","author":"Cao P.","year":"2016","unstructured":"Cao , P. , Gowda , B. , Lakshmi , S. , Narasimhadevara , C. , Nguyen , P. , Poelman , J. , Poess , M. , and Rabl , T . 2016. From BigBench to TPCx-BB: Standardization of a Big Data Benchmark . In Technology Conference on Performance Evaluation and Benchmarking ( 2016 ), Springer, 24--44. Cao, P., Gowda, B., Lakshmi, S., Narasimhadevara, C., Nguyen, P., Poelman, J., Poess, M., and Rabl, T. 2016. From BigBench to TPCx-BB: Standardization of a Big Data Benchmark. 
In Technology Conference on Performance Evaluation and Benchmarking (2016), Springer, 24--44."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610520"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/11508069_15"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544847"},{"key":"e_1_2_1_13_1","first-page":"78","article-title":"Data Quality for Temporal Streams","volume":"39","author":"Dasu T.","year":"2016","unstructured":"Dasu , T. , Duan , R. , and Srivastava , D. 2016 . Data Quality for Temporal Streams . IEEE Data Eng. Bull. 39 , 2, 78 -- 92 . Dasu, T., Duan, R., and Srivastava, D. 2016. Data Quality for Temporal Streams. IEEE Data Eng. Bull. 39, 2, 78--92.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Dasu T. and Johnson T. 2003. Exploratory data mining and data cleaning. John Wiley&Sons.   Dasu T. and Johnson T. 2003. Exploratory data mining and data cleaning. John Wiley&Sons.","DOI":"10.1002\/0471448354"},{"key":"e_1_2_1_15_1","first-page":"106","article-title":"Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources","volume":"39","author":"Dong X.L.","year":"2016","unstructured":"Dong , X.L. , Gabrilovich , E. , Murphy , K. , Dang , V. , Horn , W. , Lugaresi , C. , Sun , S. , and Zhang , W. 2016 . Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources . IEEE Data Eng. Bull. 39 , 2, 106 -- 117 . Dong, X.L., Gabrilovich, E., Murphy, K., Dang, V., Horn, W., Lugaresi, C., Sun, S., and Zhang, W. 2016. Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources. IEEE Data Eng. Bull. 39, 2, 106--117.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.9"},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Floridi L. and Illari P. 2014. The philosophy of information quality. Springer.  Floridi L. 
and Illari P. 2014. The philosophy of information quality. Springer.","DOI":"10.1007\/978-3-319-07121-3"},{"key":"e_1_2_1_18_1","first-page":"63","article-title":"Exploring What not to Clean in Urban Data: A Study Using New York City Taxi Trips","volume":"39","author":"Freire J.","year":"2016","unstructured":"Freire , J. , Bessa , A. , Chirigati , F. , Vo , H.T. , and Zhao , K. 2016 . Exploring What not to Clean in Urban Data: A Study Using New York City Taxi Trips . IEEE Data Eng. Bull. 39 , 2, 63 -- 77 . Freire, J., Bessa, A., Chirigati, F., Vo, H.T., and Zhao, K. 2016. Exploring What not to Clean in Urban Data: A Study Using New York City Taxi Trips. IEEE Data Eng. Bull. 39, 2, 63--77.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939502.2939509"},{"key":"e_1_2_1_20_1","first-page":"93","article-title":"Quality- Aware Entity-Level Semantic Representations for Short Texts","volume":"39","author":"Hua W.","year":"2016","unstructured":"Hua , W. , Zheng , K. , and Zhou , X. 2016 . Quality- Aware Entity-Level Semantic Representations for Short Texts . IEEE Data Eng. Bull. 39 , 2, 93 -- 105 . Hua, W., Zheng, K., and Zhou, X. 2016. Quality- Aware Entity-Level Semantic Representations for Short Texts. IEEE Data Eng. Bull. 39, 2, 93--105.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_21_1","first-page":"38","article-title":"Effective Data Cleaning with Continuous Evaluation","volume":"39","author":"Ilyas I.F.","year":"2016","unstructured":"Ilyas , I.F. 2016 . Effective Data Cleaning with Continuous Evaluation . IEEE Data Eng. Bull. 39 , 2, 38 -- 46 . Ilyas, I.F. 2016. Effective Data Cleaning with Continuous Evaluation. IEEE Data Eng. Bull. 39, 2, 38--46.","journal-title":"IEEE Data Eng. 
Bull."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000045"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2611567"},{"key":"e_1_2_1_24_1","volume-title":"24th Australasian Conference on Information Systems (ACIS)","author":"Jayawardene V.","year":"2013","unstructured":"Jayawardene , V. , Sadiq , S. , and Indulska , M . 2013. The curse of dimensionality in data quality . In 24th Australasian Conference on Information Systems (ACIS) ( 2013 ), RMIT University, 1--11. Jayawardene, V., Sadiq, S., and Indulska, M. 2013. The curse of dimensionality in data quality. In 24th Australasian Conference on Information Systems (ACIS) (2013), RMIT University, 1--11."},{"volume-title":"Juran on leadership for quality","author":"Juran J.M.","key":"e_1_2_1_25_1","unstructured":"Juran , J.M. 1989. Juran on leadership for quality . New York : The Free Press . Juran, J.M. 1989. Juran on leadership for quality. New York: The Free Press."},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Kirk R.E. 2003. Experimental Design.  Kirk R.E. 2003. Experimental Design.","DOI":"10.1002\/0471264385.wei0201"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-016-0430-9"},{"key":"e_1_2_1_28_1","first-page":"21","article-title":"Discovering Meaningful Certain Keys from Incomplete and Inconsistent Relations","volume":"39","author":"K\u00f6hler H.","year":"2016","unstructured":"K\u00f6hler , H. , Link , S. , and Zhou , X. 2016 . Discovering Meaningful Certain Keys from Incomplete and Inconsistent Relations . IEEE Data Eng. Bull. 39 , 2, 21 - 37 . K\u00f6hler, H., Link, S., and Zhou, X. 2016. Discovering Meaningful Certain Keys from Incomplete and Inconsistent Relations. IEEE Data Eng. Bull. 39, 2, 21- 37.","journal-title":"IEEE Data Eng. 
Bull."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994514"},{"key":"e_1_2_1_30_1","first-page":"8","article-title":"Data Anamnesis: Admitting Raw Data into an Organization","volume":"39","author":"Kruse S.","year":"2016","unstructured":"Kruse , S. , Papenbrock , T. , Harmouch , H. , and Naumann , F. 2016 . Data Anamnesis: Admitting Raw Data into an Organization . IEEE Data Eng. Bull. 39 , 2, 8 -- 20 . Kruse, S., Papenbrock, T., Harmouch, H., and Naumann, F. 2016. Data Anamnesis: Admitting Raw Data into an Organization. IEEE Data Eng. Bull. 39, 2, 8--20.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_31_1","unstructured":"The iBench Project: http:\/\/dblab.cs.toronto.edu\/project\/iBench\/  The iBench Project: http:\/\/dblab.cs.toronto.edu\/project\/iBench\/"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2501654.2501658"},{"key":"e_1_2_1_33_1","unstructured":"Markie P. 2004. Rationalism vs. empiricism.  Markie P. 2004. Rationalism vs. empiricism."},{"key":"e_1_2_1_34_1","volume-title":"30th International Conference on Data Engineering (ICDE)","author":"Mecca G.","year":"2014","unstructured":"Mecca , G. , Papotti , P. , and Santoro , D . 2014. IQMETER- An evaluation tool for data-transformation systems . In 30th International Conference on Data Engineering (ICDE) ( 2014 ), IEEE, 1218--1221. Mecca, G., Papotti, P., and Santoro, D. 2014. IQMETER- An evaluation tool for data-transformation systems. In 30th International Conference on Data Engineering (ICDE) (2014), IEEE, 1218--1221."},{"key":"e_1_2_1_35_1","volume-title":"COMAD","author":"Miller R.J.","year":"2014","unstructured":"Miller , R.J. 2014. Big Data Curation . In COMAD ( 2014 ), 4. Miller, R.J. 2014. Big Data Curation. In COMAD (2014), 4."},{"key":"e_1_2_1_36_1","unstructured":"Naumann F. CORA Dataset: https:\/\/hpi.de\/naumann\/projects\/repeatability\/datasets\/c ora-dataset.html  Naumann F. 
CORA Dataset: https:\/\/hpi.de\/naumann\/projects\/repeatability\/datasets\/c ora-dataset.html"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/1841211"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/2794367.2794377"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733009"},{"key":"e_1_2_1_40_1","volume-title":"18th International Conference on Data Engineering (ICDE)","author":"Popivanov I.","year":"2002","unstructured":"Popivanov , I. and Miller , R.J . 2002. Similarity search over time-series data using wavelets . In 18th International Conference on Data Engineering (ICDE) ( 2002 ), IEEE, 212--221. Popivanov, I. and Miller, R.J. 2002. Similarity search over time-series data using wavelets. In 18th International Conference on Data Engineering (ICDE) (2002), IEEE, 212--221."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750544"},{"key":"e_1_2_1_42_1","volume-title":"Exploiting Hierarchies for Efficient Detection of Completeness in Stream Data. In Australasian Database Conference","author":"Razniewski S.","year":"2016","unstructured":"Razniewski , S. , Sadiq , S. , and Zhou , X . 2016 . Exploiting Hierarchies for Efficient Detection of Completeness in Stream Data. In Australasian Database Conference ( 2016 ), Springer, 419--431. Razniewski, S., Sadiq, S., and Zhou, X. 2016. Exploiting Hierarchies for Efficient Detection of Completeness in Stream Data. In Australasian Database Conference (2016), Springer, 419--431."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137631"},{"key":"e_1_2_1_44_1","volume-title":"32nd International Conference on Data Engineering (ICDE)","author":"Sadiq S.","year":"2016","unstructured":"Sadiq , S. and Papotti , P . 2016. Big data quality-whose problem is it? In 32nd International Conference on Data Engineering (ICDE) ( 2016 ), IEEE, 1446--1447. Sadiq, S. and Papotti, P. 2016. 
Big data quality-whose problem is it? In 32nd International Conference on Data Engineering (ICDE) (2016), IEEE, 1446--1447."},{"key":"e_1_2_1_45_1","unstructured":"Sadiq S. and Srivastava D. 2016. Special Issue on Data Quality Bulletin of the Technical Committee on Data Engineering. 39 2.  Sadiq S. and Srivastava D. 2016. Special Issue on Data Quality Bulletin of the Technical Committee on Data Engineering. 39 2."},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the Twenty-Second Australasian Database Conference-Volume 115","author":"Sadiq S.","year":"2011","unstructured":"Sadiq , S. , Yeganeh , N.K. , and Indulska , M . 2011. 20 years of data quality research: themes, trends and synergies . In Proceedings of the Twenty-Second Australasian Database Conference-Volume 115 ( 2011 ), Australian Computer Society, Inc., 153--162. Sadiq, S., Yeganeh, N.K., and Indulska, M. 2011. 20 years of data quality research: themes, trends and synergies. In Proceedings of the Twenty-Second Australasian Database Conference-Volume 115 (2011), Australian Computer Society, Inc., 153--162."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2899397"},{"key":"e_1_2_1_48_1","volume-title":"23rd International Conference on Data Engineering (ICDE)","author":"Soliman M.A.","year":"2007","unstructured":"Soliman , M.A. , Ilyas , I.F. , and Chang , K . 2007. Top-k query processing in uncertain databases . In 23rd International Conference on Data Engineering (ICDE) ( 2007 ), IEEE, 896--905. Soliman, M.A., Ilyas, I.F., and Chang, K. 2007. Top-k query processing in uncertain databases. 
In 23rd International Conference on Data Engineering (ICDE) (2007), IEEE, 896--905."},{"key":"e_1_2_1_49_1","unstructured":"Tamr: https:\/\/www.tamr.com\/  Tamr: https:\/\/www.tamr.com\/"},{"key":"e_1_2_1_50_1","unstructured":"Trifacta: https:\/\/www.trifacta.com\/  Trifacta: https:\/\/www.trifacta.com\/"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(92)90143-4"},{"volume-title":"44th International Conference on Very Large Data Bases 2018: http:\/\/vldb2018","author":"VLDB.","key":"e_1_2_1_52_1","unstructured":"VLDB. 44th International Conference on Very Large Data Bases 2018: http:\/\/vldb2018 .lncc.br\/call-forresearch- track.html VLDB. 44th International Conference on Very Large Data Bases 2018: http:\/\/vldb2018.lncc.br\/call-forresearch- track.html"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350263"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1080\/07421222.1996.11518099"}],"container-title":["ACM SIGMOD Record"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3186549.3186559","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3186549.3186559","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:11:27Z","timestamp":1750212687000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3186549.3186559"}},"subtitle":["The Role of 
Empiricism"],"short-title":[],"issued":{"date-parts":[[2018,2,22]]},"references-count":54,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,2,22]]}},"alternative-id":["10.1145\/3186549.3186559"],"URL":"https:\/\/doi.org\/10.1145\/3186549.3186559","relation":{},"ISSN":["0163-5808"],"issn-type":[{"type":"print","value":"0163-5808"}],"subject":[],"published":{"date-parts":[[2018,2,22]]},"assertion":[{"value":"2018-02-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}