{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T06:31:52Z","timestamp":1767853912746,"version":"3.49.0"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2011,2,1]],"date-time":"2011-02-01T00:00:00Z","timestamp":1296518400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2011,2]]},"abstract":"<jats:p>Data quality remains a persistent problem in practice and a challenge for research. In this study we focus on the four dimensions of data quality noted as the most important to information consumers, namely accuracy, completeness, consistency, and timeliness. These dimensions are of particular concern for operational systems, and most importantly for data warehouses, which are often used as the primary data source for analyses such as classification, a general type of data mining. However, the definitions and conceptual models of these dimensions have not been collectively considered with respect to data mining in general or classification in particular. Nor have they been considered for problem complexity. Conversely, these four dimensions of data quality have only been indirectly addressed by data mining research. Using definitions and constructs of data quality dimensions, our research evaluates the effects of both data quality and problem complexity on generated data and tests the results in a real-world case. Six different classification outcomes selected from the spectrum of classification algorithms show that data quality and problem complexity have significant main and interaction effects. 
From the findings of significant effects, the economics of higher data quality are evaluated for a frequent application of classification and illustrated by the real-world case.<\/jats:p>","DOI":"10.1145\/1891879.1891881","type":"journal-article","created":{"date-parts":[[2011,2,15]],"date-time":"2011-02-15T18:30:59Z","timestamp":1297794659000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":91,"title":["The Effects and Interactions of Data Quality and Problem Complexity on Classification"],"prefix":"10.1145","volume":"2","author":[{"given":"Roger","family":"Blake","sequence":"first","affiliation":[{"name":"University of Massachusetts, Boston"}]},{"given":"Paul","family":"Mangiameli","sequence":"additional","affiliation":[{"name":"University of Rhode Island"}]}],"member":"320","published-online":{"date-parts":[[2011,2]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2004.12.002"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/545151.545178"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.44.4.462"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.31.2.150"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2003.1161595"},{"key":"e_1_2_1_6_1","volume-title":"Analytics: The New Science of Winning","author":"Davenport T. H.","year":"2007","unstructured":"Davenport, T. H. and Harris, J. G. 2007. Competing on Analytics: The New Science of Winning. Harvard Business School Publishing Company, Boston, MA."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/64.180410"},{"key":"e_1_2_1_8_1","unstructured":"Eckerson W. W. 2002. Data warehousing special report: Data quality and the bottom line. 
In Applications Development Trends."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJIQ.2007.013374"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1659225.1659228"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 12th International Conference on Information Quality.","author":"Fisher C.","unstructured":"Fisher, C., Lauria, E., and Matheus, C. 2007. In search of an accuracy metric. In Proceedings of the 12th International Conference on Information Quality."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the International Conference on Information Quality.","author":"Ge M.","unstructured":"Ge, M. and Helfert, M. 2006. A framework to assess decision quality using information quality dimensions. In Proceedings of the International Conference on Information Quality."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 4th Asia-Pacific Conference on Conceptual Modeling. 17--26","author":"Gomes P.","unstructured":"Gomes, P., Farinha, J., and Trigueiros, M. J. 2007. A data quality metamodel extension to CWM. In Proceedings of the 4th Asia-Pacific Conference on Conceptual Modeling. 
17--26."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cor.2005.11.007"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1515693.1515697"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(94)00094-8"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/505248.506007"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.stamet.2005.08.005"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.2307\/249418"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/545151.545177"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008334909089"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.4018\/jdm.2004010104"},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Lee Y. W. Pipino L. L. Funk J. D. and Wang R. Y. 2006. Journey to Data Quality. The MIT Press.","DOI":"10.7551\/mitpress\/4037.001.0001"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-7206(02)00043-5"},{"key":"e_1_2_1_25_1","unstructured":"Madnick S. and Wang R. Y. 1992. Introduction to total data quality management (TDQM). Research Program TDQM-92-01 Total Data Quality Management Program MIT Sloan School of Management."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2005.05.029"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann Publishers, 254--262","author":"Oates T.","unstructured":"Oates, T. and Jensen, D. 1997. The effects of training set size on decision tree complexity. 
In Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann Publishers, 254--262."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2007.06.004"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2005.12.005"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.1040.0237"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/505248.506010"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022643204877"},{"key":"e_1_2_1_33_1","volume-title":"Data: An unfolding quality disaster. DM Rev. 6.","author":"Redman T. C.","year":"2004","unstructured":"Redman, T. C. 2004. Data: An unfolding quality disaster. DM Rev. 6."},{"key":"e_1_2_1_34_1","first-page":"105","article-title":"Zero defections","volume":"68","author":"Reichheld F. F.","year":"1990","unstructured":"Reichheld, F. F. and Sasser, W. E. 1990. Zero defections. Harvard Bus. Rev. 68, 105--111.","journal-title":"Harvard Bus. Rev."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 11th International Conference on Information Quality.","author":"Sessions V.","unstructured":"Sessions, V. and Valtorta, M. 2006. Learning Bayesian networks from inaccurate data. 
In Proceedings of the 11th International Conference on Information Quality."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.2004.12.006"},{"key":"e_1_2_1_37_1","unstructured":"Su Y. and Jin Z. 2007. Assessment and improvement of data and information quality. In Information Quality Management: Theory and Applications. Idea Group Inc."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1006\/obhd.2000.2941"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/240455.240479"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1080\/07421222.1996.11518099"},{"key":"e_1_2_1_41_1","unstructured":"Wang R. Y. Ziad M. and Lee Y. W. 2000. Data Quality. Kluwer Academic Publishers."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijpe.2006.06.004"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-004-0751-8"}],"container-title":["Journal of Data and Information 
Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1891879.1891881","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1891879.1891881","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:59:40Z","timestamp":1750244380000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1891879.1891881"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,2]]},"references-count":43,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2011,2]]}},"alternative-id":["10.1145\/1891879.1891881"],"URL":"https:\/\/doi.org\/10.1145\/1891879.1891881","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"value":"1936-1955","type":"print"},{"value":"1936-1963","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,2]]},"assertion":[{"value":"2008-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-02-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}