{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T18:22:55Z","timestamp":1761675775216,"version":"3.41.0"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2010,7,1]],"date-time":"2010-07-01T00:00:00Z","timestamp":1277942400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2010,7]]},"abstract":"<jats:p>In today\u2019s data-rich environment, decision makers draw conclusions from data repositories that may contain data quality problems. In this context, missing data is an important and known problem, since it can seriously affect the accuracy of conclusions drawn. Researchers have described several approaches for dealing with missing data, primarily attempting to infer values or estimate the impact of missing data on conclusions. However, few have considered approaches to characterize patterns of bias in missing data, that is, to determine the specific attributes that predict the missingness of data values. Knowledge of the specific systematic bias patterns in the incidence of missing data can help analysts more accurately assess the quality of conclusions drawn from data sets with missing data. This research proposes a methodology to combine a number of Knowledge Discovery and Data Mining techniques, including association rule mining, to discover patterns in related attribute values that help characterize these bias patterns. We demonstrate the efficacy of our proposed approach by applying it on a demo census dataset seeded with biased missing data. The experimental results show that our approach was able to find seeded biases and filter out most seeded noise.<\/jats:p>","DOI":"10.1145\/1805286.1805288","type":"journal-article","created":{"date-parts":[[2010,7,27]],"date-time":"2010-07-27T14:10:03Z","timestamp":1280239803000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Using Data Mining Techniques to Discover Bias Patterns in Missing Data"],"prefix":"10.1145","volume":"2","author":[{"given":"Monica Chiarini","family":"Tremblay","sequence":"first","affiliation":[{"name":"Florida International University"}]},{"given":"Kaushik","family":"Dutta","sequence":"additional","affiliation":[{"name":"Florida International University"}]},{"given":"Debra","family":"Vandermeer","sequence":"additional","affiliation":[{"name":"Florida International University"}]}],"member":"320","published-online":{"date-parts":[[2010,7]]},"reference":[{"volume-title":"Proceedings of the 20th International Conference on Very Large Data Bases. M. J. J. B. Bocca, and C. Zaniolo Eds., Morgan Kaufmann Publishers Inc.","author":"Agrawal R.","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","first-page":"3","article-title":"The New Jersey data reduction report","volume":"20","author":"Barbara D.","year":"1997","journal-title":"IEEE Data Engin. Bull."},{"key":"e_1_2_1_3_1","unstructured":"Breiman L. Friedman J. H. Olshen R. A. and Stone C. J. 1984. Classification and Regression Trees. Wadsworth International. Breiman L. Friedman J. H. Olshen R. A. and Stone C. J. 1984. Classification and Regression Trees . Wadsworth International."},{"key":"e_1_2_1_4_1","unstructured":"Csirik J. Frenk J. B. G. Labbe M. and Zhang S. 1990. Heuristics for 0-1 min-knapsack problem erasmus. University of Rotterdam - Econometric Institute Rotterdam. Csirik J. Frenk J. B. G. Labbe M. and Zhang S. 1990. Heuristics for 0-1 min-knapsack problem erasmus. University of Rotterdam - Econometric Institute Rotterdam."},{"volume-title":"Proceedings of the 9th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE Computer Society, 532","author":"Dash M.","key":"e_1_2_1_5_1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1240616.1240623"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/240455.240464"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1287\/isre.14.2.170.16017"},{"volume-title":"Data Mining: Concepts and Techniques. Morgan Kaufmann","year":"2001","author":"Han J.","key":"e_1_2_1_9_1"},{"volume-title":"Proceedings of the International Conference on Information Systems (ICIS).","author":"Heinrich B.","key":"e_1_2_1_10_1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1198\/000313007X172556"},{"key":"e_1_2_1_12_1","unstructured":"Jain A. K. and Dubes R. C. 1988. Algorithms for Clustering Data. Prentice Hall Englewood Cliffs NJ. Jain A. K. and Dubes R. C. 1988. Algorithms for Clustering Data . Prentice Hall Englewood Cliffs NJ."},{"key":"e_1_2_1_13_1","unstructured":"Jung W. Olfman L. Ryan T. and Park Y. T. 2005. An experimental study of the effects of contextual data quality and task complexity on decision performance 149--154. Jung W. Olfman L. Ryan T. and Park Y. T. 2005. An experimental study of the effects of contextual data quality and task complexity on decision performance 149--154."},{"volume-title":"Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining. D. W.-L. Cheung, G. J. Williams, and Q. Li Eds., Springer-Verlag, 364--375","author":"Li J.","key":"e_1_2_1_14_1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/948383.948385"},{"key":"e_1_2_1_16_1","first-page":"183","article-title":"Data quality: A prerequisite for successful data warehouse implementation","volume":"25","author":"Mahnic V.","year":"2001","journal-title":"Informatica Slovene Soc. Informatika"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02243544"},{"volume-title":"Missing Data: A Gentle Introduction","year":"2007","author":"McKnight P. E.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","unstructured":"Microsoft. 2005. Microsoft SQL server 2005 sample data warehouse. Microsoft. 2005. Microsoft SQL server 2005 sample data warehouse."},{"key":"e_1_2_1_20_1","unstructured":"Neter J. and Wasserman W. 1974. Applied Linear Statistical Models; Regression Analysis of Variance and Experimental Designs. R. D. Irwin Homewood Ill. Neter J. and Wasserman W. 1974. Applied Linear Statistical Models; Regression Analysis of Variance and Experimental Designs . R. D. Irwin Homewood Ill."},{"volume-title":"Proceedings of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery. 78--85","author":"Ordonez C.","key":"e_1_2_1_21_1"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2005.12.005"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.1040.0237"},{"key":"e_1_2_1_24_1","unstructured":"Quinlan J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann. Quinlan J. R. 1993. C4.5: Programs for Machine Learning . Morgan Kaufmann."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/63.3.581"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/645805.670142"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.4018\/jdm.2003100102"},{"volume-title":"Proceedings of the 7th International Conference on Advanced Information Networking and Applications. IEEE Computer Society, 526","author":"Shen J.-J.","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/253769.253804"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/11427995_38"},{"key":"e_1_2_1_31_1","unstructured":"Trochim W. M. K. 1999. The Research Methods Knowledge Base. Cornell University Ithaca. Trochim W. M. K. 1999. The Research Methods Knowledge Base . Cornell University Ithaca."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-9236(93)E0050-N"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.404034"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1080\/07421222.1996.11518099"},{"volume-title":"Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann","year":"2005","author":"Witten I. H.","key":"e_1_2_1_35_1"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1805286.1805288","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1805286.1805288","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:43:36Z","timestamp":1750286616000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1805286.1805288"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,7]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,7]]}},"alternative-id":["10.1145\/1805286.1805288"],"URL":"https:\/\/doi.org\/10.1145\/1805286.1805288","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"type":"print","value":"1936-1955"},{"type":"electronic","value":"1936-1963"}],"subject":[],"published":{"date-parts":[[2010,7]]},"assertion":[{"value":"2008-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-07-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}