{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T03:41:53Z","timestamp":1769744513775,"version":"3.49.0"},"publisher-location":"Cham","reference-count":48,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783031377020","type":"print"},{"value":"9783031377037","type":"electronic"}],"license":[{"start":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T00:00:00Z","timestamp":1672531200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,7,18]],"date-time":"2023-07-18T00:00:00Z","timestamp":1689638400000},"content-version":"vor","delay-in-days":198,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We propose a method for certifying the fairness of the classification result of a widely used supervised learning algorithm, the<jats:italic>k<\/jats:italic>-nearest neighbors (KNN), under the assumption that the training data may have historical bias caused by systematic mislabeling of samples from a protected minority group. To the best of our knowledge, this is the first certification method for KNN based on three variants of the fairness definition: individual fairness,<jats:inline-formula><jats:tex-math>$$\\epsilon $$<\/jats:tex-math><\/jats:inline-formula>-fairness, and label-flipping fairness. We first define the fairness certification problem for KNN and then propose sound approximations of the complex arithmetic computations used in the state-of-the-art KNN algorithm. This is meant to lift the computation results from the concrete domain to an abstract domain, to reduce the computational cost. We show effectiveness of this<jats:italic>abstract interpretation<\/jats:italic>based technique through experimental evaluation on six datasets widely used in the fairness research literature. We also show that the method is accurate enough to obtain fairness certifications for a large number of test inputs, despite the presence of historical bias in the datasets.<\/jats:p>","DOI":"10.1007\/978-3-031-37703-7_16","type":"book-chapter","created":{"date-parts":[[2023,7,17]],"date-time":"2023-07-17T17:02:15Z","timestamp":1689613335000},"page":"335-357","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Certifying the\u00a0Fairness of\u00a0KNN in\u00a0the\u00a0Presence of\u00a0Dataset Bias"],"prefix":"10.1007","author":[{"given":"Yannan","family":"Li","sequence":"first","affiliation":[]},{"given":"Jingbo","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Chao","family":"Wang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,7,18]]},"reference":[{"issue":"1","key":"16_CR1","first-page":"90","volume":"12","author":"DA Adeniyi","year":"2016","unstructured":"Adeniyi, D.A., Wei, Z., Yongquan, Y.: Automated web usage data mining and recommendation system using k-nearest neighbor (KNN) classification method. Appl. Comput. Inf. 12(1), 90\u2013108 (2016)","journal-title":"Appl. Comput. Inf."},{"key":"16_CR2","doi-asserted-by":"crossref","unstructured":"Albarghouthi, A., D\u2019Antoni, L., Drews, S., Nori, A.V.: FairSquare: probabilistic verification of program fairness. Proc. ACM Programm. Lang. 1(OOPSLA), 1\u201330 (2017)","DOI":"10.1145\/3133904"},{"key":"16_CR3","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1007\/978-3-319-63387-9_9","volume-title":"Computer Aided Verification","author":"A Albarghouthi","year":"2017","unstructured":"Albarghouthi, A., D\u2019Antoni, L., Drews, S.: Repairing decision-making programs under uncertainty. In: Majumdar, R., Kun\u010dak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 181\u2013200. Springer, Cham (2017). https:\/\/doi.org\/10.1007\/978-3-319-63387-9_9"},{"key":"16_CR4","unstructured":"Andersson, M., Tran, L.: Predicting movie ratings using KNN (2020)"},{"key":"16_CR5","doi-asserted-by":"crossref","unstructured":"Barthe, G., D\u2019Argenio, P.R., Rezk, T.: Secure information flow by self-composition. In: IEEE Computer Security Foundations Workshop, pp. 100\u2013114 (2004)","DOI":"10.1109\/CSFW.2004.1310735"},{"key":"16_CR6","doi-asserted-by":"crossref","unstructured":"Bastani, O., Zhang, X., Solar-Lezama, A.: Probabilistic verification of fairness properties via concentration. Proc. ACM Programm. Lang. 1(OOPSLA), 1\u201327 (2019)","DOI":"10.1145\/3360544"},{"key":"16_CR7","unstructured":"Bolukbasi, T., Chang, K.W., Zou, J.Y., Saligrama, V., Kalai, A.T.: Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In: Annual Conference on Neural Information Processing Systems, vol. 29 (2016)"},{"issue":"8","key":"16_CR8","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1145\/2240236.2240262","volume":"55","author":"S Chaudhuri","year":"2012","unstructured":"Chaudhuri, S., Gulwani, S., Lublinerman, R.: Continuity and robustness of programs. Commun. ACM 55(8), 107\u2013115 (2012)","journal-title":"Commun. ACM"},{"key":"16_CR9","unstructured":"Cortez, P., Silva, A.M.G.: Using data mining to predict secondary school student performance. EUROSIS-ETI (2008)"},{"key":"16_CR10","doi-asserted-by":"crossref","unstructured":"Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: ACM Symposium on Principles of Programming Languages, pp. 238\u2013252 (1977)","DOI":"10.1145\/512950.512973"},{"key":"16_CR11","unstructured":"Dieterich, W., Mendoza, C., Brennan, T.: COMPAS risk scales: demonstrating accuracy equity and predictive parity. Northpointe Inc (2016)"},{"key":"16_CR12","doi-asserted-by":"crossref","unstructured":"Drews, S., Albarghouthi, A., D\u2019Antoni, L.: Proving data-poisoning robustness in decision trees. In: ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 1083\u20131097 (2020)","DOI":"10.1145\/3385412.3385975"},{"key":"16_CR13","unstructured":"Dua, D., Graff, C.: UCI machine learning repository (2017). http:\/\/archive.ics.uci.edu\/ml"},{"key":"16_CR14","doi-asserted-by":"crossref","unstructured":"Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.S.: Fairness through awareness. In: Innovations in Theoretical Computer Science, pp. 214\u2013226 (2012)","DOI":"10.1145\/2090236.2090255"},{"key":"16_CR15","doi-asserted-by":"crossref","unstructured":"Finkbeiner, B., Haas, L., Torfah, H.: Canonical representations of k-safety hyperproperties. In: IEEE Computer Security Foundations Symposium, pp. 17\u201331 (2019)","DOI":"10.1109\/CSF.2019.00009"},{"key":"16_CR16","doi-asserted-by":"crossref","unstructured":"Firdausi, I., Erwin, A., Nugroho, A.S., et al.: Analysis of machine learning techniques used in behavior-based malware detection. In: 2010 Second International Conference on Advances in Computing, Control, and Telecommunication Technologies, pp. 201\u2013203. IEEE (2010)","DOI":"10.1109\/ACT.2010.33"},{"key":"16_CR17","doi-asserted-by":"crossref","unstructured":"Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.T.: AI2: safety and robustness certification of neural networks with abstract interpretation. In: IEEE Symposium on Security and Privacy, pp. 3\u201318 (2018)","DOI":"10.1109\/SP.2018.00058"},{"key":"16_CR18","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"986","DOI":"10.1007\/978-3-540-39964-3_62","volume-title":"On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE","author":"G Guo","year":"2003","unstructured":"Guo, G., Wang, H., Bell, D., Bi, Y., Greer, K.: KNN model-based approach in classification. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds.) OTM 2003. LNCS, vol. 2888, pp. 986\u2013996. Springer, Heidelberg (2003). https:\/\/doi.org\/10.1007\/978-3-540-39964-3_62"},{"key":"16_CR19","doi-asserted-by":"crossref","unstructured":"Guo, S., Wu, M., Wang, C.: Adversarial symbolic execution for detecting concurrency-related cache timing leaks. In: ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 377\u2013388 (2018)","DOI":"10.1145\/3236024.3236028"},{"key":"16_CR20","doi-asserted-by":"crossref","unstructured":"Jia, J., Liu, Y., Cao, X., Gong, N.Z.: Certified robustness of nearest neighbors against data poisoning and backdoor attacks. In: The AAAI Conference on Artificial Intelligence (2022)","DOI":"10.1609\/aaai.v36i9.21191"},{"key":"16_CR21","doi-asserted-by":"crossref","unstructured":"Kusano, M., Wang, C.: Flow-sensitive composition of thread-modular abstract interpretation. In: ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 799\u2013809 (2016)","DOI":"10.1145\/2950290.2950291"},{"key":"16_CR22","doi-asserted-by":"crossref","unstructured":"Kusano, M., Wang, C.: Thread-modular static analysis for relaxed memory models. In: ACM Joint Meeting on European Software Engineering Conference and Symposium on Foundations of Software Engineering, pp. 337\u2013348 (2017)","DOI":"10.1145\/3106237.3106243"},{"key":"16_CR23","doi-asserted-by":"crossref","unstructured":"Li, Y., Fang, B., Guo, L., Chen, Y.: Network anomaly detection based on TCM-KNN algorithm. In: ACM Symposium on Information, Computer and Communications Security, pp. 13\u201319 (2007)","DOI":"10.1145\/1229285.1229292"},{"key":"16_CR24","unstructured":"Li, Y., Wang, J., Wang, C.: Proving robustness of KNN against adversarial data poisoning. In: International Conference on Formal Methods in Computer-Aided Design, pp. 7\u201316 (2022)"},{"key":"16_CR25","doi-asserted-by":"crossref","unstructured":"Li, Y., Wang, J., Wang, C.: Systematic testing of the data-poisoning robustness of KNN. In: ACM SIGSOFT International Symposium on Software Testing and Analysis (2023)","DOI":"10.1145\/3597926.3598129"},{"key":"16_CR26","unstructured":"Meyer, A.P., Albarghouthi, A., D\u2019Antoni, L.: Certifying robustness to programmable data bias in decision trees. In: Annual Conference on Neural Information Processing Systems, pp. 26276\u201326288 (2021)"},{"key":"16_CR27","unstructured":"Meyer, A.P., Albarghouthi, A., D\u2019Antoni, L.: Certifying data-bias robustness in linear regression. CoRR abs\/2206.03575 (2022)"},{"key":"16_CR28","doi-asserted-by":"crossref","unstructured":"Mohammadinejad, S., Paulsen, B., Deshmukh, J.V., Wang, C.: DiffRNN: differential verification of recurrent neural networks. In: International Conference on Formal Modeling and Analysis of Timed Systems, pp. 117\u2013134 (2021)","DOI":"10.1007\/978-3-030-85037-1_8"},{"issue":"1","key":"16_CR29","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1007\/s00500-014-1511-6","volume":"20","author":"FA Narudin","year":"2016","unstructured":"Narudin, F.A., Feizollah, A., Anuar, N.B., Gani, A.: Evaluation of machine learning classifiers for mobile malware detection. Soft. Comput. 20(1), 343\u2013357 (2016)","journal-title":"Soft. Comput."},{"key":"16_CR30","doi-asserted-by":"crossref","unstructured":"Paulsen, B., Wang, C.: Example guided synthesis of linear approximations for neural network verification. In: International Conference on Computer Aided Verification, pp. 149\u2013170 (2022)","DOI":"10.1007\/978-3-031-13185-1_8"},{"key":"16_CR31","doi-asserted-by":"crossref","unstructured":"Paulsen, B., Wang, J., Wang, C.: ReluDiff: differential verification of deep neural networks. In: International Conference on Software Engineering, pp. 714\u2013726 (2020)","DOI":"10.1145\/3377811.3380337"},{"key":"16_CR32","doi-asserted-by":"crossref","unstructured":"Paulsen, B., Wang, J., Wang, J., Wang, C.: NEURODIFF: scalable differential verification of neural networks using fine-grained approximation. In: International Conference on Automated Software Engineering, pp. 784\u2013796 (2020)","DOI":"10.1145\/3324884.3416560"},{"key":"16_CR33","unstructured":"Rosenfeld, E., Winston, E., Ravikumar, P., Kolter, J.Z.: Certified robustness to label-flipping attacks via randomized smoothing. In: International Conference on Machine Learning, vol. 119, pp. 8230\u20138241 (2020)"},{"key":"16_CR34","unstructured":"Ruoss, A., Balunovic, M., Fischer, M., Vechev, M.T.: Learning certified individually fair representations. In: Annual Conference on Neural Information Processing Systems (2020)"},{"key":"16_CR35","doi-asserted-by":"crossref","unstructured":"Sousa, M., Dillig, I.: Cartesian hoare logic for verifying k-safety properties. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 57\u201369 (2016)","DOI":"10.1145\/2980983.2908092"},{"issue":"4","key":"16_CR36","doi-asserted-by":"publisher","first-page":"3492","DOI":"10.1016\/j.eswa.2010.08.137","volume":"38","author":"MY Su","year":"2011","unstructured":"Su, M.Y.: Real-time anomaly detection systems for denial-of-service attacks by weighted k-nearest-neighbor classifiers. Expert Syst. Appl. 38(4), 3492\u20133498 (2011)","journal-title":"Expert Syst. Appl."},{"key":"16_CR37","doi-asserted-by":"crossref","unstructured":"Sung, C., Kusano, M., Wang, C.: Modular verification of interrupt-driven software. In: International Conference on Automated Software Engineering, pp. 206\u2013216 (2017)","DOI":"10.1109\/ASE.2017.8115634"},{"key":"16_CR38","doi-asserted-by":"publisher","unstructured":"Wang, J., Li, Y., Wang, C.: Synthesizing fair decision trees via iterative constraint solving. In: Shoham, S., Vizel, Y. (eds.) International Conference on Computer Aided Verification, pp. 364\u2013385. Springer, Cham (2022). https:\/\/doi.org\/10.1007\/978-3-031-13188-2_18","DOI":"10.1007\/978-3-031-13188-2_18"},{"key":"16_CR39","doi-asserted-by":"crossref","unstructured":"Wang, J., Sung, C., Raghothaman, M., Wang, C.: Data-driven synthesis of provably sound side channel analyses. In: International Conference on Software Engineering, pp. 810\u2013822 (2021)","DOI":"10.1109\/ICSE43902.2021.00079"},{"key":"16_CR40","doi-asserted-by":"crossref","unstructured":"Wang, J., Sung, C., Wang, C.: Mitigating power side channels during compilation. In: ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 590\u2013601 (2019)","DOI":"10.1145\/3338906.3338913"},{"key":"16_CR41","doi-asserted-by":"crossref","unstructured":"Wang, J., Wang, C.: Learning to synthesize relational invariants. In: International Conference on Automated Software Engineering, pp. 65:1\u201365:12 (2022)","DOI":"10.1145\/3551349.3556942"},{"key":"16_CR42","unstructured":"Weisberg, S.: Applied Linear Regression, p. 194. Wiley (1985)"},{"key":"16_CR43","doi-asserted-by":"crossref","unstructured":"Wu, M., Guo, S., Schaumont, P., Wang, C.: Eliminating timing side-channel leaks using program repair. In: ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 15\u201326 (2018)","DOI":"10.1145\/3213846.3213851"},{"key":"16_CR44","doi-asserted-by":"crossref","unstructured":"Wu, M., Wang, C.: Abstract interpretation under speculative execution. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 802\u2013815 (2019)","DOI":"10.1145\/3314221.3314647"},{"key":"16_CR45","doi-asserted-by":"crossref","unstructured":"Wu, W., Zhang, W., Yang, Y., Wang, Q.: DREX: developer recommendation with k-nearest-neighbor search and expertise ranking. In: Asia-Pacific Software Engineering Conference, pp. 389\u2013396 (2011)","DOI":"10.1109\/APSEC.2011.15"},{"issue":"8","key":"16_CR46","doi-asserted-by":"publisher","first-page":"1661","DOI":"10.1109\/TPDS.2012.261","volume":"24","author":"M Xie","year":"2012","unstructured":"Xie, M., Hu, J., Han, S., Chen, H.H.: Scalable hypergrid K-NN-based online anomaly detection in wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. 24(8), 1661\u20131670 (2012)","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"16_CR47","doi-asserted-by":"crossref","unstructured":"Yeh, I.C., Lien, C.h.: The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Exp. Syst. Appl. 36(2), 2473\u20132480 (2009)","DOI":"10.1016\/j.eswa.2007.12.020"},{"key":"16_CR48","doi-asserted-by":"crossref","unstructured":"Zhang, J., Gao, P., Song, F., Wang, C.: SCInfer: refinement-based verification of software countermeasures against side-channel attacks. In: International Conference on Computer Aided Verification, pp. 157\u2013177 (2018)","DOI":"10.1007\/978-3-319-96142-2_12"}],"container-title":["Lecture Notes in Computer Science","Computer Aided Verification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-37703-7_16","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,24]],"date-time":"2024-10-24T09:02:16Z","timestamp":1729760536000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-37703-7_16"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"ISBN":["9783031377020","9783031377037"],"references-count":48,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-37703-7_16","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023]]},"assertion":[{"value":"18 July 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"CAV","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Computer Aided Verification","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Paris","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"France","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2023","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"17 July 2023","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"22 July 2023","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"35","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"cav2023","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"http:\/\/www.i-cav.org\/2023\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"hotcrp","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"261","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"67","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"26% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"11","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}