{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T08:18:42Z","timestamp":1763021922304,"version":"3.45.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Hum.-Comput. Interact."],"published-print":{"date-parts":[[2025,11,13]]},"abstract":"<jats:p>Object detection models, while achieving greater performance, often suffer from recurring errors such as misclassifications or missed detections. Existing explainable AI (XAI) tools primarily offer static, observation-based explanations and rarely support interactive correction or retraining, especially for non-expert users. To bridge this gap, we introduce CorrectMe, an interactive framework that integrates human-in-the-loop correction and explanation into object detection workflows. CorrectMe empowers users to iteratively explore, interpret, and rectify model errors through a unified interface featuring semantic embedding visualizations, saliency-based explanations, and natural language rationales. Users can revise predictions and incrementally retrain the model, streamlining the refinement process through lightweight updates rather than full-scale retraining or annotation. Through application scenarios and user studies, we demonstrate that CorrectMe enables more strategic corrections, improves model understanding, and lowers the barrier to practical refinement of object detection models.<\/jats:p>","DOI":"10.1145\/3773065","type":"journal-article","created":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T08:12:18Z","timestamp":1763021538000},"page":"170-186","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["CorrectMe: An Interactive Framework for Human-in-the-Loop Correction and Explanation of Object Detection Models"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-7124-9328","authenticated-orcid":false,"given":"Yinuo","family":"Liu","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9041-5418","authenticated-orcid":false,"given":"Zhiyuan","family":"Wu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7151-2407","authenticated-orcid":false,"given":"Xiaoju","family":"Dong","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-5403-4350","authenticated-orcid":false,"given":"Shaoxiong","family":"Jiang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8936-5843","authenticated-orcid":false,"given":"Weijie","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2025,11,13]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2019.12.012"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2020.2980246"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1\u201327","author":"Bhattacharya Aditya","year":"2024","unstructured":"Aditya Bhattacharya, Simone Stumpf, Lucija Gosak, Gregor Stiglic, and Katrien Verbert. 2024. Exmos: Explanatory model steering through multifaceted explanations and data configurations. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1\u201327. https:\/\/doi.org\/10.1145\/3613904.3642106 10.1145\/3613904.3642106"},{"key":"e_1_2_1_4_1","volume-title":"European conference on computer vision. 213\u2013229","author":"Carion Nicolas","year":"2020","unstructured":"Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-end object detection with transformers. In European conference on computer vision. 213\u2013229. https:\/\/doi.org\/10.1007\/978-3-030-58452-8_13 10.1007\/978-3-030-58452-8_13"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2021.3084694"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2021.3138933"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2020.2973258"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0275-4"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1002\/ail2.60"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2024.102417"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the IEEE international conference on computer vision. 2961\u20132969","author":"He Kaiming","year":"2017","unstructured":"Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross Girshick. 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision. 2961\u20132969. https:\/\/doi.org\/10.1109\/ICCV.2017.322 10.1109\/ICCV.2017.322"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 2019 CHI conference on human factors in computing systems. 1\u201313","author":"Hohman Fred","year":"2019","unstructured":"Fred Hohman, Andrew Head, Rich Caruana, Robert DeLine, and Steven M Drucker. 2019. Gamut: A design probe to understand how data scientists understand machine learning models. In Proceedings of the 2019 CHI conference on human factors in computing systems. 1\u201313. https:\/\/doi.org\/10.1145\/3290605.3300809 10.1145\/3290605.3300809"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3124133"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.9734\/BJAST\/2015\/14975"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2023.1151303"},{"key":"e_1_2_1_16_1","volume-title":"European conference on computer vision. 740\u2013755","author":"Lin Tsung-Yi","year":"2014","unstructured":"Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In European conference on computer vision. 740\u2013755. https:\/\/doi.org\/10.1007\/978-3-319-10602-1_48 10.1007\/978-3-319-10602-1_48"},{"key":"e_1_2_1_17_1","volume-title":"Visual instruction tuning. Advances in neural information processing systems, 36","author":"Liu Haotian","year":"2023","unstructured":"Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual instruction tuning. Advances in neural information processing systems, 36 (2023), 34892\u201334916."},{"key":"e_1_2_1_18_1","volume-title":"2018 IEEE Conference on Visual Analytics Science and Technology (VAST). 60\u201371","author":"Liu Mengchen","year":"2018","unstructured":"Mengchen Liu, Shixia Liu, Hang Su, Kelei Cao, and Jun Zhu. 2018. Analyzing the noise robustness of deep neural networks. In 2018 IEEE Conference on Visual Analytics Science and Technology (VAST). 60\u201371. https:\/\/doi.org\/10.1109\/VAST.2018.8802509 10.1109\/VAST.2018.8802509"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2864843"},{"key":"e_1_2_1_20_1","volume-title":"European conference on computer vision. 21\u201337","author":"Liu Wei","year":"2016","unstructured":"Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. 2016. Ssd: Single shot multibox detector. In European conference on computer vision. 21\u201337. https:\/\/doi.org\/10.1007\/978-3-319-46448-0_2 10.1007\/978-3-319-46448-0_2"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 23799\u201323808","author":"Liu Yaoyao","year":"2023","unstructured":"Yaoyao Liu, Bernt Schiele, Andrea Vedaldi, and Christian Rupprecht. 2023. Continual detection transformer for incremental object detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 23799\u201323808."},{"key":"e_1_2_1_22_1","volume-title":"A unified approach to interpreting model predictions. Advances in neural information processing systems, 30","author":"Lundberg Scott M","year":"2017","unstructured":"Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30 (2017), https:\/\/dl.acm.org\/doi\/abs\/10.5555\/3295222.3295230"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2021.3122865"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition. 779\u2013788","author":"Redmon Joseph","year":"2016","unstructured":"Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition. 779\u2013788. https:\/\/doi.org\/10.1109\/CVPR.2016.91 10.1109\/CVPR.2016.91"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 1135\u20131144","author":"Ribeiro Marco Tulio","year":"2016","unstructured":"Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. \" Why should i trust you?\" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 1135\u20131144. https:\/\/doi.org\/10.1145\/2939672.2939778 10.1145\/2939672.2939778"},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","first-page":"109370","DOI":"10.1016\/j.compeleceng.2024.109370","article-title":"A review of Explainable Artificial Intelligence in healthcare","volume":"118","author":"Sadeghi Zahra","year":"2024","unstructured":"Zahra Sadeghi, Roohallah Alizadehsani, Mehmet Akif Cifci, Samina Kausar, Rizwan Rehman, Priyakshi Mahanta, Pranjal Kumar Bora, Ammar Almasri, Rami S Alkhawaldeh, Sadiq Hussain, et al. 2024. A review of Explainable Artificial Intelligence in healthcare. Computers and Electrical Engineering, 118 (2024), 109370.","journal-title":"Computers and Electrical Engineering"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-020-0212-3"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the IEEE international conference on computer vision. 618\u2013626","author":"Selvaraju Ramprasaath R","year":"2017","unstructured":"Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision. 618\u2013626. https:\/\/doi.org\/10.1109\/ICCV.2017.74 10.1109\/ICCV.2017.74"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2019.2934629"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the ACM on Human-Computer Interaction, 7, CSCW2","author":"Sun Tong Steven","year":"2023","unstructured":"Tong Steven Sun, Yuyang Gao, Shubham Khaladkar, Sijia Liu, Liang Zhao, Young-Ho Kim, and Sungsoo Ray Hong. 2023. Designing a direct feedback loop between humans and convolutional neural networks through local explanations. Proceedings of the ACM on Human-Computer Interaction, 7, CSCW2 (2023), 1\u201332. https:\/\/doi.org\/10.1145\/3610187 10.1145\/3610187"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 2019 CHI conference on human factors in computing systems. 1\u201315","author":"Wang Danding","year":"2019","unstructured":"Danding Wang, Qian Yang, Ashraf Abdul, and Brian Y Lim. 2019. Designing theory-driven user-centric explainable AI. In Proceedings of the 2019 CHI conference on human factors in computing systems. 1\u201315. https:\/\/doi.org\/10.1145\/3290605.3300831 10.1145\/3290605.3300831"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2024.3357065"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2024.3468352"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1\u201321","author":"Wang Zhijie","year":"2024","unstructured":"Zhijie Wang, Yuheng Huang, Da Song, Lei Ma, and Tianyi Zhang. 2024. Promptcharm: Text-to-image generation through multi-modal prompting and refinement. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1\u201321. https:\/\/doi.org\/10.1145\/3613904.3642803 10.1145\/3613904.3642803"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11301-023-00320-0"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2019.2934619"},{"key":"e_1_2_1_38_1","volume-title":"2019 IEEE Conference on Visual Analytics Science and Technology (VAST). 57\u201368","author":"Xiang Shouxing","year":"2019","unstructured":"Shouxing Xiang, Xi Ye, Jiazhi Xia, Jing Wu, Yang Chen, and Shixia Liu. 2019. Interactive correction of mislabeled training data. In 2019 IEEE Conference on Visual Analytics Science and Technology (VAST). 57\u201368. https:\/\/doi.org\/10.1109\/VAST47406.2019.8986943 10.1109\/VAST47406.2019.8986943"},{"key":"e_1_2_1_39_1","volume-title":"2020 IEEE conference on visual analytics science and technology (VAST). 12\u201323","author":"Yang Weikai","year":"2020","unstructured":"Weikai Yang, Zhen Li, Mengchen Liu, Yafeng Lu, Kelei Cao, Ross Maciejewski, and Shixia Liu. 2020. Diagnosing concept drift with visual analytics. In 2020 IEEE conference on visual analytics science and technology (VAST). 12\u201323. https:\/\/doi.org\/10.1109\/VAST50239.2020.00007 10.1109\/VAST50239.2020.00007"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1\u201318","author":"Zeng Xingchen","year":"2024","unstructured":"Xingchen Zeng, Ziyao Gao, Yilin Ye, and Wei Zeng. 2024. IntentTuner: an interactive framework for integrating human intentions in fine-tuning text-to-image generative models. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1\u201318. https:\/\/doi.org\/10.1145\/3613904.3642165 10.1145\/3613904.3642165"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2876865"}],"container-title":["Proceedings of the ACM on Human-Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3773065","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T08:14:36Z","timestamp":1763021676000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3773065"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,13]]},"references-count":41,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,11,13]]}},"alternative-id":["10.1145\/3773065"],"URL":"https:\/\/doi.org\/10.1145\/3773065","relation":{},"ISSN":["2573-0142"],"issn-type":[{"value":"2573-0142","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,13]]},"assertion":[{"value":"2025-07-25","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}