{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T04:43:25Z","timestamp":1777869805361,"version":"3.51.4"},"reference-count":49,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T00:00:00Z","timestamp":1777507200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012456","name":"National Social Science Fund of China","doi-asserted-by":"publisher","award":["2022SKJJC024"],"award-info":[{"award-number":["2022SKJJC024"]}],"id":[{"id":"10.13039\/501100012456","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Software effort estimation (SEE) serves as a cornerstone of effective software project management, and case-based reasoning (CBR) stands out as one of the most extensively adopted approaches within this domain. Nevertheless, CBR-based SEE models are still plagued by two critical challenges: conventional case retrieval mechanisms lack the ability to differentiate the relative importance of various features, and data scarcity remains a persistent bottleneck. Both issues significantly compromise the estimation accuracy and interpretability of the models. To address these limitations, we propose a SHAP\u2013Mixup synergistic framework that enhances both feature-aware similarity learning and data distribution modeling. Specifically, we introduce (1) a stability-aware SHAP-weighted similarity metric that integrates both the magnitude and variance of feature contributions to improve retrieval robustness, and (2) a density-aware Mixup augmentation strategy that generates synthetic samples guided by local data manifold structure rather than random interpolation. 
Experimental results on seven benchmark datasets demonstrate that the proposed method reduces MAE and MSE by up to 20.2% on average compared to baseline CBR models, while consistently improving Pred(0.25). Furthermore, by enhancing model interpretability, the proposed method equips project managers with actionable insights into the key drivers of software effort, thereby facilitating more informed and efficient resource allocation. Building on these findings, this study provides a novel and effective pathway for developing SEE models that are more accurate, robust, and transparent.<\/jats:p>","DOI":"10.3390\/info17050431","type":"journal-article","created":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T14:35:53Z","timestamp":1777559753000},"page":"431","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["SHAP-Value-Weighted Case-Based Reasoning Model with Improved Mixup Data Augmentation for Software Effort Estimation"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-3506-3266","authenticated-orcid":false,"given":"Jing","family":"Li","sequence":"first","affiliation":[{"name":"Department of Management Engineering and Equipment Economics, Naval University of Engineering, Wuhan 430033, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Han","family":"Zhang","sequence":"additional","affiliation":[{"name":"The School of Nuclear Technology and Chemical and Biological College, Hubei University of Science and Technology, Xianning 437199, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1592-3751","authenticated-orcid":false,"given":"Shengxiang","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Management Engineering and Equipment Economics, Naval University of Engineering, Wuhan 430033, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8966-1726","authenticated-orcid":false,"given":"Mingchi","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Management Engineering and Equipment Economics, Naval University of Engineering, Wuhan 430033, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sishi","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Management Engineering and Equipment Economics, Naval University of Engineering, Wuhan 430033, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0964-3816","authenticated-orcid":false,"given":"Chen","family":"Zhu","sequence":"additional","affiliation":[{"name":"Department of Management Engineering and Equipment Economics, Naval University of Engineering, Wuhan 430033, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2126-504X","authenticated-orcid":false,"given":"Kai","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Management Engineering and Equipment Economics, Naval University of Engineering, Wuhan 430033, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,4,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"2599","DOI":"10.1109\/TSE.2020.3047072","article-title":"Sequential Model Optimization for Software Effort Estimation","volume":"48","author":"Xia","year":"2022","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"107088","DOI":"10.1016\/j.infsof.2022.107088","article-title":"An optimized case-based software project effort estimation using genetic algorithm","volume":"153","author":"Hameed","year":"2023","journal-title":"Inf. Softw. 
Technol."},{"key":"ref_3","first-page":"12151","article-title":"Multi-kernel support vector regression with improved moth-flame optimization algorithm for software effort estimation","volume":"12","author":"Li","year":"2022","journal-title":"Sci. Rep."},{"key":"ref_4","first-page":"22","article-title":"Software effort estimation by genetic algorithm tuned parameters of modified constructive cost model for NASA software projects","volume":"59","author":"Singh","year":"2012","journal-title":"Int. J. Comput. Appl."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1023\/B:EMSE.0000039882.39206.5a","article-title":"Group processes in software effort estimation","volume":"9","author":"Jorgensen","year":"2004","journal-title":"Empir. Softw. Eng."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1109\/TSE.2007.256943","article-title":"A systematic review of software development cost estimation studies","volume":"33","author":"Jorgensen","year":"2007","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"107436","DOI":"10.1016\/j.infsof.2024.107447","article-title":"Ensemble effort estimation for novice agile teams","volume":"170","author":"Alsaadi","year":"2024","journal-title":"Inf. Softw. Technol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1016\/j.jss.2018.09.054","article-title":"Developing and using checklists to improve software effort estimation: A multi-case study","volume":"146","author":"Usman","year":"2018","journal-title":"J. Syst. Softw."},{"key":"ref_9","unstructured":"Boehm, B.W., Abts, A., Brown, A.W., Chulani, S., Clark, B.K., Horowitz, E., Madachy, R., Reifer, D., and Steece, B. (2000). 
Software Cost Estimation with COCOMO II, Prentice Hall."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"895","DOI":"10.1109\/32.553638","article-title":"Function points analysis: An empirical study of its measurement processes","volume":"22","author":"Abran","year":"1996","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1016\/j.jss.2018.10.019","article-title":"Determining relevant training data for effort estimation using Window-based COCOMO calibration","volume":"147","author":"Nguyen","year":"2019","journal-title":"J. Syst. Softw."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"124513","DOI":"10.1016\/j.eswa.2024.124733","article-title":"A practical exploration of the convergence of Case-Based Reasoning and Explainable Artificial Intelligence","volume":"255","author":"Pradeep","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1016\/j.eswa.2005.06.021","article-title":"Least modification principle for case-based reasoning: A software project planning experience","volume":"30","author":"Lee","year":"2006","journal-title":"Expert Syst. Appl."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"106330","DOI":"10.1016\/j.infsof.2020.106330","article-title":"On an Optimal Analogy-based Software Effort Estimation","volume":"125","author":"Phannachitta","year":"2020","journal-title":"Inf. Softw. Technol."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1005","DOI":"10.4236\/jsea.2010.311118","article-title":"Case-Based Reasoning for Reducing Software Development Effort","volume":"3","author":"Brady","year":"2010","journal-title":"J. Softw. Eng. 
Appl."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1007\/s10586-024-04858-w","article-title":"Ensembling Harmony Search Algorithm with case-based reasoning for software development effort estimation","volume":"28","author":"Mustyala","year":"2025","journal-title":"Clust. Comput."},{"key":"ref_17","first-page":"4765","article-title":"A Unified Approach to Interpreting Model Predictions","volume":"30","author":"Lundberg","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst. (NIPS)"},{"key":"ref_18","first-page":"108815","article-title":"Multifactor Interpretability Method for Offshore Wind Power Output Prediction Based on TPE-CatBoost-SHAP","volume":"110","author":"Ruan","year":"2023","journal-title":"Comput. Electr. Eng."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1210","DOI":"10.1109\/JBHI.2023.3248139","article-title":"A Model-Agnostic Feature Attribution Approach to Magnetoencephalography Predictions Based on Shapley Value","volume":"27","author":"Fan","year":"2023","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"130650","DOI":"10.1016\/j.jhydrol.2024.130650","article-title":"Multiple spatio-temporal scale runoff forecasting and driving mechanism exploration by K-means optimized XGBoost and SHAP","volume":"630","author":"Wang","year":"2024","journal-title":"J. Hydrol."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1500","DOI":"10.1109\/TEM.2020.2996799","article-title":"The Interactive Weighting Method Concerning Consistency, Inadequacy, Complementary and Supplementary Properties of Criteria","volume":"69","author":"Mostofi","year":"2022","journal-title":"IEEE Trans. Eng. 
Manag."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"856","DOI":"10.1016\/j.bbe.2022.06.007","article-title":"Diagnosis of Parkinson\u2019s disease based on SHAP value feature selection","volume":"42","author":"Liu","year":"2022","journal-title":"Biocybern. Biomed. Eng."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"106257","DOI":"10.1016\/j.scs.2025.106257","article-title":"Impacts of land surface temperature and ambient factors on near-surface air temperature estimation: A multisource evaluation using SHAP analysis","volume":"122","author":"Li","year":"2025","journal-title":"Sustain. Cities Soc."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"120136","DOI":"10.1016\/j.eswa.2023.120136","article-title":"K-mixup: Data augmentation for offline reinforcement learning using mixup in a Koopman invariant subspace","volume":"225","author":"Jang","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref_25","first-page":"5961","article-title":"Attention Mechanism and Mixup Data Augmentation for Classification of COVID-19 Computed Tomography Images","volume":"34","year":"2022","journal-title":"J. King Saud Univ.-Comput. Inf. Sci."},{"key":"ref_26","first-page":"5902713","article-title":"3-D gravity intelligent inversion by U-Net network with data augmentation","volume":"61","author":"Zhou","year":"2023","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"106789","DOI":"10.1016\/j.neunet.2025.107295","article-title":"Augmenting sparse behavior data for user identity linkage with self-generated by model and mixup-generated samples","volume":"187","author":"Huang","year":"2025","journal-title":"Neural Netw."},{"key":"ref_28","first-page":"108110","article-title":"A supervised case-based reasoning approach for explainable thyroid nodule diagnosis","volume":"240","author":"Xu","year":"2022","journal-title":"Knowl.-Based Syst."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1016\/j.compind.2015.06.007","article-title":"Hybrid weighted mean for CBR adaptation in mechanical design by exploring effective, correlative and adaptative values","volume":"81","author":"Qi","year":"2016","journal-title":"Comput. Ind."},{"key":"ref_30","unstructured":"Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2018, April 30\u2013May 3). mixup: Beyond Empirical Risk Minimization. Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"4998","DOI":"10.1109\/TMM.2023.3330106","article-title":"A New Data Augmentation Method Based on Mixup and Dempster-Shafer Theory","volume":"26","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Multimed."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1384","DOI":"10.1109\/TPWRS.2023.3248941","article-title":"Using SHAP values and machine learning to understand trends in the transient stability limit","volume":"39","author":"Hamilton","year":"2024","journal-title":"IEEE Trans. 
Power Syst."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"103445","DOI":"10.1016\/j.redox.2024.103470","article-title":"Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants","volume":"79","author":"Qi","year":"2025","journal-title":"Redox Biol."},{"key":"ref_34","unstructured":"Boetticher, G., Menzies, T., and Ostrand, T. (2007, May 20). PROMISE Data Repository. Available online: https:\/\/promisedata.org\/."},{"key":"ref_35","unstructured":"(2022, January 24). International Software Benchmarking Standards Group, ISBSG Website. Available online: https:\/\/www.isbsg.org\/."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.infsof.2015.07.004","article-title":"An empirical analysis of data preprocessing for machine learning-based software cost estimation","volume":"67","author":"Huang","year":"2015","journal-title":"Inf. Softw. Technol."},{"key":"ref_37","first-page":"15","article-title":"On the software projects\u2019 duration estimation using support vector regression","volume":"663","author":"Van","year":"2023","journal-title":"Lect. Notes Netw. Syst."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"14449","DOI":"10.1007\/s00521-024-09855-z","article-title":"Software effort estimation: Advances, challenges, and future directions","volume":"36","author":"Azzeh","year":"2024","journal-title":"Neural Comput. Appl."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1016\/0950-5849(92)90077-3","article-title":"Empirical studies of assumptions that underlie software cost-estimation models","volume":"34","author":"Kitchenham","year":"1992","journal-title":"Inf. Softw. 
Technol."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1023\/A:1021713901879","article-title":"Ranking learning algorithms: Using IBL and meta-learning","volume":"50","author":"Brazdil","year":"2003","journal-title":"Mach. Learn."},{"key":"ref_41","unstructured":"Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Lopez-Paz, D., and Bengio, Y. (2019, June 9\u201315). Manifold mixup: Better representations by interpolating hidden states. Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/s42256-019-0138-9","article-title":"From local explanations to global understanding with explainable AI for trees","volume":"2","author":"Lundberg","year":"2020","journal-title":"Nat. Mach. Intell."},{"key":"ref_43","unstructured":"Molnar, C. (2022). Interpretable Machine Learning, Lulu Press, Inc. [2nd ed.]."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1145\/1007730.1007735","article-title":"A study of the behavior of several methods for balancing machine learning training data","volume":"6","author":"Batista","year":"2004","journal-title":"ACM SIGKDD Explor. Newsl."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","article-title":"Learning from imbalanced data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3379443","article-title":"A systematic review of software development effort estimation using machine learning techniques","volume":"54","author":"Li","year":"2021","journal-title":"ACM Comput. 
Surv."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1007\/s10515-010-0069-5","article-title":"Defect prediction from static code features: Current results, limitations, new approaches","volume":"17","author":"Menzies","year":"2010","journal-title":"Autom. Softw. Eng."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Geva, M., and Wiseman, Y. (2007, January 13\u201315). Distributed shared memory integration. Proceedings of the IEEE International Conference on Information Reuse and Integration (IRI), Las Vegas, NV, USA.","DOI":"10.1109\/IRI.2007.4296612"},{"key":"ref_49","unstructured":"Chen, Y., Chen, J., Weng, Y., Chang, C., Yu, D., and Lin, G. (2025). Adamixup: A dynamic defense framework for membership inference attack mitigation. arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/5\/431\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T14:42:31Z","timestamp":1777560151000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/5\/431"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,30]]},"references-count":49,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2026,5]]}},"alternative-id":["info17050431"],"URL":"https:\/\/doi.org\/10.3390\/info17050431","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,30]]}}}