{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T13:39:10Z","timestamp":1776865150960,"version":"3.51.2"},"reference-count":42,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T00:00:00Z","timestamp":1756252800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>In software development, software requirement engineering (SRE) is an essential stage that guarantees requirements are clear and unambiguous. However, incompleteness, inconsistency, and ambiguity in requirement documents often occur, which can cause project delay, cost escalation, or total failure. In response to these challenges, this paper introduces a machine learning method to automatically identify the risk levels of software requirements according to ensemble classification methods. The labeled textual requirement dataset was preprocessed utilizing conventional preprocessing techniques, label encoding, and oversampling with the synthetic minority oversampling technique (SMOTE) to handle class imbalance. Various ensemble and baseline models such as extra trees, random forest, bagging with decision trees, XGBoost, LightGBM, gradient boosting, decision trees, support vector machine, and multi-layer perceptron were trained and compared. Five-fold cross-validation was used to provide stable performance evaluation on accuracy, area under the ROC curve (AUC), F1-score, precision, recall, root mean square error (RMSE), and error rate. The bagging (DT) classifier achieved the best overall performance, with an accuracy of 99.55%, AUC of 0.9971 and an F1-score of 97.23%, while maintaining a low RMSE of 0.03 and error rate of 0.45%. 
These results demonstrate the effectiveness of ensemble-based classifiers, especially bagging (DT) classifiers, in accurately predicting high-risk software requirements. The proposed method enables early detection and mitigation of requirement risks, aiding project managers and software engineers in improving resource planning, reducing rework, and enhancing overall software quality.<\/jats:p>","DOI":"10.3390\/fi17090387","type":"journal-article","created":{"date-parts":[[2025,8,28]],"date-time":"2025-08-28T07:43:16Z","timestamp":1756366996000},"page":"387","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Ensemble Learning for Software Requirement-Risk Assessment: A Comparative Study of Bagging and Boosting Approaches"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1212-8418","authenticated-orcid":false,"given":"Chandan","family":"Kumar","sequence":"first","affiliation":[{"name":"School of Computing, Amrita Vishwa Vidyapeetham, Amaravati Campus, Agiripalli 522503, Andhra Pradesh, India"}]},{"given":"Pathan Shaheen","family":"Khan","sequence":"additional","affiliation":[{"name":"School of Computing, Amrita Vishwa Vidyapeetham, Amaravati Campus, Agiripalli 522503, Andhra Pradesh, India"}]},{"given":"Medandrao","family":"Srinivas","sequence":"additional","affiliation":[{"name":"Department of Data Science, NRI Institute of Technology, Agiripalli 521212, Andhra Pradesh, India"}]},{"given":"Sudhanshu Kumar","family":"Jha","sequence":"additional","affiliation":[{"name":"Department of Electronics and Communication, University of Allahabad, Prayagraj 211002, Uttar Pradesh, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7610-3004","authenticated-orcid":false,"given":"Shiv","family":"Prakash","sequence":"additional","affiliation":[{"name":"Department of Electronics and Communication, University of Allahabad, Prayagraj 211002, Uttar Pradesh, 
India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4571-1888","authenticated-orcid":false,"given":"Rajkumar Singh","family":"Rathore","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff Llandaff Campus, Cardiff CF5 2YB, UK"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1683","DOI":"10.1002\/cae.22550","article-title":"Understanding general concepts of requirements engineering through design thinking: An experimental study with students","volume":"30","author":"Tiwari","year":"2022","journal-title":"Comput. Appl. Eng. Educ."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1049\/iet-sen.2019.0180","article-title":"Towards the implementation of requirements management specific practices (SP 1.1 and SP 1.2) for small-and medium-sized software development organisations","volume":"14","author":"Keshta","year":"2020","journal-title":"IET Software"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1007\/s12008-022-00968-0","article-title":"Revisiting requirement engineering for intelligent manufacturing","volume":"17","author":"Silva","year":"2023","journal-title":"Int. J. Interact. Des. Manuf."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.infsof.2016.03.008","article-title":"A multi-case study of agile requirements engineering and the use of test cases as requirements","volume":"77","author":"Bjarnason","year":"2016","journal-title":"Inf. Softw. Technol."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Mahdi, M.N., Mohamed Zabil, M.H., Ahmad, A.R., Ismail, R., Yusoff, Y., Cheng, L.K., Azmi, M.S.B.M., Natiq, H., and Happala Naidu, H. (2021). Software Project Management Using Machine Learning Technique\u2014A Review. Appl. 
Sci., 11.","DOI":"10.3390\/app11115183"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kolahdouz-Rahimi, S., Lano, K., and Lin, C. (2023). Requirement formalisation using natural language processing and machine learning: A systematic review. arXiv.","DOI":"10.5220\/0011789700003402"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"62811","DOI":"10.1109\/ACCESS.2022.3182372","article-title":"The Use of NLP-Based Text Representation Techniques to Support Requirement Engineering Tasks: A Systematic Mapping Review","volume":"10","author":"Sonbol","year":"2022","journal-title":"IEEE Access"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1016\/j.jss.2016.02.047","article-title":"Rapid quality assurance with requirements smells","volume":"123","author":"Femmer","year":"2017","journal-title":"J. Syst. Softw."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2204","DOI":"10.1007\/s13198-025-02809-1","article-title":"Predicting fault-prone software modules using bayesian belief network: An empirical study","volume":"16","author":"Kumar","year":"2025","journal-title":"Int. J. Syst. Assur. Eng. Manag."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"558","DOI":"10.5530\/jscires.12.3.053","article-title":"Comprehensive Scientometric Analysis and Longitudinal SDG Mapping of Quality and Reliability Engineering International Journal","volume":"12","author":"Kumar","year":"2023","journal-title":"J. Sci. Res."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1109\/TSE.2022.3173346","article-title":"Machine\/Deep Learning for Software Engineering: A Systematic Literature Review","volume":"49","author":"Wang","year":"2022","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Hey, T., Keim, J., Koziolek, A., and Tichy, W.F. (September, January 31). Norbert: Transfer learning for requirements classification. 
Proceedings of the 2020 IEEE 28th International Requirements Engineering Conference (RE), Zurich, Switzerland.","DOI":"10.1109\/RE48521.2020.00028"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gupta, V., Fernandez-Crehuet, J.M., Hanne, T., and Telesko, R. (2020). Requirements Engineering in Software Startups: A Systematic Mapping Study. Appl. Sci., 10.","DOI":"10.3390\/app10176125"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Jimoh, R.G., Olusanya, O.O., Awotunde, J.B., Imoize, A.L., and Lee, C.-C. (2022). Identification of Risk Factors Using ANFIS-Based Security Risk Assessment Model for SDLC Phases. Future Internet, 14.","DOI":"10.3390\/fi14110305"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Alharbi, I.M., Alyoubi, A.A., Altuwairiqi, M., and Ellatif, M.A. (2021). Analysis of risks assessment in multi software projects development environment using classification techniques. Advanced Machine Learning Technologies and Applications, Proceedings of the AMLTA 2021, Cairo, Egypt, 20\u201322 March 2021, Springer International Publishing.","DOI":"10.1007\/978-3-030-69717-4_78"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Naseem, R., Shaukat, Z., Irfan, M., Shah, M.A., Ahmad, A., Muhammad, F., Glowacz, A., Dunai, L., Antonino-Daviu, J., and Sulaiman, A. (2021). Empirical Assessment of Machine Learning Techniques for Software Requirements Risk Prediction. Electronics, 10.","DOI":"10.3390\/electronics10020168"},{"key":"ref_17","unstructured":"Naumcheva, M. (2021). Deep learning models in software requirements engineering. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mahmud, M.H., Nayan, T.H., Ashir, D.M.N.A., and Kabir, A. (2022). Software Risk Prediction: Systematic Literature Review on Machine Learning Techniques. Appl. 
Sci., 12.","DOI":"10.3390\/app122211694"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"98220","DOI":"10.1109\/ACCESS.2022.3206382","article-title":"Analysis of Tree-Family Machine Learning Techniques for Risk Prediction in Software Requirements","volume":"10","author":"Khan","year":"2022","journal-title":"IEEE Access"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Mamman, H., Balogun, A.O., Basri, S., Capretz, L.F., Adeyemo, V.E., Imam, A.A., and Kumar, G. (2023). Software Requirement Risk Prediction Using Enhanced Fuzzy Induction Models. Electronics, 12.","DOI":"10.3390\/electronics12183805"},{"key":"ref_21","unstructured":"Xu, J., Wang, Y., Li, R., Wang, Z., and Zhao, Q. (2024). An effective software risk prediction management analysis of data using machine learning and data mining method. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"19523","DOI":"10.1007\/s00521-023-08756-x","article-title":"A deep intelligent framework for software risk prediction using improved firefly optimization","volume":"35","author":"Pemmada","year":"2023","journal-title":"Neural Comput. Appl."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1002\/spe.3009","article-title":"Software effort estimation accuracy prediction of machine learning techniques: A systematic performance evaluation","volume":"52","author":"Mahmood","year":"2022","journal-title":"Softw. Pract. Exp."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Shaukat, Z.S., Naseem, R., and Zubair, M. (2018, January 29\u201331). A dataset for software requirements risk prediction. 
Proceedings of the 2018 IEEE International Conference on Computational Science and Engineering (CSE), Bucharest, Romania.","DOI":"10.1109\/CSE.2018.00022"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"20678","DOI":"10.1109\/ACCESS.2025.3532716","article-title":"A Secure and Robust Machine Learning Model for Intrusion Detection in Internet of Vehicles","volume":"13","author":"Tiwari","year":"2025","journal-title":"IEEE Access"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"4903","DOI":"10.1007\/s10994-022-06296-4","article-title":"A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning","volume":"113","author":"Elreedy","year":"2024","journal-title":"Mach. Learn."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Genuer, R., and Poggi, J.M. (2020). Random Forests, Springer International Publishing.","DOI":"10.1007\/978-3-030-56485-8"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ashraf, M.W.A., Singh, A.R., Pandian, A., Rathore, R.S., Bajaj, M., and Zaitsev, I. (2024). A hybrid approach using support vector machine rule-based system: Detecting cyber threats in internet of things. Sci. Rep., 14.","DOI":"10.1038\/s41598-024-78976-1"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s10994-006-6226-1","article-title":"Extremely randomized trees","volume":"63","author":"Geurts","year":"2006","journal-title":"Mach. Learn."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Berrouachedi, A., Jaziri, R., and Bernard, G. (2019, January 12\u201315). Deep extremely randomized trees. 
Proceedings of the Neural Information Processing: 26th International Conference, ICONIP 2019, Sydney, NSW, Australia.","DOI":"10.1007\/978-3-030-36708-4_59"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1007\/s00357-021-09397-2","article-title":"Comparing boosting and bagging for decision trees of rankings","volume":"39","author":"Plaia","year":"2022","journal-title":"J. Classif."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Abell\u00e1n, J., and Masegosa, A.R. (2010, January 15\u201319). Bagging decision trees on data sets with classification noise. Proceedings of the International Symposium on Foundations of Information and Knowledge Systems, Sofia, Bulgaria.","DOI":"10.1007\/978-3-642-11829-6_17"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1937","DOI":"10.1007\/s10462-020-09896-5","article-title":"A comparative analysis of gradient boosting algorithms","volume":"54","year":"2021","journal-title":"Artif. Intell. Rev."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Bahad, P., and Saxena, P. (2019, January 20\u201321). Study of adaboost and gradient boosting algorithms for predictive analytics. Proceedings of the International Conference on Intelligent Computing and Smart Communication 2019, Tehri, India.","DOI":"10.1007\/978-981-15-0633-8_22"},{"key":"ref_35","first-page":"36","article-title":"A scalable tree boosting system: XG boost","volume":"7","author":"Nalluri","year":"2020","journal-title":"Int. J. Res. Stud. Sci. Eng. Technol"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., Chen, K., Mitchell, R., Cano, I., and Zhou, T. (2015). Xgboost: Extreme Gradient Boosting, R package, Scientific Research Publishing Inc.. version 0.4-2 1.","DOI":"10.32614\/CRAN.package.xgboost"},{"key":"ref_37","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.Y. (2017, January 4\u20139). 
Lightgbm: A highly efficient gradient boosting decision tree. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA. Advances in Neural Information Processing Systems."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"121530","DOI":"10.1016\/j.eswa.2023.121530","article-title":"An efficient LightGBM-based differential evolution method for nonlinear inelastic truss optimization","volume":"237","author":"Truong","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Suthaharan, S. (2016). Decision tree learning. Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning, Springer.","DOI":"10.1007\/978-1-4899-7641-3"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"803","DOI":"10.1007\/s10462-018-9614-6","article-title":"Problem formulations and solvers in linear SVM: A review","volume":"52","author":"Chauhan","year":"2019","journal-title":"Artif. Intell. Rev."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Taud, H., and Mas, J.F. (2017). Multilayer perceptron (MLP). Geomatic Approaches for Modeling Land Change Scenarios, Springer.","DOI":"10.1007\/978-3-319-60801-3_27"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Seraj, A., Mohammadi-Khanaposhtani, M., Daneshfar, R., Naseri, M., Esmaeili, M., Baghban, A., Habibzadeh, S., and Eslamian, S. (2023). Cross-validation. 
Handbook of Hydroinformatics, Elsevier.","DOI":"10.1016\/B978-0-12-821285-1.00021-X"}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/9\/387\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:34:03Z","timestamp":1760034843000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/9\/387"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,27]]},"references-count":42,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["fi17090387"],"URL":"https:\/\/doi.org\/10.3390\/fi17090387","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,27]]}}}