{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T16:52:41Z","timestamp":1772643161884,"version":"3.50.1"},"reference-count":64,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2025,2,19]],"date-time":"2025-02-19T00:00:00Z","timestamp":1739923200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Ensuring high-quality data warehouses is crucial for organizations, as they provide the reliable information needed for informed decision-making. While various methodologies emphasize the importance of requirements, conceptual, logical, and physical models in developing data warehouses, empirical quality assessment of these models remains underexplored, especially requirements models. To bridge this gap, this study focuses on assessment of requirements metrics for predicting the understandability of requirements schemas, a key indicator of model quality. In this empirical study, 28 requirements schemas were classified into understandable and non-understandable clusters using the k-means clustering technique. The study then employed six classification techniques\u2014logistic regression, naive Bayes, linear discriminant analysis with decision tree, reinforcement learning, voting rule, and a hybrid approach\u2014within both univariate and multivariate models to identify strong predictors of schema understandability. Results indicate that 13 out of 17 requirements metrics are robust predictors of schema understandability. Furthermore, a comparative performance analysis of the classification techniques reveals that the hybrid classifier outperforms other techniques across key evaluation parameters, including accuracy, sensitivity, specificity, and AUC. 
These findings highlight the potential of requirements metrics as effective predictors of schema understandability, contributing to improved quality assessment and the development of better conceptual data models for data warehouses.<\/jats:p>","DOI":"10.3390\/info16020155","type":"journal-article","created":{"date-parts":[[2025,2,19]],"date-time":"2025-02-19T09:36:22Z","timestamp":1739957782000},"page":"155","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Schema Understandability: A Comprehensive Empirical Study of Requirements Metrics"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5658-6683","authenticated-orcid":false,"given":"Tanu","family":"Singh","sequence":"first","affiliation":[{"name":"School of Computer Science, UPES, Dehradun 248007, Uttarakhand, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1270-3454","authenticated-orcid":false,"given":"Vinod","family":"Patidar","sequence":"additional","affiliation":[{"name":"School of Computer Science, UPES, Dehradun 248007, Uttarakhand, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5164-1797","authenticated-orcid":false,"given":"Manu","family":"Singh","sequence":"additional","affiliation":[{"name":"School of Computing Science and Engineering, Galgotias University, Greater Noida 203201, Uttar Pradesh, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0750-8187","authenticated-orcid":false,"given":"\u00c1lvaro","family":"Rocha","sequence":"additional","affiliation":[{"name":"ISEG, University of Lisbon, 1649-004 Lisboa, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1145\/240455.240470","article-title":"The data warehouse and data mining","volume":"39","author":"Inmon","year":"1996","journal-title":"Commun. 
ACM"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Bouzeghoub, M., and Kedad, Z. (2002). Quality in data warehousing. Information and Database Quality, Springer.","DOI":"10.1007\/978-1-4615-0831-1_8"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Rizzi, S., Abell\u00f3, A., Lechtenb\u00f6rger, J., and Trujillo, J. (2006, January 10). Research in data warehouse modeling and design: Dead or alive?. Proceedings of the 9th ACM international workshop on Data warehousing and OLAP, Arlington, VA, USA.","DOI":"10.1145\/1183512.1183515"},{"key":"ref_4","unstructured":"English, L. (1996). Information Quality Improvement: Principles, Methods and Management, Seminar, Information Impact International."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"851","DOI":"10.1016\/j.infsof.2006.09.008","article-title":"Metrics for data warehouse conceptual models understandability","volume":"49","author":"Serrano","year":"2007","journal-title":"Inf. Softw. Technol."},{"key":"ref_6","unstructured":"Lehner, W., Albrecht, J., and Wedekind, H. (1998, January 3). Normal forms for multidimensional databases. Proceedings of the Tenth International Conference on Scientific and Statistical Database Management (Cat. No. 98TB100243), Capri, Italy."},{"key":"ref_7","unstructured":"Vassiliadis, P. (2000, January 5\u20136). Gulliver in the land of data warehousing: Practical experiences and observations of a researcher. Proceedings of the Second Intl. Workshop on Design and Management of Data Warehouses, DMDW 2000, Stockholm, Sweden."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Salinesi, C., and Gam, I. (2009, January 22\u201324). How Specific Should Requirements Engineering Be in the Context of Decision Information Systems?. Proceedings of the 2009 Third International Conference on Research Challenges in Information Science, Fez, Morocco.","DOI":"10.1109\/RCIS.2009.5089288"},{"key":"ref_9","unstructured":"Frendi, M., and Salinesi, C. 
(2003, January 16\u201317). Requirements engineering for data warehousing. Proceedings of the 9th International Workshop on Requirements Engineering: Foundations of Software Quality, Klagenfurt\/Velden, Austria."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Maz\u00f3n, J.N., Pardillo, J., and Trujillo, J. (2007). A model-driven goal-oriented requirement engineering approach for data warehouses. International Conference on Conceptual Modeling, Springer.","DOI":"10.1007\/978-3-540-76292-8_31"},{"key":"ref_11","unstructured":"Schiefer, J., List, B., and Bruckner, R. (2002, January 9\u201311). A holistic approach for managing requirements of data warehouse systems. Proceedings of the AMCIS 2002, Eighth Americas Conference on Information Systems, Dallas, TX, USA."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1016\/0164-1212(90)90038-N","article-title":"Deriving structurally based software measures","volume":"12","author":"Fenton","year":"1990","journal-title":"J. Syst. Softw."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Fenton, N., and Bieman, J. (2014). Software Metrics: A Rigorous and Practical Approach, CRC Press.","DOI":"10.1201\/b17461"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2659118.2659131","article-title":"Assessing the understandability of a data warehouse logical model using a decision-tree approach","volume":"39","author":"Gaur","year":"2014","journal-title":"ACM SIGSOFT Softw. Eng. Notes"},{"key":"ref_15","unstructured":"Serrano, M. (2004). Definition of a Set of Metrics for Assuring Data Warehouse Quality. [Ph.D. Thesis, University of Castilla]."},{"key":"ref_16","unstructured":"Inmon, W.H. (2005). Building the Data Warehouse, John Wiley & Sons."},{"key":"ref_17","unstructured":"Kimball, R., and Ross, M. (2002). The Data Warehouse Lifecycle Toolkit, John Wiley & Sons. 
[2nd ed.]."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1049\/iet-sen.2019.0150","article-title":"Comprehensive complexity metric for data warehouse multidimensional model understandability","volume":"14","author":"Gosain","year":"2020","journal-title":"IET Softw."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1007\/s13198-013-0159-4","article-title":"Empirical validation of structural metrics for predicting understandability of conceptual schemas for data warehouse","volume":"5","author":"Kumar","year":"2014","journal-title":"Int. J. Syst. Assur. Eng. Manag."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Serrano, M., Calero, C., Trujillo, J., Luj\u00e1n-Mora, S., and Piattini, M. (2004). Empirical validation of metrics for conceptual models of data warehouses. International Conference on Advanced Information Systems Engineering, Springer.","DOI":"10.1007\/978-3-540-25975-6_36"},{"key":"ref_21","unstructured":"Kumar, M. (2015, January 11\u201313). Validation of data warehouse requirements-model traceability metrics using a formal framework. Proceedings of the 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1504\/IJCSYSE.2012.050237","article-title":"Quality-oriented requirements engineering approach for data warehouse","volume":"1","author":"Kumar","year":"2012","journal-title":"Int. J. Comput. Syst. Eng."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1504\/IJCSYSE.2013.057213","article-title":"On completeness and traceability metrics for data warehouse requirements engineering","volume":"1","author":"Kumar","year":"2013","journal-title":"Int. J. Comput. Syst. Eng."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Singh, T., and Kumar, M. (2020, January 10\u201313). 
Empirical Validation of Requirements Traceability Metrics for Requirements Model of Data Warehouse using SVM. Proceedings of the 2020 IEEE 17th India Council International Conference (INDICON), New Delhi, India.","DOI":"10.1109\/INDICON49873.2020.9342245"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Singh, T., and Kumar, M. (2021, January 6\u20138). Formally Investigating Traceability Metrics of Data Warehouse Requirements Model Using Briand\u2019s Framework. Proceedings of the 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India.","DOI":"10.1109\/ICICCS51141.2021.9432071"},{"key":"ref_26","first-page":"329","article-title":"Empirical study to predict the understandability of requirements schemas of data warehouse using requirements metrics","volume":"9","author":"Singh","year":"2021","journal-title":"Int. J. Intell. Eng. Inform."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Singh, T., and Kumar, M. (2022). Theoretical Validation of Data Warehouse Requirements Metrics Based on Agent Goal Decision Information Model Using Zuse\u2019s Framework. The Communication and Intelligent Systems: Proceedings of ICCIS 2021, Springer Nature.","DOI":"10.1007\/978-981-19-2130-8_9"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"9527","DOI":"10.1007\/s13369-021-06269-0","article-title":"Investigating requirements completeness metrics for requirements schemas using requirements engineering approach of data warehouse: A formal and empirical validation","volume":"47","author":"Singh","year":"2022","journal-title":"Arab. J. Sci. Eng."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"124754","DOI":"10.1016\/j.eswa.2024.124754","article-title":"A novel metric for assessing structural complexity of data warehouse requirements models","volume":"255","author":"Singh","year":"2024","journal-title":"Expert Syst. 
Appl."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Singh, T., and Kaushik, B. (2024). Employ Metrics in the Data Warehouse\u2019s Requirements Model for Hospitals. Handbook on Augmenting Telehealth Services, CRC Press.","DOI":"10.1201\/9781003346289-21"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1023\/A:1008956910828","article-title":"A framework for improving the requirements engineering process management","volume":"8","author":"Williams","year":"1999","journal-title":"Softw. Qual. J."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"B\u00f6hnlein, M., and Ulbrich-vom Ende, A. (2000). Business process oriented development of data warehouse structures. Data Warehousing, Physica.","DOI":"10.1007\/978-3-642-57681-2_1"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Winter, R., and Strauch, B. (2003, January 6\u20139). A method for demand-driven information requirements analysis in data warehousing projects. Proceedings of the 36th Annual Hawaii International Conference on System Sciences, Waikoloa Village, HI, USA.","DOI":"10.1109\/HICSS.2003.1174602"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Winter, R., and Strauch, B. (2004, January 14\u201317). Information requirements engineering for data warehouse systems. Proceedings of the 2004 ACM symposium on Applied computing, Nicosia, Cyprus.","DOI":"10.1145\/967900.968174"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"385","DOI":"10.3745\/JIPS.2010.6.3.385","article-title":"Stakeholders driven requirements engineering approach for data warehouse development","volume":"6","author":"Kumar","year":"2010","journal-title":"J. Inf. Process. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1007\/s13198-015-0363-5","article-title":"A novel requirements engineering approach for designing data warehouses","volume":"7","author":"Kumar","year":"2016","journal-title":"Int. J. Syst. Assur. 
Eng. Manag."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1007\/s11219-019-09479-w","article-title":"A model-driven engineering approach for supporting questionnaire-based gap analysis processes through application lifecycle management systems","volume":"28","author":"Amalfitano","year":"2020","journal-title":"Softw. Qual. J."},{"key":"ref_38","first-page":"2003","article-title":"Synergizing Requirements Engineering and Quality Assurance: A Comprehensive Exploration in Software Quality Engineering","volume":"12","author":"Pargaonkar","year":"2023","journal-title":"Int. J. Sci. Res."},{"key":"ref_39","unstructured":"Prakash, N., and Gosain, A. (2003, January 16\u201320). Requirements Driven Data Warehouse Development. Proceedings of the CAiSE short paper proceedings, Klagenfurt\/Velden, Austria."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"728","DOI":"10.1109\/TSE.1984.5010301","article-title":"A methodology for collecting valid software engineering data","volume":"6","author":"Basili","year":"1984","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wohlin, C., Runeson, P., H\u00f6st, M., Ohlsson, M.C., Regnell, B., and Wessl\u00e9n, A. (2012). In Experimentation in Software Engineering, Springer Science & Business Media.","DOI":"10.1007\/978-3-642-29044-2"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Van Solingen, R., Basili, V., Caldiera, G., and Rombach, H.D. (2002). Goal question metric (gqm) approach. Encycl. Softw. Eng.","DOI":"10.1002\/0471028959.sof142"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Carver, J., Jaccheri, L., Morasca, S., and Shull, F. (2003). Using empirical studies during software courses. 
Empirical Methods and Studies in Software Engineering, Springer.","DOI":"10.1007\/978-3-540-45143-3_6"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1109\/TSE.2002.1027796","article-title":"Preliminary guidelines for empirical research in software engineering","volume":"28","author":"Kitchenham","year":"2002","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1007\/s11219-007-9030-7","article-title":"Empirical studies to assess the understandability of data warehouse schemas using structural metrics","volume":"16","author":"Serrano","year":"2008","journal-title":"Softw. Qual. J."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Arthur, D., and Vassilvitskii, S. (2006, January 21\u201324). Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. Proceedings of the 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS\u201906), Berkeley, CA, USA.","DOI":"10.1109\/FOCS.2006.79"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Hosmer, D.W., and Lemeshow, S. (2000). Applied Logistic Regression, John Wiley & Sons.","DOI":"10.1002\/0471722146"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/s11334-017-0308-z","article-title":"Investigating structural metrics for understandability prediction of data warehouse multidimensional schemas using machine learning techniques","volume":"14","author":"Gosain","year":"2018","journal-title":"Innov. Syst. Softw. Eng."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1109\/TSE.2012.20","article-title":"Toward comprehensible software fault prediction models using bayesian network classifiers","volume":"39","author":"Dejaeger","year":"2012","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_50","unstructured":"John, G.H., and Langley, P. (2013). 
Estimating continuous distributions in Bayesian classifiers. arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Izenman, A.J. (2013). Linear discriminant analysis. Modern Multivariate Statistical Techniques, Springer.","DOI":"10.1007\/978-0-387-78189-1_8"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1109\/21.97458","article-title":"A survey of decision tree classifier methodology","volume":"21","author":"Safavian","year":"1991","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"1151","DOI":"10.1007\/s11036-019-01443-z","article-title":"Behavdt: A behavioral decision tree learning to build user-centric context-aware predictive model","volume":"25","author":"Sarker","year":"2020","journal-title":"Mob. Netw. Appl."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1016\/j.jmsy.2021.02.014","article-title":"A study on a Q-Learning algorithm application to a manufacturing assembly problem","volume":"59","author":"Neves","year":"2021","journal-title":"J. Manuf. Syst."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1016\/j.asoc.2016.10.016","article-title":"A Q-learning-based multi-agent system for data classification","volume":"52","author":"Pourpanah","year":"2017","journal-title":"Appl. Soft Comput."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1109\/TPAMI.2003.1159950","article-title":"Sum versus vote fusion in multiple classifier systems","volume":"25","author":"Kittler","year":"2003","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Sokolova, M., Japkowicz, N., and Szpakowicz, S. (2006). Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. 
Australasian Joint Conference on Artificial Intelligence, Springer.","DOI":"10.1007\/11941439_114"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1145\/507338.507355","article-title":"Data mining: Practical machine learning tools and techniques with Java implementations","volume":"31","author":"Witten","year":"2002","journal-title":"ACM SIGMOD Rec."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1109\/MS.2005.149","article-title":"Building effective defect-prediction models in practice","volume":"22","author":"Koru","year":"2005","journal-title":"IEEE Softw."},{"key":"ref_60","unstructured":"El Emam, K., Benlarbi, S., Goel, N., and Rai, S. (1999). A Validation of Object-Oriented Metrics, National Research Council Canada, Institute for Information Technology."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/S0164-1212(99)00102-8","article-title":"Exploring the relationships between design measures and software quality in object-oriented systems","volume":"51","author":"Briand","year":"2000","journal-title":"J. Syst. Softw."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"159","DOI":"10.2307\/2529310","article-title":"The measurement of observer agreement for categorical data","volume":"33","author":"Landis","year":"1977","journal-title":"Biometrics"},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1111\/j.2517-6161.1974.tb00994.x","article-title":"Cross-validatory choice and assessment of statistical predictions","volume":"36","author":"Stone","year":"1974","journal-title":"J. R. Stat. Soc. Ser. 
B"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/2\/155\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:38:05Z","timestamp":1760027885000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/2\/155"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,19]]},"references-count":64,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["info16020155"],"URL":"https:\/\/doi.org\/10.3390\/info16020155","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,19]]}}}