{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T10:27:13Z","timestamp":1776680833868,"version":"3.51.2"},"reference-count":67,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,1,18]],"date-time":"2023-01-18T00:00:00Z","timestamp":1674000000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Water is a valuable, necessary and unfortunately rare commodity in both developing and developed countries all over the world. It is undoubtedly the most important natural resource on the planet and constitutes an essential nutrient for human health. Geo-environmental pollution can be caused by many different types of waste, such as municipal solid, industrial, agricultural (e.g., pesticides and fertilisers), medical, etc., making the water unsuitable for use by any living being. Therefore, finding efficient methods to automate checking of water suitability is of great importance. In the context of this research work, we leveraged a supervised learning approach in order to design as accurate as possible predictive models from a labelled training dataset for the identification of water suitability, either for consumption or other uses. We assume a set of physiochemical and microbiological parameters as input features that help represent the water\u2019s status and determine its suitability class (namely safe or nonsafe). From a methodological perspective, the problem is treated as a binary classification task, and the machine learning models\u2019 performance (such as Naive Bayes\u2013NB, Logistic Regression\u2013LR, k Nearest Neighbours\u2013kNN, tree-based classifiers and ensemble techniques) is evaluated with and without the application of class balancing (i.e., use or nonuse of Synthetic Minority Oversampling Technique\u2013SMOTE), comparing them in terms of Accuracy, Recall, Precision and Area Under the Curve (AUC). In our demonstration, results show that the Stacking classification model after SMOTE with 10-fold cross-validation outperforms the others with an Accuracy and Recall of 98.1%, Precision of 100% and an AUC equal to 99.9%. In conclusion, in this article, a framework is presented that can support the researchers\u2019 efforts toward water quality prediction using machine learning (ML).<\/jats:p>","DOI":"10.3390\/computation11020016","type":"journal-article","created":{"date-parts":[[2023,1,19]],"date-time":"2023-01-19T02:04:35Z","timestamp":1674093875000},"page":"16","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":75,"title":["Efficient Data-Driven Machine Learning Models for Water Quality Prediction"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5647-2929","authenticated-orcid":false,"given":"Elias","family":"Dritsas","sequence":"first","affiliation":[{"name":"Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7793-0407","authenticated-orcid":false,"given":"Maria","family":"Trigka","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,18]]},"reference":[{"key":"ref_1","unstructured":"(2022, December 09). World Water Day. Available online: https:\/\/www.worldwaterday.org\/."},{"key":"ref_2","first-page":"35","article-title":"Drinking water quality source of life","volume":"2","author":"Khikmatovna","year":"2021","journal-title":"Web Sci. Int. Sci. Res. J."},{"key":"ref_3","unstructured":"Fateeva, K.V., and Filimonova, N.G. (2018, January 11\u201312). THE WATER IS THE SOURCE OF LIFE. THE PROBLEMS OF POLLUTION OF WATER SOURCES. Proceedings of the Experientia Est Optima Magistra, Belgorod, Russia."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11214-018-0476-7","article-title":"The importance of water for life","volume":"214","author":"Westall","year":"2018","journal-title":"Space Sci. Rev."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ward, M.H., Jones, R.R., Brender, J.D., De Kok, T.M., Weyer, P.J., Nolan, B.T., Villanueva, C.M., and Van Breda, S.G. (2018). Drinking water nitrate and human health: An updated review. Int. J. Environ. Res. Public Health, 15.","DOI":"10.3390\/ijerph15071557"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1016\/j.mad.2013.11.009","article-title":"Water-loss dehydration and aging","volume":"136","author":"Hooper","year":"2014","journal-title":"Mech. Ageing Dev."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Jayaswal, K., Sahu, V., and Gurjar, B. (2018). Water pollution, human health and remediation. Water Remediation, Springer.","DOI":"10.1007\/978-981-10-7551-3_2"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Dickens, C., and McCartney, M. (2020). Water-Related Ecosystems. Clean Water and Sanitation, Springer.","DOI":"10.1007\/978-3-319-70061-8_100-1"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Hakimdavar, R., Hubbard, A., Policelli, F., Pickens, A., Hansen, M., Fatoyinbo, T., Lagomasino, D., Pahlevan, N., Unninayar, S., and Kavvada, A. (2020). Monitoring water-related ecosystems with earth observation data in support of Sustainable Development Goal (SDG) 6 reporting. Remote Sens., 12.","DOI":"10.3390\/rs12101634"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"133875","DOI":"10.1016\/j.chemosphere.2022.133875","article-title":"Twenty years of China\u2019s water pollution control: Experiences and challenges","volume":"295","author":"Tang","year":"2022","journal-title":"Chemosphere"},{"key":"ref_11","first-page":"1","article-title":"Factors affecting water pollution: A review","volume":"7","author":"Chaudhry","year":"2017","journal-title":"J. Ecosyst. Ecography"},{"key":"ref_12","unstructured":"World Health Organization (2021). A Global Overview of National Regulations and Standards for Drinking-Water Quality, World Health Organization."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wen, X., Chen, F., Lin, Y., Zhu, H., Yuan, F., Kuang, D., Jia, Z., and Yuan, Z. (2020). Microbial indicators and their use for monitoring drinking water quality\u2014A review. Sustainability, 12.","DOI":"10.3390\/su12062249"},{"key":"ref_14","first-page":"1","article-title":"Data centre water consumption","volume":"4","author":"Mytton","year":"2021","journal-title":"npj Clean Water"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Canter, L.W. (2020). Ground Water Pollution Control, CRC Press.","DOI":"10.1201\/9781003069775"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Mishra, B.K., Kumar, P., Saraswat, C., Chakraborty, S., and Gautam, A. (2021). Water security in a changing environment: Concept, challenges and solutions. Water, 13.","DOI":"10.3390\/w13040490"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"119611","DOI":"10.1016\/j.envpol.2022.119611","article-title":"Indices and models of surface water quality assessment: Review and perspectives","volume":"308","author":"Yan","year":"2022","journal-title":"Environ. Pollut."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Park, J., Kim, K.T., and Lee, W.H. (2020). Recent advances in information and communications technology (ICT) and sensor technology for monitoring water quality. Water, 12.","DOI":"10.3390\/w12020510"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, P., Wang, J., Sangaiah, A.K., Xie, Y., and Yin, X. (2019). Analysis and prediction of water quality using LSTM deep neural networks in IoT environment. Sustainability, 11.","DOI":"10.3390\/su11072058"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Braga, F.H.R., Dutra, M.L.S., Lima, N.S., da Silva, G.M., de C\u00e1ssia Mendon\u00e7a de Miranda, R., da Cunha Ara\u00fajo Firmo, W., de Moura, A.R.L., de Souza Monteiro, A., da Silva, L.C.N., and da Silva, D.F. (2022). Study of the Influence of Physicochemical Parameters on the Water Quality Index (WQI) in the Maranh\u00e3o Amazon, Brazil. Water, 14.","DOI":"10.3390\/w14101546"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ahmed, U., Mumtaz, R., Anwar, H., Shah, A.A., Irfan, R., and Garc\u00eda-Nieto, J. (2019). Efficient water quality prediction using supervised machine learning. Water, 11.","DOI":"10.3390\/w11112210"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"124084","DOI":"10.1016\/j.jhydrol.2019.124084","article-title":"Machine learning methods for better water quality prediction","volume":"578","author":"Ahmed","year":"2019","journal-title":"J. Hydrol."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"137612","DOI":"10.1016\/j.scitotenv.2020.137612","article-title":"Improving prediction of water quality indices using novel hybrid machine-learning algorithms","volume":"721","author":"Bui","year":"2020","journal-title":"Sci. Total Environ."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"126169","DOI":"10.1016\/j.chemosphere.2020.126169","article-title":"Hybrid decision tree-based machine learning models for short-term water quality prediction","volume":"249","author":"Lu","year":"2020","journal-title":"Chemosphere"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"144459","DOI":"10.1016\/j.scitotenv.2020.144459","article-title":"A novel machine learning application: Water quality resilience prediction Model","volume":"768","author":"Imani","year":"2021","journal-title":"Sci. Total. Environ."},{"key":"ref_26","first-page":"294","article-title":"Machine learning approaches for anomaly detection of water quality on a real-world data set","volume":"3","author":"Muharemi","year":"2019","journal-title":"J. Inf. Telecommun."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3","DOI":"10.2166\/wqrj.2018.025","article-title":"Water quality prediction using machine learning methods","volume":"53","author":"Haghiabi","year":"2018","journal-title":"Water Qual. Res. J."},{"key":"ref_28","unstructured":"(2022, December 09). Water Quality. Available online: https:\/\/www.kaggle.com\/datasets\/mssmartypants\/water-quality."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"155189","DOI":"10.1016\/j.jallcom.2020.155189","article-title":"An overview on activation of aluminium-water reaction for enhanced hydrogen production","volume":"835","author":"Kumar","year":"2020","journal-title":"J. Alloys Compd."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.chemosphere.2018.03.098","article-title":"Ecological risks posed by ammonia nitrogen (AN) and un-ionized ammonia (NH3) in seven major river systems of China","volume":"202","author":"Zhang","year":"2018","journal-title":"Chemosphere"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s40726-019-0102-7","article-title":"Arsenic in drinking water: Is 10 \u03bcg\/L a safe limit?","volume":"5","author":"Ahmad","year":"2019","journal-title":"Curr. Pollut. Rep."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Oskarsson, A. (2022). Barium. Handbook on the Toxicology of Metals, Elsevier.","DOI":"10.1016\/B978-0-12-822946-0.00003-9"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4864365","DOI":"10.1155\/2018\/4864365","article-title":"Role of phytoremediation in reducing cadmium toxicity in soil and water","volume":"2018","author":"Mahajan","year":"2018","journal-title":"J. Toxicol."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1039\/D1EW00640A","article-title":"Review of chloramine decay models in drinking water system","volume":"8","author":"Hossain","year":"2022","journal-title":"Environ. Sci. Water Res. Technol."},{"key":"ref_35","unstructured":"World Health Organization (2020). Chromium in Drinking-Water, World Health Organization. Technical Report."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"9021","DOI":"10.1039\/C8DT01876F","article-title":"Water oxidation by a copper (II) complex: New findings, questions, challenges and a new hypothesis","volume":"47","author":"Najafpour","year":"2018","journal-title":"Dalton Trans."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1080\/10643389.2019.1647028","article-title":"Fluoride and human health: Systematic appraisal of sources, exposures, metabolism, and toxicity","volume":"50","author":"Kabir","year":"2020","journal-title":"Crit. Rev. Environ. Sci. Technol."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"128188","DOI":"10.1016\/j.jclepro.2021.128188","article-title":"Microplastics act as an important protective umbrella for bacteria during water\/wastewater disinfection","volume":"315","author":"Shen","year":"2021","journal-title":"J. Clean. Prod."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"124656","DOI":"10.1016\/j.jhazmat.2020.124656","article-title":"Recent advances in biosensors for detecting viruses in water and wastewater","volume":"410","author":"Pilevar","year":"2021","journal-title":"J. Hazard. Mater."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1007\/s40572-018-0193-0","article-title":"Public health consequences of lead in drinking water","volume":"5","author":"Levallois","year":"2018","journal-title":"Curr. Environ. Health Rep."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"116303","DOI":"10.1016\/j.watres.2020.116303","article-title":"Evaluating biochar and its modifications for the removal of ammonium, nitrate, and phosphate in water","volume":"186","author":"Zhang","year":"2020","journal-title":"Water Res."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"139","DOI":"10.4265\/bio.23.139","article-title":"Behavior of nitrate-nitrogen and nitrite-nitrogen in drinking water","volume":"23","author":"Sato","year":"2018","journal-title":"Biocontrol Sci."},{"key":"ref_43","first-page":"22","article-title":"Recent advances in the analysis of mercury in water-review","volume":"12","author":"Foteinis","year":"2016","journal-title":"Curr. Anal. Chem."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Lisco, G., De Tullio, A., Giagulli, V.A., De Pergola, G., and Triggiani, V. (2020). Interference on iodine uptake and human thyroid function by perchlorate-contaminated water and food. Nutrients, 12.","DOI":"10.3390\/nu12061669"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"100125","DOI":"10.1016\/j.wri.2020.100125","article-title":"Modern technologies for radium removal from water\u2013Polish mining industry case study","volume":"23","author":"Wysocka","year":"2020","journal-title":"Water Resour. Ind."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"28619","DOI":"10.1007\/s11356-018-2885-2","article-title":"Selenium and drinking water quality indicators in Mongolia","volume":"25","author":"Golubkina","year":"2018","journal-title":"Environ. Sci. Pollut. Res."},{"key":"ref_47","unstructured":"World Health Organization (2021). Silver in Drinking Water: Background Document for Development of WHO Guidelines for Drinking-Water Quality, World Health Organization."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"1551","DOI":"10.1007\/s00204-020-02676-8","article-title":"Uranium in drinking water: A public health threat","volume":"94","author":"Semenova","year":"2020","journal-title":"Arch. Toxicol."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1016\/j.asoc.2018.12.024","article-title":"An alternative SMOTE oversampling strategy for high-dimensional datasets","volume":"76","author":"Maldonado","year":"2019","journal-title":"Appl. Soft Comput."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Dritsas, E., Fazakis, N., Kocsis, O., Moustakas, K., and Fakotakis, N. (2021, January 12\u201314). Optimal Team Pairing of Elder Office Employees with Machine Learning on Synthetic Data. Proceedings of the 2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA), Chania Crete, Greece.","DOI":"10.1109\/IISA52424.2021.9555511"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1763","DOI":"10.1213\/ANE.0000000000002864","article-title":"Correlation coefficients: Appropriate use and interpretation","volume":"126","author":"Schober","year":"2018","journal-title":"Anesth. Analg."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Darst, B.F., Malecki, K.C., and Engelman, C.D. (2018). Using recursive feature elimination in random forest to account for correlated variables in high dimensional data. BMC Genet., 19.","DOI":"10.1186\/s12863-018-0633-8"},{"key":"ref_53","first-page":"612","article-title":"Evaluating the impact of GINI index and information gain on classification using decision tree classifier algorithm","volume":"11","author":"Tangirala","year":"2020","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_54","first-page":"3640","article-title":"Classification algorithms with attribute selection: An evaluation study using WEKA","volume":"9","author":"Gnanambal","year":"2018","journal-title":"Int. J. Adv. Netw. Appl."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Berrar, D. (2018). Bayes\u2019 theorem and naive Bayes classifier. Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics, Elsevier.","DOI":"10.1016\/B978-0-12-809633-8.20473-1"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"e1249","DOI":"10.1002\/widm.1249","article-title":"Ensemble Learning: A survey","volume":"8","author":"Sagi","year":"2018","journal-title":"Wiley Interdiscip. Rev. Data Min. Knowl. Discov."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1016\/j.inffus.2020.07.007","article-title":"A practical tutorial on Bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and opportunities","volume":"64","author":"Rokach","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s42452-019-0394-7","article-title":"Whale optimization algorithm-based email spam feature selection method using rotation forest algorithm for classification","volume":"1","author":"Shuaib","year":"2019","journal-title":"SN Appl. Sci."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Parmar, A., Katariya, R., and Patel, V. (2018). A review on random forest: An ensemble classifier. Proceedings of the International Conference on Intelligent Data Communication Technologies and Internet of Things, Springer.","DOI":"10.1007\/978-3-030-03146-6_86"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Polat, K., and Sent\u00fcrk, U. (2018, January 19\u201321). A novel ML approach to prediction of breast cancer: Combining of mad normalization, KMC based feature weighting and AdaBoostM1 classifier. Proceedings of the 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Ankara, Turkey.","DOI":"10.1109\/ISMSIT.2018.8567245"},{"key":"ref_61","first-page":"40","article-title":"An ensemble approach for classification and prediction of diabetes mellitus using soft Voting classifier","volume":"2","author":"Kumari","year":"2021","journal-title":"Int. J. Cogn. Comput. Eng."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Pavlyshenko, B. (2018, January 21\u201325). Using Stacking approaches for machine learning models. Proceedings of the 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine.","DOI":"10.1109\/DSMP.2018.8478522"},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1007\/s12553-020-00509-3","article-title":"Multilayer perceptron based deep neural network for early detection of coronary heart disease","volume":"11","author":"Masih","year":"2021","journal-title":"Health Technol."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.jclinepi.2019.02.004","article-title":"A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models","volume":"110","author":"Christodoulou","year":"2019","journal-title":"J. Clin. Epidemiol."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3459665","article-title":"k-Nearest neighbour classifiers-A Tutorial","volume":"54","author":"Cunningham","year":"2021","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_66","unstructured":"(2022, December 09). Waikato Environment for Knowledge Analysis. Available online: https:\/\/www.weka.io\/."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"1","DOI":"10.5121\/ijdkp.2015.5201","article-title":"A review on evaluation metrics for data classification evaluations","volume":"5","author":"Hossin","year":"2015","journal-title":"Int. J. Data Min. Knowl. Manag. Process."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/11\/2\/16\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:09:45Z","timestamp":1760119785000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/11\/2\/16"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,18]]},"references-count":67,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["computation11020016"],"URL":"https:\/\/doi.org\/10.3390\/computation11020016","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,18]]}}}