{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T08:25:15Z","timestamp":1773735915424,"version":"3.50.1"},"reference-count":44,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2022,9,27]],"date-time":"2022-09-27T00:00:00Z","timestamp":1664236800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Harmful cyanobacterial bloom (HCB) is problematic for drinking water treatment, and some of its strains can produce toxins that significantly affect human health. To better control eutrophication and HCB, catchment managers need to continuously keep track of nitrogen (N) and phosphorus (P) in the water bodies. However, the high-frequency monitoring of these water quality indicators is not economical. In these cases, machine learning techniques may serve as viable alternatives since they can learn directly from the available surrogate data. In the present work, a random forest, extremely randomized trees (ET), extreme gradient boosting, k-nearest neighbors, a light gradient boosting machine, and bagging regressor-based virtual sensors were used to predict N and P in two catchments with contrasting land uses. The effect of data scaling and missing value imputation were also assessed, while the Shapley additive explanations were used to rank feature importance. A specification book, sensitivity analysis, and best practices for developing virtual sensors are discussed. Results show that ET, MinMax scaler, and a multivariate imputer were the best predictive model, scaler, and imputer, respectively. The highest predictive performance, reported in terms of R2, was 97% in the rural catchment and 82% in an urban catchment.<\/jats:p>","DOI":"10.3390\/s22197338","type":"journal-article","created":{"date-parts":[[2022,9,28]],"date-time":"2022-09-28T03:30:37Z","timestamp":1664335837000},"page":"7338","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["A Virtual Sensing Concept for Nitrogen and Phosphorus Monitoring Using Machine Learning Techniques"],"prefix":"10.3390","volume":"22","author":[{"given":"Thulane","family":"Paepae","sequence":"first","affiliation":[{"name":"Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Doornfontein 2028, South Africa"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9178-2700","authenticated-orcid":false,"given":"Pitshou","family":"Bokoro","sequence":"additional","affiliation":[{"name":"Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Doornfontein 2028, South Africa"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0773-9476","authenticated-orcid":false,"given":"Kyandoghere","family":"Kyamakya","sequence":"additional","affiliation":[{"name":"Institute for Smart Systems Technologies, Transportation Informatics, Alpen-Adria Universit\u00e4t Klagenfurt, 9020 Klagenfurt, Austria"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.hal.2016.10.001","article-title":"An overview of cyanobacterial bloom occurrences and research in Africa over the last decade","volume":"60","author":"Ndlela","year":"2016","journal-title":"Harmful Algae"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"103187","DOI":"10.1016\/j.earscirev.2020.103187","article-title":"Monitoring inland water quality using remote sensing: Potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing","volume":"205","author":"Sagan","year":"2020","journal-title":"Earth-Sci. Rev."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1007\/s10661-020-08731-2","article-title":"Estimation of nitrogen and phosphorus concentrations from water quality surrogates using machine learning in the Tri An Reservoir, Vietnam","volume":"192","author":"Ha","year":"2020","journal-title":"Environ. Monit. Assess."},{"key":"ref_4","first-page":"693","article-title":"Eutrophication: Present reality and future challenges for South Africa","volume":"37","year":"2011","journal-title":"Water SA"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1016\/j.hal.2016.02.002","article-title":"Health impacts from cyanobacteria harmful algae blooms: Implications for the North American Great Lakes","volume":"54","author":"Carmichael","year":"2016","journal-title":"Harmful Algae"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.17159\/sajs.2015\/20140193","article-title":"Eutrophication and cyanobacteria in South Africa\u2019s standing water bodies: A view from space","volume":"111","author":"Matthews","year":"2015","journal-title":"S. Afr. J. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1021\/es020793k","article-title":"Environmental costs of freshwater eutrophication in England and Wales","volume":"37","author":"Pretty","year":"2003","journal-title":"Environ. Sci. Technol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1021\/es801217q","article-title":"Eutrophication of U. S. freshwaters: Analysis of potential economic damages","volume":"43","author":"Dodds","year":"2009","journal-title":"Environ. Sci. Technol."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"115490","DOI":"10.1016\/j.watres.2020.115490","article-title":"Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods","volume":"172","author":"Castrillo","year":"2020","journal-title":"Water Res."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2033","DOI":"10.1007\/s13369-018-3253-8","article-title":"Chlorine Soft Sensor Based on Extreme Learning Machine for Water Quality Monitoring","volume":"44","author":"Djerioui","year":"2019","journal-title":"Arab. J. Sci. Eng."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1038\/s41597-020-0478-7","article-title":"Estimating nitrogen and phosphorus concentrations in streams and rivers, within a machine learning framework","volume":"7","author":"Shen","year":"2020","journal-title":"Sci. Data"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"143005","DOI":"10.1016\/j.scitotenv.2020.143005","article-title":"Prediction of stream nitrogen and phosphorus concentrations from high-frequency sensors using Random Forests Regression","volume":"763","author":"Harrison","year":"2021","journal-title":"Sci. Total Environ."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Paepae, T., Bokoro, P.N., and Kyamakya, K. (2021). From fully physical to virtual sensing for water quality assessment: A comprehensive review of the relevant state-of-the-art. Sensors, 21.","DOI":"10.3390\/s21216971"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1111\/1752-1688.12386","article-title":"Emerging Tools for Continuous Nutrient Monitoring Networks: Sensors Advancing Science and Water Resources Protection","volume":"52","author":"Pellerin","year":"2016","journal-title":"J. Am. Water Resour. Assoc."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"14892","DOI":"10.1109\/JSEN.2020.3010134","article-title":"Development of Chemical Oxygen on Demand (COD) Soft Sensor Using Edge Intelligence","volume":"20","author":"Pattanayak","year":"2020","journal-title":"IEEE Sens. J."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1007\/s40747-020-00259-9","article-title":"Machine learning based soft sensor model for BOD estimation using intelligence at edge","volume":"7","author":"Pattnaik","year":"2021","journal-title":"Complex Intell. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wen, X., Hou, D., Tu, D., Zhu, N., Huang, P., Zhang, G., and Zhang, H. (2018). Application of least-squares support vector machines for quantitative evaluation of known contaminant in water distribution system using online water quality parameters. Sensors, 18.","DOI":"10.3390\/s18040938"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Bhattarai, A., Dhakal, S., Gautam, Y., and Bhattarai, R. (2021). Prediction of nitrate and phosphorus concentrations using machine learning algorithms in watersheds with different landuse. Water, 13.","DOI":"10.3390\/w13213096"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s10994-006-6226-1","article-title":"Extremely randomized trees","volume":"63","author":"Geurts","year":"2006","journal-title":"Mach. Learn."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.envsoft.2013.12.016","article-title":"Protocol for developing ANN models and its application to the assessment of the quality of the ANN model development process in drinking water quality modelling","volume":"54","author":"Wu","year":"2014","journal-title":"Environ. Model. Softw."},{"key":"ref_21","first-page":"387","article-title":"Analysis and detection of functional outliers in water quality parameters from different automated monitoring stations in the Nal\u00f3n River Basin (Northern spain)","volume":"22","author":"Torres","year":"2014","journal-title":"Environ. Sci. Pollut. Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"115350","DOI":"10.1016\/j.watres.2019.115350","article-title":"Soft detection of 5-day BOD with sparse matrix in city harbor water using deep learning techniques","volume":"170","author":"Ma","year":"2020","journal-title":"Water Res."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1061\/(ASCE)0733-9372(2005)131:4(651)","article-title":"Identifying Outliers in Correlated Water Quality Data","volume":"131","author":"Robinson","year":"2005","journal-title":"J. Environ. Eng."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1007\/s12665-019-8087-5","article-title":"Spatial and seasonal variability of the water quality characteristics of a river in Northeast Brazil","volume":"78","author":"Cruz","year":"2019","journal-title":"Environ. Earth Sci."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Ahsan, M.M., Mahmud, M.A.P., Saha, P.K., Gupta, K.D., and Siddique, Z. (2021). Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance. Technologies, 9.","DOI":"10.3390\/technologies9030052"},{"key":"ref_26","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"150","DOI":"10.3390\/w6010150","article-title":"The water quality of the River Enborne, UK: Observations from high-frequency monitoring in a rural, lowland river system","volume":"6","author":"Halliday","year":"2014","journal-title":"Water"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3388","DOI":"10.1002\/hyp.10453","article-title":"High-frequency water quality monitoring in an urban catchment: Hydrochemical dynamics, primary production and implications for the Water Framework Directive","volume":"29","author":"Halliday","year":"2015","journal-title":"Hydrol. Process."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"4323","DOI":"10.5194\/hess-16-4323-2012","article-title":"Hydrochemical processes in lowland rivers: Insights from in situ, high-resolution monitoring","volume":"16","author":"Wade","year":"2012","journal-title":"Hydrol. Earth Syst. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"156377","DOI":"10.1016\/j.scitotenv.2022.156377","article-title":"A catchment-scale model of river water quality by Machine Learning","volume":"838","author":"Zanoni","year":"2022","journal-title":"Sci. Total Environ."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Raymaekers, J., and Rousseeuw, P.J. (2021). Transforming variables to central normality. Mach. Learn., 1\u201323.","DOI":"10.1007\/s10994-021-05960-5"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Linklater, N., and \u00d6rmeci, B. (2013). Real-Time and Near Real-Time Monitoring Options for Water Quality, Elsevier B.V.","DOI":"10.1016\/B978-0-444-59395-5.00008-X"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1016\/j.talanta.2014.09.045","article-title":"A low-cost autonomous optical sensor for water quality monitoring","volume":"132","author":"Murphy","year":"2015","journal-title":"Talanta"},{"key":"ref_34","unstructured":"Lundberg, S.M., and Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf., 30."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Badiru, A.B., and Racz, L. (2018). Handbook of Measurements: Benchmarks for Systems Accuracy and Precision, CRC Press.","DOI":"10.1201\/9781351228817"},{"key":"ref_36","first-page":"1","article-title":"Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development","volume":"5","author":"Scheuerman","year":"2021","journal-title":"Proc. ACM Hum.-Comput. Interact."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"100336","DOI":"10.1016\/j.patter.2021.100336","article-title":"Data and its (dis)contents: A survey of dataset development and use in machine learning research","volume":"2","author":"Paullada","year":"2021","journal-title":"Patterns"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1186\/s13040-017-0154-4","article-title":"PMLB: A large benchmark suite for machine learning evaluation and comparison","volume":"10","author":"Olson","year":"2017","journal-title":"BioData Min."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"89","DOI":"10.5194\/adgeo-5-89-2005","article-title":"Comparison of different efficiency criteria for hydrological model assessment","volume":"5","author":"Krause","year":"2005","journal-title":"Adv. Geosci."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"885","DOI":"10.13031\/2013.23153","article-title":"Model evaluation guidelines for systematic quantification of accuracy in watershed simulations","volume":"50","author":"Moriasi","year":"2007","journal-title":"Trans. ASABE"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1763","DOI":"10.13031\/trans.58.10715","article-title":"Hydrologic and water quality models: Performance measures and evaluation criteria","volume":"58","author":"Moriasi","year":"2015","journal-title":"Trans. ASABE"},{"key":"ref_42","first-page":"77","article-title":"Health hazards of nitrate in drinking water","volume":"17","author":"Terblanche","year":"1991","journal-title":"Water SA"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"101523","DOI":"10.1016\/j.asej.2021.06.009","article-title":"Development of prediction model for phosphate in reservoir water system based machine learning algorithms","volume":"13","author":"Latif","year":"2022","journal-title":"Ain Shams Eng. J."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.ecolmodel.2005.08.010","article-title":"The application of artificial neural networks to flow and phosphorus dynamics in small streams on the Boreal Plain, with emphasis on the role of wetlands","volume":"191","author":"Nour","year":"2006","journal-title":"Ecol. Modell."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/19\/7338\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:40:38Z","timestamp":1760143238000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/19\/7338"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,27]]},"references-count":44,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["s22197338"],"URL":"https:\/\/doi.org\/10.3390\/s22197338","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,27]]}}}