{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T07:10:01Z","timestamp":1770966601372,"version":"3.50.1"},"reference-count":44,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,1,17]],"date-time":"2023-01-17T00:00:00Z","timestamp":1673913600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>To better control eutrophication, reliable and accurate information on phosphorus and nitrogen loading is desired. However, the high-frequency monitoring of these variables is economically impractical. This necessitates using virtual sensing to predict them by utilizing easily measurable variables as inputs. While the predictive performance of these data-driven, virtual-sensor models depends on the use of adequate training samples (in quality and quantity), the procurement and operational cost of nitrogen and phosphorus sensors make it impractical to acquire sufficient samples. For this reason, the variational autoencoder, which is one of the most prominent methods in generative models, was utilized in the present work for generating synthetic data. The generation capacity of the model was verified using water-quality data from two tributaries of the River Thames in the United Kingdom. Compared to the current state of the art, our novel data augmentation\u2014including proper experimental settings or hyperparameter optimization\u2014improved the root mean squared errors by 23\u201363%, with the most significant improvements observed when up to three predictors were used. In comparing the predictive algorithms\u2019 performances (in terms of the predictive accuracy and computational cost), k-nearest neighbors and extremely randomized trees were the best-performing algorithms on average.<\/jats:p>","DOI":"10.3390\/s23031061","type":"journal-article","created":{"date-parts":[[2023,1,17]],"date-time":"2023-01-17T04:28:47Z","timestamp":1673929727000},"page":"1061","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Data Augmentation for a Virtual-Sensor-Based Nitrogen and Phosphorus Monitoring"],"prefix":"10.3390","volume":"23","author":[{"given":"Thulane","family":"Paepae","sequence":"first","affiliation":[{"name":"Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Doornfontein 2028, South Africa"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9178-2700","authenticated-orcid":false,"given":"Pitshou","family":"Bokoro","sequence":"additional","affiliation":[{"name":"Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Doornfontein 2028, South Africa"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0773-9476","authenticated-orcid":false,"given":"Kyandoghere","family":"Kyamakya","sequence":"additional","affiliation":[{"name":"Institute for Smart Systems Technologies, Transportation Informatics, Alpen-Adria Universit\u00e4t Klagenfurt, 9020 Klagenfurt, Austria"},{"name":"Facult\u00e9 Polytechnique, Universit\u00e9 de Kinshasa, P.O. Box 127, Kinshasa XI, Democratic Republic of the Congo"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1631\/jzus.B0710626","article-title":"Mechanisms and assessment of water eutrophication","volume":"9","author":"Yang","year":"2008","journal-title":"J. Zhejiang Univ. Sci. B"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Xia, R., Zhang, Y., Critto, A., Wu, J., Fan, J., Zheng, Z., and Zhang, Y. (2016). The Potential Impacts of Climate Change Factors on Freshwater Eutrophication: Implications for Research and Countermeasures of Water Management in China. Sustainability, 8.","DOI":"10.3390\/su8030229"},{"key":"ref_3","first-page":"693","article-title":"Eutrophication: Present reality and future challenges for South Africa","volume":"37","year":"2011","journal-title":"Water SA"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"101604","DOI":"10.1016\/j.eti.2021.101604","article-title":"World eutrophic pollution of lake and river: Biotreatment potential and future perspectives","volume":"23","author":"Kakade","year":"2021","journal-title":"Environ. Technol. Innov."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1111\/1752-1688.12386","article-title":"Emerging Tools for Continuous Nutrient Monitoring Networks: Sensors Advancing Science and Water Resources Protection","volume":"52","author":"Pellerin","year":"2016","journal-title":"J. Am. Water Resour. Assoc."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Paepae, T., Bokoro, P.N., and Kyamakya, K. (2021). From Fully Physical to Virtual Sensing for Water Quality Assessment: A Comprehensive Review of the Relevant State-of-the-Art. Sensors, 21.","DOI":"10.3390\/s21216971"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1016\/j.scitotenv.2016.06.116","article-title":"Real-time monitoring of nutrients and dissolved organic matter in rivers: Capturing event dynamics, technological opportunities and future directions","volume":"569\u2013570","author":"Blaen","year":"2016","journal-title":"Sci. Total Environ."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1016\/j.jhydrol.2011.05.020","article-title":"Limitations of instantaneous water quality sampling in surface-water catchments: Comparison with near-continuous phosphorus time-series data","volume":"405","author":"Cassidy","year":"2011","journal-title":"J. Hydrol."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Paepae, T., Bokoro, P.N., and Kyamakya, K. (2022). A Virtual Sensing Concept for Nitrogen and Phosphorus Monitoring Using Machine Learning Techniques. Sensors, 22.","DOI":"10.3390\/s22197338"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"7","DOI":"10.17159\/sajs.2015\/20140193","article-title":"Eutrophication and cyanobacteria in South Africa\u2019s standing water bodies: A view from space","volume":"111","author":"Matthews","year":"2015","journal-title":"S. Afr. J. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1016\/j.talanta.2014.09.045","article-title":"A low-cost autonomous optical sensor for water quality monitoring","volume":"132","author":"Murphy","year":"2015","journal-title":"Talanta"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"115490","DOI":"10.1016\/j.watres.2020.115490","article-title":"Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods","volume":"172","author":"Castrillo","year":"2020","journal-title":"Water Res."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1007\/s10661-020-08731-2","article-title":"Estimation of nitrogen and phosphorus concentrations from water quality surrogates using machine learning in the Tri An Reservoir, Vietnam","volume":"192","author":"Ha","year":"2020","journal-title":"Environ. Monit. Assess."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Dilmi, S. (2022). Calcium Soft Sensor Based on the Combination of Support Vector Regression and 1-D Digital Filter for Water Quality Monitoring. Arab. J. Sci. Eng., 1\u201326.","DOI":"10.1007\/s13369-022-07263-w"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"17977","DOI":"10.1021\/acs.iecr.0c01942","article-title":"Novel Virtual Sample Generation Based on Locally Linear Embedding for Optimizing the Small Sample Problem: Case of Soft Sensor Applications","volume":"59","author":"Zhu","year":"2020","journal-title":"Ind. Eng. Chem. Res."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1016\/j.isatra.2020.10.006","article-title":"Novel manifold learning based virtual sample generation for optimizing soft sensor with small data","volume":"109","author":"Zhang","year":"2020","journal-title":"ISA Trans."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1016\/j.isatra.2021.07.033","article-title":"Enhanced virtual sample generation based on manifold features: Applications to developing soft sensor using small data","volume":"126","author":"He","year":"2022","journal-title":"ISA Trans."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1016\/j.compchemeng.2008.12.012","article-title":"Data-driven Soft Sensors in the process industry","volume":"33","author":"Kadlec","year":"2009","journal-title":"Comput. Chem. Eng."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.jprocont.2019.11.004","article-title":"Data supplement for a soft sensor using a new generative model based on a variational autoencoder and Wasserstein GAN","volume":"85","author":"Wang","year":"2020","journal-title":"J. Process Control"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"601","DOI":"10.1109\/JSEN.2021.3128562","article-title":"SVAE-WGAN-Based Soft Sensor Data Supplement Method for Process Industry","volume":"22","author":"Gao","year":"2022","journal-title":"IEEE Sens. J."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"3296","DOI":"10.1109\/TNNLS.2019.2951708","article-title":"A Layer-Wise Data Augmentation Strategy for Deep Learning Networks and Its Soft Sensor Application in an Industrial Hydrocracking Process","volume":"32","author":"Yuan","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"107070","DOI":"10.1016\/j.asoc.2020.107070","article-title":"A virtual sample generation approach based on a modified conditional GAN and centroidal Voronoi tessellation sampling to cope with small sample size problems: Application to soft sensing for chemical process","volume":"101","author":"Chen","year":"2021","journal-title":"Appl. Soft Comput."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"104497","DOI":"10.1016\/j.engappai.2021.104497","article-title":"Novel virtual sample generation using conditional GAN for developing soft sensor with small data","volume":"106","author":"Zhu","year":"2021","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2515910","DOI":"10.1109\/TIM.2021.3120135","article-title":"Novel Virtual Sample Generation Using Target-Relevant Autoencoder for Small Data-Based Soft Sensor","volume":"70","author":"Tian","year":"2021","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"30782","DOI":"10.1021\/acsomega.2c01747","article-title":"Horizontal Data Augmentation Strategy for Industrial Quality Prediction","volume":"7","author":"Gao","year":"2022","journal-title":"ACS Omega"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"13716","DOI":"10.1109\/TIE.2021.3139194","article-title":"Improving the Performance of Just-In-Time Learning-Based Soft Sensor Through Data Augmentation","volume":"69","author":"Jiang","year":"2022","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"116806","DOI":"10.1016\/j.watres.2021.116806","article-title":"Soft sensor predictor of E. coli concentration based on conventional monitoring parameters for wastewater disinfection control","volume":"191","author":"Foschi","year":"2021","journal-title":"Water Res."},{"key":"ref_28","unstructured":"Bowes, M.J., Gozzard, E., Newman, J., Loewenthal, M., Halliday, S., Skeffington, R., Jarvie, H., Wade, A., and Palmer-Felgate, E. (2015). Environmental Information Platform, NERC Environmental Information Data Centre."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"4323","DOI":"10.5194\/hess-16-4323-2012","article-title":"Hydrochemical processes in lowland rivers: Insights from in situ, high-resolution monitoring","volume":"16","author":"Wade","year":"2012","journal-title":"Hydrol. Earth Syst. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3388","DOI":"10.1002\/hyp.10453","article-title":"High-frequency water quality monitoring in an urban catchment: Hydrochemical dynamics, primary production and implications for the Water Framework Directive","volume":"29","author":"Halliday","year":"2015","journal-title":"Hydrol. Process."},{"key":"ref_31","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1561\/2200000056","article-title":"An Introduction to Variational Autoencoders","volume":"12","author":"Kingma","year":"2019","journal-title":"Found. Trends\u00ae Mach. Learn."},{"key":"ref_33","unstructured":"Kingma, D.P., and Welling, M. (2013). Auto-encoding variational bayes. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"100172","DOI":"10.1016\/j.ese.2022.100172","article-title":"Data augmentation and machine learning techniques for control strategy development in bio-polymerization process","volume":"11","author":"Wei","year":"2022","journal-title":"Environ. Sci. Ecotechnol."},{"key":"ref_35","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"115350","DOI":"10.1016\/j.watres.2019.115350","article-title":"Soft detection of 5-day BOD with sparse matrix in city harbor water using deep learning techniques","volume":"170","author":"Ma","year":"2019","journal-title":"Water Res."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"143005","DOI":"10.1016\/j.scitotenv.2020.143005","article-title":"Prediction of stream nitrogen and phosphorus concentrations from high-frequency sensors using Random Forests Regression","volume":"763","author":"Harrison","year":"2021","journal-title":"Sci. Total Environ."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1023\/B:STCO.0000035301.49549.88","article-title":"A tutorial on support vector regression","volume":"14","author":"Smola","year":"2004","journal-title":"Stat. Comput."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_40","first-page":"281","article-title":"Random search for hyper-parameter optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Zhang, H., Zhang, L., and Jiang, Y. (2019, January 23\u201325). Overfitting and Underfitting Analysis for Deep Learning Based End-to-end Communication Systems. Proceedings of the 2019 11th International Conference on Wireless Communications and Signal Processing (WCSP), Xi\u2019an, China.","DOI":"10.1109\/WCSP.2019.8927876"},{"key":"ref_42","unstructured":"Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., and Devin, M. (2016). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv."},{"key":"ref_43","unstructured":"MathWorks (2022, December 04). Train Variational Autoencoder (VAE) to Generate Images. Available online: https:\/\/www.mathworks.com\/help\/deeplearning\/ug\/train-a-variational-autoencoder-vae-to-generate-images.html#responsive_offcanvas."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"127001","DOI":"10.1016\/j.physa.2022.127001","article-title":"Quantum metrics based upon classical Jensen\u2013Shannon divergence","volume":"594","author":"Bussandri","year":"2022","journal-title":"Phys. A Stat. Mech. Its Appl."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1061\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:08:07Z","timestamp":1760119687000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1061"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,17]]},"references-count":44,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["s23031061"],"URL":"https:\/\/doi.org\/10.3390\/s23031061","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,17]]}}}