{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T01:25:08Z","timestamp":1778635508654,"version":"3.51.4"},"reference-count":48,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,5,6]],"date-time":"2022-05-06T00:00:00Z","timestamp":1651795200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>The Internet of Things (IoT) has had a tremendous impact on the evolution and adoption of information and communication technology. In the modern world, data are generated by individuals and collected automatically by physical objects that are fitted with electronics, sensors, and network connectivity. IoT sensor networks have become integral aspects of environmental monitoring systems. However, data collected from IoT sensor devices are usually incomplete due to various reasons such as sensor failures, drifts, network faults and various other operational issues. The presence of incomplete or missing values can substantially affect the calibration of on-field environmental sensors. The aim of this study is to identify efficient missing data imputation techniques that will ensure accurate calibration of sensors. To achieve this, we propose an efficient and robust imputation technique based on k-means clustering that is capable of selecting the best imputation technique for missing data imputation. We then evaluate the accuracy of our proposed technique against other techniques and test their effect on various calibration processes for data collected from on-field low-cost environmental sensors in urban air pollution monitoring stations. To test the efficiency of the imputation techniques, we simulated missing data rates at 10\u201340% and also considered missing values occurring over consecutive periods of time (1 day, 1 week and 1 month). Overall, our proposed BFMVI model recorded the best imputation accuracy (0.011758 RMSE for 10% missing data and 0.169418 RMSE at 40% missing data) compared to the other techniques (kNearest-Neighbour (kNN), Regression Imputation (RI), Expectation Maximization (EM) and MissForest techniques) when evaluated using different performance indicators. Moreover, the results show a trade-off between imputation accuracy and computational complexity with benchmark techniques showing a low computational complexity at the expense of accuracy when compared with our proposed technique.<\/jats:p>","DOI":"10.3390\/fi14050143","type":"journal-article","created":{"date-parts":[[2022,5,6]],"date-time":"2022-05-06T14:49:38Z","timestamp":1651848578000},"page":"143","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":41,"title":["Missing Data Imputation in the Internet of Things Sensor Networks"],"prefix":"10.3390","volume":"14","author":[{"given":"Benjamin","family":"Agbo","sequence":"first","affiliation":[{"name":"Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1920-7418","authenticated-orcid":false,"given":"Hussain","family":"Al-Aqrabi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0105-7730","authenticated-orcid":false,"given":"Richard","family":"Hill","sequence":"additional","affiliation":[{"name":"Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6004-3756","authenticated-orcid":false,"given":"Tariq","family":"Alsboui","sequence":"additional","affiliation":[{"name":"Department of Computer Science, School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1016\/j.future.2021.06.042","article-title":"MPdist-based missing data imputation for supporting big data analyses in IoT-based applications","volume":"125","author":"Lee","year":"2021","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Al-Aqrabi, H., Johnson, A.P., Hill, R., Lane, P., and Alsboui, T. (2020). Hardware-intrinsic multi-layer security: A new frontier for 5G enabled IIoT. Sensors, 20.","DOI":"10.3390\/s20071963"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Al-Aqrabi, H., Liu, L., Hill, R., and Antonopoulos, N. (2014, January 20\u201322). A multi-layer hierarchical inter-cloud connectivity model for sequential packet inspection of tenant sessions accessing BI as a service. Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014 IEEE 6th International Symposium on Cyberspace Safety and Security, 2014 IEEE 11th International Conference on Embedded Software and System (HPCC, CSS, ICESS), Paris, France.","DOI":"10.1109\/HPCC.2014.83"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Al-Aqrabi, H., Hill, R., Lane, P., and Aagela, H. (2019, January 22). Securing manufacturing intelligence for the industrial internet of things. Proceedings of the Fourth International Congress on Information and Communication Technology, Singapore.","DOI":"10.1007\/978-981-32-9343-4_21"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"750","DOI":"10.1016\/j.snb.2007.09.060","article-title":"On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario","volume":"129","author":"Massera","year":"2008","journal-title":"Sens. Actuators B Chem."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1504\/IJEP.2005.007664","article-title":"Evaluation of turbulence from traffic using experimental data obtained in a street canyon","volume":"25","author":"Mazzeo","year":"2005","journal-title":"Int. J. Environ. Pollut."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"107135","DOI":"10.1016\/j.buildenv.2020.107135","article-title":"Imputing missing indoor air quality data via variational convolutional autoencoders: Implications for ventilation management of subway metro systems","volume":"182","author":"Heo","year":"2020","journal-title":"Build. Environ."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1624","DOI":"10.1109\/TITS.2019.2910295","article-title":"Traffic flow imputation using parallel data and generative adversarial networks","volume":"21","author":"Chen","year":"2019","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Sanjar, K., Bekhzod, O., Kim, J., Paul, A., and Kim, J. (2020). Missing data imputation for geolocation-based price prediction using KNN\u2013mcf method. ISPRS Int. J. Geo-Inf., 9.","DOI":"10.3390\/ijgi9040227"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1035","DOI":"10.13063\/2327-9214.1035","article-title":"Strategies for handling missing data in electronic health record derived data","volume":"1","author":"Wells","year":"2013","journal-title":"Egems"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ehrlinger, L., Grubinger, T., Varga, B., Pichler, M., Natschl\u00e4ger, T., and Zeindl, J. (2018, January 24\u201326). Treating missing data in industrial data analytics. Proceedings of the 2018 Thirteenth International Conference on Digital Information Management (ICDIM), Berlin, Germany.","DOI":"10.1109\/ICDIM.2018.8846984"},{"key":"ref_12","unstructured":"Read, S.H. (2015). Applying Missing Data Methods to Routine Data Using the Example of a Population-Based Register of Patients with Diabetes. [Ph.D. Thesis, University of Edinburgh]."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"63279","DOI":"10.1109\/ACCESS.2018.2877269","article-title":"A survey on data imputation techniques: Water distribution system as a use case","volume":"6","author":"Osman","year":"2018","journal-title":"IEEE Access"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1146\/annurev.psych.58.110405.085530","article-title":"Missing data analysis: Making it work in the real world","volume":"60","author":"Graham","year":"2009","journal-title":"Annu. Rev. Psychol."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1002\/mpr.329","article-title":"Multiple imputation by chained equations: What is it and how does it work?","volume":"20","author":"Azur","year":"2011","journal-title":"Int. J. Methods Psychiatr. Res."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1016\/j.trc.2018.11.003","article-title":"A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation","volume":"98","author":"Chen","year":"2019","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1093\/bioinformatics\/btr597","article-title":"MissForest\u2014non-parametric missing value imputation for mixed-type data","volume":"28","author":"Stekhoven","year":"2012","journal-title":"Bioinformatics"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2345","DOI":"10.1007\/s11063-019-10012-0","article-title":"Artificial neural networks with random weights for incomplete datasets","volume":"50","author":"Mesquita","year":"2019","journal-title":"Neural Process. Lett."},{"key":"ref_19","unstructured":"Snow, D. (2022, May 02). MTSS-GAN: Multivariate Time Series Simulation Generative Adversarial Networks. Available online: https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=3616557."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2820","DOI":"10.1109\/TII.2019.2951622","article-title":"Supervised variational autoencoders for soft sensor modeling with missing data","volume":"16","author":"Xie","year":"2019","journal-title":"IEEE Trans. Ind. Inf."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"102051","DOI":"10.1016\/j.artmed.2021.102051","article-title":"Data imputation and compression for Parkinson\u2019s disease clinical questionnaires","volume":"114","author":"Peralta","year":"2021","journal-title":"Artif. Intell. Med."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Bowman, S.R., Vilnis, L., Vinyals, O., Dai, A.M., Jozefowicz, R., and Bengio, S. (2015). Generating sentences from a continuous space. arXiv.","DOI":"10.18653\/v1\/K16-1002"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Agbo, B., Qin, Y., and Hill, R. (2020, January 7\u20139). Best Fit Missing Value Imputation (BFMVI) Algorithm for Incomplete Data in the Internet of Things. Proceedings of the 5th International Conference on Internet of Things, Big Data and Security (IoTBDS 2020), Prague, Czech Republic. Available online: https:\/\/www.scitepress.org\/Papers\/2020\/95782\/95782.pdf.","DOI":"10.5220\/0009578201300137"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"22833","DOI":"10.1109\/JSEN.2021.3105442","article-title":"Missing Data Imputation on IoT Data Networks: Implications for On-site Sensor Calibration","volume":"21","author":"Okafor","year":"2021","journal-title":"IEEE Sens. J."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Little, R.J., and Rubin, D.B. (2019). Statistical Analysis with Missing Data, John Wiley & Sons. Available online: https:\/\/www.wiley.com\/en-us\/Statistical+Analysis+with+Missing+Data%2C+3rd+Edition-p-9780470526798.","DOI":"10.1002\/9781119482260"},{"key":"ref_26","unstructured":"Bashir, F. (2019). Handling of Missing Values in Static and Dynamic Data Sets. [PhD Thesis, University of Sheffield]. Available online: https:\/\/etheses.whiterose.ac.uk\/23283\/."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Alsaber, A.R., Pan, J., and Al-Hurban, A. (2021). Handling complex missing data using random forest approach for an air quality monitoring dataset: A case study of Kuwait environmental data (2012 to 2018). Int. J. Environ. Res. Public Health, 18.","DOI":"10.3390\/ijerph18031333"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_29","first-page":"1089","article-title":"Interval Fuzzy C-means Approach for Incomplete Data Clustering Based on Neural Networks","volume":"19","author":"Zhang","year":"2018","journal-title":"J. Internet Technol."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1057\/jors.1996.21","article-title":"Estimating missing values using neural networks","volume":"47","author":"Gupta","year":"1996","journal-title":"J. Oper. Res. Soc."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1016\/j.neucom.2014.02.037","article-title":"A new online data imputation method based on general regression auto associative neural network","volume":"138","author":"Ravi","year":"2014","journal-title":"Neurocomputing"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Guastella, D.A., Marcillaud, G., and Valenti, C. (2021). Edge-based missing data imputation in large-scale environments. Information, 12.","DOI":"10.3390\/info12050195"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.snb.2015.03.031","article-title":"Field calibration of a cluster of low-cost available sensors for air quality monitoring. Part A: Ozone and nitrogen dioxide","volume":"215","author":"Spinelle","year":"2015","journal-title":"Sens. Actuators B Chem."},{"key":"ref_34","unstructured":"(2022, February 02). UCI Air Quality Data Set. Available online: https:\/\/archive.ics.uci.edu\/ml\/datasets\/air+quality."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.patrec.2017.08.019","article-title":"Dynamic time warping-based imputation for univariate time series data","volume":"139","author":"Phan","year":"2020","journal-title":"Pattern Recognit. Lett."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1007\/BF02295842","article-title":"An EM algorithm for fitting two-level structural equation models","volume":"69","author":"Liang","year":"2004","journal-title":"Psychometrika"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1111\/j.0006-341X.1999.00463.x","article-title":"Finite mixture modeling with mixture outcomes using the EM algorithm","volume":"55","author":"Shedden","year":"1999","journal-title":"Biometrics"},{"key":"ref_38","unstructured":"Neale, M.C., Boker, S.M., Xie, G., and Maes, H.M. (1999). Statistical Modeling, Department of Psychiatry, Virginia Commonwealth University. Available online: http:\/\/ftp.vcu.edu\/pub\/mx\/doc\/mxmang10.pdf."},{"key":"ref_39","unstructured":"Raudenbush, S.W., and Bryk, A.S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods, SAGE. Available online: https:\/\/us.sagepub.com\/en-us\/nam\/hierarchical-linear-models\/book9230."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Neal, R.M., and Hinton, G.E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in Graphical Models, Springer. Available online: https:\/\/link.springer.com\/chapter\/10.1007\/978-94-011-5014-9_12.","DOI":"10.1007\/978-94-011-5014-9_12"},{"key":"ref_41","first-page":"126","article-title":"A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models","volume":"4","author":"Bilmes","year":"1998","journal-title":"Int. Comput. Sci. Inst."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.knosys.2016.06.012","article-title":"kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data","volume":"117","author":"Maillo","year":"2017","journal-title":"Knowl. Based Syst."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"985","DOI":"10.1080\/02331930902878333","article-title":"A Euclidean distance-based measure of efficiency in data envelopment analysis","volume":"59","author":"Amirteimoori","year":"2010","journal-title":"Optimization"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40537-021-00516-9","article-title":"A Survey On Missing Data in Machine Learning","volume":"8","author":"Emmanuel","year":"2021","journal-title":"J. Big Data"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"2160","DOI":"10.1109\/JSYST.2015.2423499","article-title":"A High-Order Possibilistic C-Means Algorithm for Clustering Incomplete Multimedia Data","volume":"11","author":"Zhang","year":"2015","journal-title":"IEEE Syst. J."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1610","DOI":"10.1109\/JSYST.2016.2576026","article-title":"Local similarity imputation based on fast clustering for incomplete data in cyber-physical systems","volume":"12","author":"Zhao","year":"2018","journal-title":"IEEE Syst. J."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1142\/9789812791245_0002","article-title":"The Running Time of an Algorithm","volume":"13","author":"Maresca","year":"2003","journal-title":"Ser. Softw. Eng. Knowl. Eng."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/14\/5\/143\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:06:53Z","timestamp":1760137613000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/14\/5\/143"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,6]]},"references-count":48,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["fi14050143"],"URL":"https:\/\/doi.org\/10.3390\/fi14050143","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,6]]}}}