{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T17:44:43Z","timestamp":1772819083703,"version":"3.50.1"},"reference-count":63,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2023,1,12]],"date-time":"2023-01-12T00:00:00Z","timestamp":1673481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Day by day, pollution in cities is increasing due to urbanization. One of the biggest challenges posed by the rapid migration of inhabitants into cities is increased air pollution. Sustainable Development Goal 11 indicates that 99 percent of the world\u2019s urban population breathes polluted air. Given this trend of urbanization, predicting the concentrations of pollutants in advance is very important. Such predictions would help city administrations take timely measures toward achieving Sustainable Development Goal 11. In data engineering, imputation and the removal of outliers are very important steps prior to forecasting the concentration of air pollutants. For pollution and meteorological data, missing values and outliers are critical problems that need to be addressed. This paper proposes a novel method called multiple iterative imputation using autoencoder-based long short-term memory (MIA-LSTM), which first applies iterative imputation, with an extra trees regressor as the estimator, to fill the missing values in multivariate data and then uses an LSTM autoencoder to detect and remove the outliers present in the dataset. The preprocessed data were given to a multivariate LSTM for forecasting PM2.5 concentration. 
This paper also examines the effect of removing outliers and missing values from the dataset, as well as the effect of imputing missing values, on forecasting the concentrations of air pollutants. The proposed method gives better forecasting results, with a root mean square error (RMSE) of 9.8883. The obtained results were compared with the traditional gated recurrent unit (GRU), 1D convolutional neural network (CNN), and long short-term memory (LSTM) approaches on a dataset from the Aotizhongxin area of Beijing, China. Similar results were observed for two other locations in China and one location in India. The results obtained show that imputation and outlier\/anomaly removal improve the accuracy of air pollution forecasting.<\/jats:p>","DOI":"10.3390\/a16010052","type":"journal-article","created":{"date-parts":[[2023,1,12]],"date-time":"2023-01-12T05:03:03Z","timestamp":1673499783000},"page":"52","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Novel MIA-LSTM Deep Learning Hybrid Model with Data Preprocessing for Forecasting of PM2.5"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7266-4664","authenticated-orcid":false,"given":"Gaurav","family":"Narkhede","sequence":"first","affiliation":[{"name":"School of Electronics & Communication Engineering, MIT World Peace University, Pune 411038, India"}]},{"given":"Anil","family":"Hiwale","sequence":"additional","affiliation":[{"name":"School of Electronics & Communication Engineering, MIT World Peace University, Pune 411038, India"}]},{"given":"Bharat","family":"Tidke","sequence":"additional","affiliation":[{"name":"School of Computer Engineering & Technology, MIT World Peace University, Pune 411038, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4719-8734","authenticated-orcid":false,"given":"Chetan","family":"Khadse","sequence":"additional","affiliation":[{"name":"School 
of Electrical Engineering, MIT World Peace University, Pune 411038, India"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Yang, Y., Bao, W., Li, Y., Wang, Y., and Chen, Z. (2020). Land Use Transition and Its Eco-Environmental Effects in the Beijing\u2013Tianjin\u2013Hebei Urban Agglomeration: A Production\u2013Living\u2013Ecological Perspective. Land, 9.","DOI":"10.3390\/land9090285"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"g1597","DOI":"10.1136\/bmj.g1597","article-title":"Delhi has overtaken Beijing as the world\u2019s most polluted city, report says","volume":"348","author":"Bagcchi","year":"2014","journal-title":"BMJ"},{"key":"ref_3","unstructured":"Hazlewood, W.R., and Coyle, L. (2011). On Ambient Information Systems: Challenges of Design and Evaluation. Ubiquitous Developments in Ambient Computing and Intelligence: Human-Centered Applications, IGI Global."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1000","DOI":"10.1016\/j.envpol.2017.11.016","article-title":"Incorporating long-term satellite-based aerosol optical depth, localized land use data, and meteorological variables to estimate ground-level PM2.5 concentrations in Taiwan from 2005 to 2015","volume":"237","author":"Jung","year":"2018","journal-title":"Environ. Pollut."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"365","DOI":"10.5094\/APR.2015.040","article-title":"Anomaly detection and assessment of PM10 functional data at several locations in the Klang Valley, Malaysia","volume":"6","author":"Shaadan","year":"2015","journal-title":"Atmos. Pollut. Res."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/j.ijepes.2016.03.020","article-title":"Conjugate gradient back-propagation based artificial neural network for real time power quality assessment","volume":"82","author":"Khadse","year":"2016","journal-title":"Int. J. Electr. 
Power Energy Syst."},{"key":"ref_7","first-page":"7","article-title":"Artificial Neural Network based Fault Detection System for 11 kV Transmission Line","volume":"1","author":"Pandey","year":"2021","journal-title":"IEEE Xplore"},{"key":"ref_8","unstructured":"Allison, P.D. (2001). Missing Data. Sage University Papers Series on Quantitative Applications in the Social Sciences, Sage."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Little, D.R. (2002). Rubin, Statistical Analysis with Missing Data, John Wiley and Sons.","DOI":"10.1002\/9781119013563"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/S0168-1923(99)00056-8","article-title":"Forest climatology: Estimation of missing values for Bavaria, Germany","volume":"96","author":"Xia","year":"1999","journal-title":"Agric. For. Meteorol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2895","DOI":"10.1016\/j.atmosenv.2004.02.026","article-title":"Methods for imputation of missing values in air quality data sets","volume":"38","author":"Junninen","year":"2004","journal-title":"Atmos. Environ."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"7316","DOI":"10.1016\/j.atmosenv.2006.06.040","article-title":"Single imputation method of missing values in environmental pollution data sets","volume":"40","author":"Plaia","year":"2006","journal-title":"Atmos. 
Environ."},{"key":"ref_13","first-page":"1","article-title":"Artificial Neural Network for the Prediction of Particulate Matter (PM2.5)","volume":"1","author":"Narkhede","year":"2021","journal-title":"IEEE"},{"key":"ref_14","first-page":"611","article-title":"Handling missing data in multivariate time series using a vector autoregressive model based imputation (VAR-IM) algorithm: Part I: VAR-IM algorithm versus traditional methods","volume":"1","author":"Bashir","year":"2016","journal-title":"IEEE"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"449","DOI":"10.17576\/jsm-2015-4403-17","article-title":"A Comparison of Various Imputation Methods for Missing Values in Air Quality Data","volume":"44","author":"Zainuri","year":"2015","journal-title":"Sains Malays."},{"key":"ref_16","unstructured":"Arai, K., Kapoor, S., and Bhatia, R. (2020). Liyanage, Comparison of Imputation Methods for Missing Values in Air Pollution Data: Case Study on Sydney Air Quality Index. Advances in Information and Communication. FICC 2020. Advances in Intelligent Systems and Computing, Springer."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Samal, K.K.R., Babu, K.S., and Das, S.K. (2021, January 19\u201321). A Neural Network Approach with Iterative Strategy for Long-term PM2.5 Forecasting. Proceedings of the 2021 IEEE 18th India Council International Conference (INDICON), Guwahati, India.","DOI":"10.1109\/INDICON52576.2021.9691552"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v045.i03","article-title":"Mice: Multivariate Imputation by Chained Equations in R","volume":"45","author":"Buuren","year":"2011","journal-title":"J. Stat. Softw."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Alsaber, A.R., and Pan, J.A. (2021). Al-Hurban, Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018). Int. J. Environ. Res. 
Public Health, 18.","DOI":"10.3390\/ijerph18031333"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Kim, T., Kim, J., Yang, W., Lee, H., and Choo, J. (2021). Missing Value Imputation of Time-Series Air-Quality Data via Deep Neural Networks. Int. J. Environ. Res. Public Health, 18.","DOI":"10.3390\/ijerph182212213"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1145\/126482.126486","article-title":"Handling missing data by using stored truth values","volume":"20","author":"Gessert","year":"1991","journal-title":"ACM SIGMOD Rec."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/S0933-3657(98)00027-X","article-title":"Treatment of missing data values in a neural network based decision support system for acute abdominal pain","volume":"13","author":"Pesonen","year":"1998","journal-title":"Artif. Intell. Med."},{"key":"ref_23","unstructured":"Caruana, R. (2001, January 4\u20137). A non-parametric EM-style algorithm for imputing missing values. Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, Key West, FL, USA. Available online: https:\/\/proceedings.mlr.press\/r3\/caruana01a.html."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1109\/34.917578","article-title":"Minimal projective reconstruction including missing data","volume":"23","author":"Kahl","year":"2001","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1016\/j.jss.2010.11.887","article-title":"Missing data imputation by utilizing information within incomplete instances","volume":"84","author":"Zhang","year":"2011","journal-title":"J. Syst. 
Softw."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"619","DOI":"10.7717\/peerj-cs.619","article-title":"Advanced methods for missing values imputation based on similarity learning","volume":"7","author":"Fouad","year":"2021","journal-title":"PeerJ Comput. Sci."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"106070","DOI":"10.1016\/j.asoc.2020.106070","article-title":"Adaptive LSSVM based iterative prediction method for NOx concentration prediction in coal-fired power plant considering system delay","volume":"89","author":"Zhai","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"38155","DOI":"10.1007\/s11356-020-09855-1","article-title":"An ensemble learning based hybrid model and framework for air pollution forecasting","volume":"27","author":"Chang","year":"2020","journal-title":"Environ. Sci. Pollut. Res."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Samal, K., Babu, K., and Das, S. (2018). Spatio-temporal Prediction of Air Quality using Distance Based Interpolation and Deep Learning Techniques. EAI Endorsed Trans. Smart Cities.","DOI":"10.4108\/eai.15-1-2021.168139"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Samal, K.K.R., Babu, K.S., and Das, S.K. (2021, January 19\u201321). Time Series Forecasting of Air Pollution using Deep Neural Network with Multi-output Learning. Proceedings of the 2021 IEEE 18th India Council International Conference (INDICON), Guwahati, India.","DOI":"10.1109\/INDICON52576.2021.9691669"},{"key":"ref_31","unstructured":"Samal, K.K., Babu, K., Panda, A.K., and Das, S.K. (2020, January 10\u201313). Data Driven Multivariate Air Quality Forecasting using Dynamic Fine Tuning Autoencoder Layer. Proceedings of the 2020 IEEE 17th India Council International Conference (INDICON), New Delhi, India."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Mahajan, S., Kumar, B., and Pant, U.K. 
(2020, January 26\u201327). Tiwari, Incremental Outlier Detection in Air Quality Data Using Statistical Methods. Proceedings of the 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), Sakheer, Bahrain.","DOI":"10.1109\/ICDABI51230.2020.9325683"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chen, Z., Peng, Z., Zou, X., Sun, H., Lu, W., Zhang, Y., Wen, W., Yan, H., and Li, C. (2022). Deep Learning Based Anomaly Detection for Muti-dimensional Time Series: A Survey. Cyber Security, Springer. CNCERT 2021.","DOI":"10.1007\/978-981-16-9229-1_5"},{"key":"ref_34","unstructured":"Zhang, C., Li, S., Zhang, H., and Chen, Y. (2019). VELC: A New Variational AutoEncoder Based Model for Time Series Anomaly Detection. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Provotar, O.I., Linder, Y.M., and Veres, M.M. (2019, January 18\u201320). Unsupervised Anomaly Detection in Time Series Using LSTM-Based Autoencoders. Proceedings of the 2019 IEEE International Conference on Advanced Trends in Information Theory (ATIT), Kyiv, Ukraine.","DOI":"10.1109\/ATIT49449.2019.9030505"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"117859","DOI":"10.1016\/j.envpol.2021.117859","article-title":"Fathnia, Spatio-temporal modeling of PM2.5 risk mapping using three machine learning algorithms","volume":"289","author":"Shogrkhodaei","year":"2021","journal-title":"Environ. Pollut."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Pun, T.B., and Shahi, T.B. (2018, January 9\u201310). Nepal Stock Exchange Prediction Using Support Vector Regression and Neural Networks. 
Proceedings of the 2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC), Bangalore, India.","DOI":"10.1109\/ICAECC.2018.8479456"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1615","DOI":"10.1121\/1.395916","article-title":"Learning the hidden structure of speech","volume":"83","author":"Elman","year":"1988","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1109\/91.660809","article-title":"Fuzzy finite-state automata can be deterministically encoded into recurrent neural networks","volume":"6","author":"Omlin","year":"1998","journal-title":"IEEE Trans. Fuzzy Syst."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Chandra, R., Jain, A., and Chauhan, D.S. (2022). Deep learning via LSTM models for COVID-19 infection forecasting in India. PLoS ONE, 17.","DOI":"10.1371\/journal.pone.0262708"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Shahi, T.B., Shrestha, A., Neupane, A., and Guo, W. (2020). Stock Price Forecasting with Deep Learning: A Comparative Study. Mathematics, 8.","DOI":"10.3390\/math8091441"},{"key":"ref_42","first-page":"6596397","article-title":"A Review on Deep Sequential Models for Forecasting Time Series Data","volume":"2022","author":"Ahmed","year":"2022","journal-title":"Appl. Comput. Intell. Soft Comput."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Branco, N.W., Cavalca, M.S.M., Stefenon, S.F., and Leithardt, V.R.Q. (2022). Wavelet LSTM for Fault Forecasting in Electrical Power Grids. Sensors, 22.","DOI":"10.20944\/preprints202210.0004.v1"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Neto, N.F.S., Stefenon, S.F., Meyer, L.H., Ovejero, R.G., and Leithardt, V.R.Q. (2022). Fault Prediction Based on Leakage Current in Contaminated Insulators Using Enhanced Time Series Forecasting Models. 
Sensors, 22.","DOI":"10.3390\/s22166121"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"732","DOI":"10.3390\/forecast4030040","article-title":"Evaluating State-of-the-Art, Forecasting Ensembles and Meta-Learning Strategies for Model Fusion","volume":"4","author":"Cawood","year":"2022","journal-title":"Forecasting"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"107584","DOI":"10.1016\/j.epsr.2021.107584","article-title":"Time series forecasting using ensemble learning methods for emergency prevention in hydroelectric power plants with dam","volume":"202","author":"Stefenon","year":"2021","journal-title":"Electr. Power Syst. Res."},{"key":"ref_47","unstructured":"Tiwari, A., Gupta, R., and Chandra, R. (2021). Delhi air quality prediction using LSTM deep learning models with a focus on COVID-19 lockdown. arXiv."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1007\/s12647-020-00371-8","article-title":"A Review of Air Quality Modeling","volume":"35","author":"Karroum","year":"2020","journal-title":"Mapan"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"101019","DOI":"10.1016\/j.ecoinf.2019.101019","article-title":"Predicting air quality with deep learning LSTM: Towards comprehensive models","volume":"55","author":"Navares","year":"2019","journal-title":"Ecol. Inform."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/s11869-020-00795-w","article-title":"A novel hybrid model for multi-step daily AQI forecasting driven by air pollution big data","volume":"13","author":"Xu","year":"2020","journal-title":"Air Qual. Atmos. Health"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Zheng, J., Wang, Y., Li, S., and Chen, H. (2021). The Stock Index Prediction Based on SVR Model with Bat Optimization Algorithm. 
Algorithms, 14.","DOI":"10.3390\/a14100299"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"106620","DOI":"10.1016\/j.asoc.2020.106620","article-title":"A novel hybrid model based on multi-objective Harris hawks optimization algorithm for daily PM2.5 and PM10 forecasting","volume":"96","author":"Du","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"805","DOI":"10.1080\/10962247.2019.1577314","article-title":"Detection of anomalous nitrogen dioxide (NO2) concentration in urban air of India using proximity and clustering methods","volume":"69","author":"Aggarwal","year":"2019","journal-title":"J. Air Waste Manag. Assoc."},{"key":"ref_54","first-page":"661","article-title":"A new method for prediction of air pollution based on intelligent computation","volume":"24","author":"Mohammad","year":"2019","journal-title":"Soft Comput."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Xayasouk, T., Lee, H., and Lee, G. (2020). Air Pollution Prediction Using Long Short-Term Memory (LSTM) and Deep Autoencoder (DAE) Models. Sustainability, 12.","DOI":"10.3390\/su12062570"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Kalajdjieski, J., Zdravevski, E., Corizzo, R., Lameski, P., Kalajdziski, S., Pires, I., Garcia, N., and Trajkovik, V. (2020). Air Pollution Prediction with Multi-Modal Data and Deep Neural Networks. Remote. Sens., 12.","DOI":"10.3390\/rs12244142"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"235","DOI":"10.3390\/signals3020015","article-title":"Applying and Comparing LSTM and ARIMA to Predict CO Levels for a Time-Series Measurements in a Port Area","volume":"3","author":"Spyrou","year":"2022","journal-title":"Signals"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Dey, P., Emam, H., Md, H., Mohammed, C., Md, A., and Andersson, H.K.M. (2021). 
Comparative Analysis of Recurrent Neural Networks in Stock Price Prediction for Different Frequency Domains. Algorithms, 14.","DOI":"10.3390\/a14080251"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Ding, W., and Zhu, Y. (2022). Prediction of PM2.5 Concentration in Ningxia Hui Autonomous Region Based on PCA-Attention-LSTM. Atmosphere, 13.","DOI":"10.3390\/atmos13091444"},{"key":"ref_60","unstructured":"Chen, S.X. (2022, March 01). Beijing Multi-Site Air-Quality Data Data Set. Available online: https:\/\/archive.ics.uci.edu\/ml\/datasets\/Beijing+Multi-Site+Air-Quality+Data."},{"key":"ref_61","unstructured":"CPCB (2022, March 10). Air Pollution. Available online: https:\/\/cpcb.nic.in\/air-pollution."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"102282","DOI":"10.1016\/j.ijinfomgt.2020.102282","article-title":"Forecasting and Anomaly Detection approaches using LSTM and LSTM Autoencoder techniques with the applications in supply chain management","volume":"57","author":"Nguyen","year":"2020","journal-title":"Int. J. Inf. Manag."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"034520","DOI":"10.1117\/1.JRS.15.034520","article-title":"Deep learning-based framework for spatiotemporal data fusion: An instance of Landsat 8 and Sentinel 2 NDVI","volume":"15","author":"Mishra","year":"2021","journal-title":"J. Appl. Remote. 
Sens."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/1\/52\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:04:11Z","timestamp":1760119451000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/1\/52"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,12]]},"references-count":63,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["a16010052"],"URL":"https:\/\/doi.org\/10.3390\/a16010052","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,12]]}}}