{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T23:03:12Z","timestamp":1768345392161,"version":"3.49.0"},"reference-count":46,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T00:00:00Z","timestamp":1722470400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Time series analysis is pivotal for business and financial decision making, especially with the increasing integration of the Internet of Things (IoT). However, leveraging time series data for forecasting requires extensive preprocessing to address challenges such as missing values, heteroscedasticity, seasonality, outliers, and noise. Different approaches are necessary for univariate and multivariate time series, Gaussian and non-Gaussian time series, and stationary versus non-stationary time series. Handling missing data alone is complex, demanding unique solutions for each type. Extracting statistical features, identifying data quality issues, and selecting appropriate cleaning and forecasting techniques require significant effort, time, and expertise. To streamline this process, we propose an automated strategy called Preptimize, which integrates statistical and machine learning techniques and recommends prediction model blueprints, suggesting the most suitable approaches for a given dataset as an initial step towards further analysis. Preptimize reads a sample from a large dataset and recommends the blueprint model based on optimization, making it easy to use even for non-experts. The results of various experiments indicated that Preptimize either outperformed or had comparable performance to benchmark models across multiple sectors, including stock prices, cryptocurrency, and power consumption prediction. This demonstrates the framework\u2019s effectiveness in recommending suitable prediction models for various time series datasets, highlighting its broad applicability across different domains in time series forecasting.<\/jats:p>","DOI":"10.3390\/a17080332","type":"journal-article","created":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T08:14:46Z","timestamp":1722500086000},"page":"332","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Preptimize: Automation of Time Series Data Preprocessing and Forecasting"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-7582-250X","authenticated-orcid":false,"given":"Mehak","family":"Usmani","sequence":"first","affiliation":[{"name":"Fast School of Computing, National University of Computer and Emerging Sciences, Karachi 65200, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6448-097X","authenticated-orcid":false,"given":"Zulfiqar Ali","family":"Memon","sequence":"additional","affiliation":[{"name":"Fast School of Computing, National University of Computer and Emerging Sciences, Karachi 65200, Pakistan"}]},{"given":"Adil","family":"Zulfiqar","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, National University of Computer and Emerging Sciences, Faisalabad Campus, Faisalabad 38000, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0039-982X","authenticated-orcid":false,"given":"Rizwan","family":"Qureshi","sequence":"additional","affiliation":[{"name":"Center for Regenerative Medicine and Health, Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences, Science Park, Hong Kong SAR 999077, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"91896","DOI":"10.1109\/ACCESS.2021.3091162","article-title":"Forecast methods for time series data: A survey","volume":"9","author":"Liu","year":"2021","journal-title":"IEEE Access"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/j.knosys.2012.05.003","article-title":"Hybridization of evolutionary Levenberg\u2013Marquardt neural networks and data pre-processing for stock market prediction","volume":"35","author":"Asadi","year":"2012","journal-title":"Knowl. Based Syst."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Di Persio, L., and Fraccarolo, N. (2023). Energy consumption forecasts by gradient boosting regression trees. Mathematics, 11.","DOI":"10.3390\/math11051068"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Cryer, J.D., and Kellet, N. (2008). Time Series Analysis: With Applications in R, Springer. [2nd ed.].","DOI":"10.1007\/978-0-387-75959-3"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"110525","DOI":"10.1016\/j.enbuild.2020.110525","article-title":"Influence of data preprocessing on neural network performance for reproducing CFD simulations of non-isothermal indoor airflow distribution","volume":"230","author":"Zhou","year":"2021","journal-title":"Energy Build."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Song, S., Zhang, A., Wang, J., and Yu, P.S. (June, January 31). SCREEN: Stream data cleaning under-speed constraints. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia.","DOI":"10.1145\/2723372.2723730"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Bilalli, B., Abell\u00f3, A., Aluja-Banet, T., and Wrembel, R. (2016, January 21\u201323). Automated data pre-processing via meta-learning. Proceedings of the International Conference on Model and Data Engineering, Almer\u00eda, Spain.","DOI":"10.1007\/978-3-319-45547-1_16"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.14778\/3115404.3115410","article-title":"Time series data cleaning: From anomaly detection to anomaly repairing","volume":"10","author":"Zhang","year":"2017","journal-title":"Proc. VLDB Endow."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1139\/cjfr-2016-0244","article-title":"Using Landsat time series imagery to detect forest disturbance in selectively logged tropical forests in Myanmar","volume":"47","author":"Shimizu","year":"2017","journal-title":"Can. J. For. Res."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"370","DOI":"10.1016\/j.isprsjprs.2017.06.013","article-title":"Change detection using landsat time series: A review of frequencies, preprocessing, algorithms, and applications","volume":"130","author":"Zhu","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1662","DOI":"10.1109\/ACCESS.2017.2779939","article-title":"LSTM fully convolutional networks for time series classification","volume":"6","author":"Karim","year":"2017","journal-title":"IEEE Access"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Gschwandtner, T., and Erhart, O. (2018, January 10\u201313). Know your enemy: Identifying quality problems of time series data. Proceedings of the IEEE Pacific Visualization Symposium (PacificVis), Kobe, Japan.","DOI":"10.1109\/PacificVis.2018.00034"},{"key":"ref_13","first-page":"37","article-title":"Time series outlier detection for short-term electricity load demand forecasting","volume":"2","author":"Jeenanunta","year":"2018","journal-title":"Int. Sci. J. Eng. Technol. (ISJET)"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1866","DOI":"10.1109\/ACCESS.2019.2962152","article-title":"Time series data cleaning: A survey","volume":"8","author":"Wang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1786","DOI":"10.14778\/3352063.3352066","article-title":"Cleanits: A data cleaning system for industrial time series","volume":"12","author":"Ding","year":"2019","journal-title":"Proc. VLDB Endow."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"113731","DOI":"10.1016\/j.eswa.2020.113731","article-title":"A time-series clustering methodology for knowledge extraction in energy consumption data","volume":"160","author":"Ruiz","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_17","unstructured":"Jarrett, D., Yoon, J., Bica, I., Qian, Z., Ercole, A., and Schaar, M.V.D. (2021, January 3\u20137). Clairvoyance: A Pipeline Toolkit for Medical Time Series. Proceedings of the International Conference on Learning Representations, Virtual."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Desai, V., and Dinesha, H.A. (2020, January 6\u20138). A Hybrid Approach to Data Pre-processing Methods. Proceedings of the IEEE International Conference for Innovation in Technology (INOCON), Bangalore, India.","DOI":"10.1109\/INOCON50539.2020.9298378"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Sousa, R., Amado, C., and Henriques, R. (2020, January 29\u201330). AutoMTS: Fully autonomous processing of multivariate time series data from heterogeneous sensor networks. Proceedings of the International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness, Virtual.","DOI":"10.1007\/978-3-030-77569-8_12"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Chen, X., Deng, L., Huang, F., Zhang, C., Zhang, Z., Zhao, Y., and Zheng, K. (2021, January 19\u201322). Daemon: Unsupervised anomaly detection and interpretation for multivariate time series. Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece.","DOI":"10.1109\/ICDE51399.2021.00228"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chauhan, K., Jani, S., Thakkar, D., Dave, R., Bhatia, J., Tanwar, S., and Obaidat, M.S. (2020, January 5\u20137). Automated machine learning: The new wave of machine learning. Proceedings of the 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India.","DOI":"10.1109\/ICIMIA48430.2020.9074859"},{"key":"ref_22","unstructured":"Sarafanov, M. (2023, September 01). AutoML for Time Series: Definitely a Good Idea. Available online: https:\/\/towardsdatascience.com\/automl-for-time-series-definitelya-good-idea-c51d39b2b3f."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"6692390","DOI":"10.1155\/2021\/6692390","article-title":"A Robust Data-Driven Method for Multiseasonality and Heteroscedasticity in Time Series Preprocessing","volume":"2021","author":"Sun","year":"2021","journal-title":"Wirel. Commun. Mob. Comput."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1016\/j.ejor.2003.08.037","article-title":"Neural network forecasting for seasonal and trend time series","volume":"160","author":"Zhang","year":"2005","journal-title":"Eur. J. Oper. Res."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Ranjan, K.G., Prusty, B.R., and Jena, D. (2019, January 29\u201331). Comparison of two data cleaning methods as applied to volatile time-series. Proceedings of the International Conference on Power Electronics Applications and Technology in Present Energy Scenario (PETPES), Mangalore, India.","DOI":"10.1109\/PETPES47060.2019.9004012"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"e2816","DOI":"10.1002\/jnm.2816","article-title":"An improved sliding window prediction-based outlier detection and correction for volatile time-series","volume":"34","author":"Ranjan","year":"2021","journal-title":"Int. J. Numer. Model. Electron. Netw. Devices Fields"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lv, P., Wu, Q., Xu, J., and Shu, Y. (2022). Stock Index Prediction Based on Time Series Decomposition and Hybrid Model. Entropy, 24.","DOI":"10.3390\/e24020146"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"104444","DOI":"10.1016\/j.chemolab.2021.104444","article-title":"Toward automated machine learning in vibrational spectroscopy: Use and settings of genetic algorithms for pre-processing and regression optimization","volume":"219","author":"Brunel","year":"2021","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_29","unstructured":"Kumar, S. (2023, August 01). 8 AutoML Libraries to Automate Machine Learning Pipeline. Available online: https:\/\/medium.com\/swlh\/8-automl-libraries-toautomate-machine-learning-pipeline-3da0af08f636."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Jang, W.-J., Lee, S.-T., Kim, J.-B., and Gim, G.-Y. (2019). A study on data profiling: Focusing on attribute value quality index. Appl. Sci., 9.","DOI":"10.3390\/app9235054"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ghaderpour, E., Pagiatakis, S.D., and Hassan, Q.K. (2021). A survey on change detection and time series analysis with applications. Appl. Sci., 11.","DOI":"10.3390\/app11136141"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/S0169-2070(03)00004-9","article-title":"Combining time series models for forecasting","volume":"20","author":"Zou","year":"2004","journal-title":"Int. J. Forecast."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"20200209","DOI":"10.1098\/rsta.2020.0209","article-title":"Time-series forecasting with deep learning: A survey","volume":"379","author":"Lim","year":"2021","journal-title":"Philos. Trans. R. Soc. A Math. Phys. Eng. Sci."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"3135","DOI":"10.1007\/s00521-021-06548-9","article-title":"A novel approach based on combining deep learning models with statistical methods for COVID-19 time series forecasting","volume":"34","author":"Abbasimehr","year":"2022","journal-title":"Neural Comput. Appl."},{"key":"ref_35","unstructured":"Brown, T.A. (2006). Confirmatory Factor Analysis for Applied Research, The Guilford Press."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Haroon, D. (2017). Time Series-Differencing. Python Machine Learning Case Studies: Five Case Studies for the Data Scientist, Apress.","DOI":"10.1007\/978-1-4842-2823-4"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1111\/j.1467-9892.1992.tb00121.x","article-title":"Empirical evidence on Dickey-Fuller-type tests","volume":"13","author":"Agiakloglou","year":"1992","journal-title":"J. Time Ser. Anal."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1145\/3422622","article-title":"Generative adversarial networks","volume":"63","author":"Goodfellow","year":"2020","journal-title":"Commun. ACM"},{"key":"ref_39","unstructured":"Yoon, J., Jordon, J., and Schaar, M. (2018, January 10\u201315). Gain: Missing data imputation using generative adversarial nets. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_40","first-page":"1","article-title":"DataWig: Missing Value Imputation for Tables","volume":"20","author":"Biessmann","year":"2019","journal-title":"J. Mach. Learn. Res."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Joenssen, D.W., and Bankhofer, U. (2012, January 13\u201320). Hot deck methods for imputing missing data. Proceedings of the International Workshop on Machine Learning and Data Mining in Pattern Recognition, Berlin, Germany.","DOI":"10.1007\/978-3-642-31537-4_6"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Holland, J.H. (1992). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, MIT Press.","DOI":"10.7551\/mitpress\/1090.001.0001"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"McDowall, D., McCleary, R., and Bartos, B.J. (1980). Interrupted Time Series Analysis, SAGE. [21st ed.].","DOI":"10.4135\/9781412984607"},{"key":"ref_44","unstructured":"Dua, D., and Graff, C. (2019). UCI Machine Learning Repository, University of California, School of Information and Computer Science. Available online: http:\/\/archive.ics.uci.edu\/ml."},{"key":"ref_45","unstructured":"Sourav, D., Apan, P., Sayan, S., Sayan, G., Udatya, D., Chandra, D., and Shilpi, B. (2024). A Novel Hybrid Model Using Lstm and Rnn for Stock Market Prediction. Int. J. Eng. Res. Technol., 13."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1049\/cit2.12060","article-title":"Deep learning for time series forecasting: The electric load case","volume":"7","author":"Gasparin","year":"2022","journal-title":"CAAI Trans. Intell. Technol."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/17\/8\/332\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:27:44Z","timestamp":1760110064000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/17\/8\/332"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,1]]},"references-count":46,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["a17080332"],"URL":"https:\/\/doi.org\/10.3390\/a17080332","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,1]]}}}