{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T10:24:46Z","timestamp":1775471086548,"version":"3.50.1"},"reference-count":62,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T00:00:00Z","timestamp":1740700800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mathematics"],"abstract":"<jats:p>Accurate demand forecasting is essential for retail operations as it directly impacts supply chain efficiency, inventory management, and financial performance. However, forecasting retail time series presents significant challenges due to their irregular patterns, hierarchical structures, and strong dependence on external factors such as promotions, pricing strategies, and socio-economic conditions. This study evaluates the effectiveness of Transformer-based architectures, specifically Vanilla Transformer, Informer, Autoformer, ETSformer, NSTransformer, and Reformer, for probabilistic time series forecasting in retail. A key focus is the integration of explanatory variables, such as calendar-related indicators, selling prices, and socio-economic factors, which play a crucial role in capturing demand fluctuations. This study assesses how incorporating these variables enhances forecast accuracy, addressing a research gap in the comprehensive evaluation of explanatory variables within multiple Transformer-based models. Empirical results, based on the M5 dataset, show that incorporating explanatory variables generally improves forecasting performance. Models leveraging these variables achieve up to 12.4% reduction in Normalized Root Mean Squared Error (NRMSE) and 2.9% improvement in Mean Absolute Scaled Error (MASE) compared to models that rely solely on past sales. 
Furthermore, probabilistic forecasting enhances decision making by quantifying uncertainty, providing more reliable demand predictions for risk management. These findings underscore the effectiveness of Transformer-based models in retail forecasting and emphasize the importance of integrating domain-specific explanatory variables to achieve more accurate, context-aware predictions in dynamic retail environments.<\/jats:p>","DOI":"10.3390\/math13050814","type":"journal-article","created":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T10:12:33Z","timestamp":1740737553000},"page":"814","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Transformer-Based Models for Probabilistic Time Series Forecasting with Explanatory Variables"],"prefix":"10.3390","volume":"13","author":[{"given":"Ricardo","family":"Caetano","sequence":"first","affiliation":[{"name":"ISCAP, Polytechnic of Porto, Rua Jaime Lopes Amorim s\/n, 4465-004 S\u00e3o Mamede de Infesta, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8516-6418","authenticated-orcid":false,"given":"Jos\u00e9 Manuel","family":"Oliveira","sequence":"additional","affiliation":[{"name":"Institute for Systems and Computer Engineering, Technology and Science, Campus da FEUP, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal"},{"name":"Faculty of Economics, University of Porto, Rua Dr. Roberto Frias, 4200-464 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0959-8446","authenticated-orcid":false,"given":"Patr\u00edcia","family":"Ramos","sequence":"additional","affiliation":[{"name":"Institute for Systems and Computer Engineering, Technology and Science, Campus da FEUP, Rua Dr. 
Roberto Frias, 4200-465 Porto, Portugal"},{"name":"CEOS.PP, ISCAP, Polytechnic of Porto, Rua Jaime Lopes Amorim s\/n, 4465-004 S\u00e3o Mamede de Infesta, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/j.ijforecast.2021.11.001","article-title":"Forecasting: Theory and practice","volume":"38","author":"Petropoulos","year":"2022","journal-title":"Int. J. Forecast."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1283","DOI":"10.1016\/j.ijforecast.2019.06.004","article-title":"Retail forecasting: Research and practice","volume":"38","author":"Fildes","year":"2022","journal-title":"Int. J. Forecast."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Oliveira, J.M., and Ramos, P. (2019). Assessing the Performance of Hierarchical Forecasting Methods on the Retail Sector. Entropy, 21.","DOI":"10.3390\/e21040436"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2450001","DOI":"10.1142\/S0218213024500015","article-title":"Retail Demand Forecasting: A Multivariate Approach and Comparison of Boosting and Deep Learning Methods","volume":"33","author":"Theodoridis","year":"2024","journal-title":"Int. J. Artif. Intell. Tools"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ramos, P., and Oliveira, J.M. (2016). A procedure for identification of appropriate state space and ARIMA models based on time-series cross-validation. Algorithms, 9.","DOI":"10.3390\/a9040076"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3533382","article-title":"Deep Learning for Time Series Forecasting: Tutorial and Literature Survey","volume":"55","author":"Benidis","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ramos, P., and Oliveira, J.M. (2023). Robust Sales Forecasting Using Deep Learning with Static and Dynamic Covariates. Appl. Syst. 
Innov., 6.","DOI":"10.20944\/preprints202308.0427.v1"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1016\/j.ijforecast.2020.07.007","article-title":"Kaggle forecasting competitions: An overlooked learning opportunity","volume":"37","author":"Bojer","year":"2021","journal-title":"Int. J. Forecast."},{"key":"ref_9","unstructured":"Iliadis, L., Maglogiannis, I., Alonso, S., Jayne, C., and Pimenidis, E. (2023, January 14\u201317). Cross-Learning-Based Sales Forecasting Using Deep Learning via Partial Pooling from Multi-level Data. Proceedings of the Engineering Applications of Neural Networks, Le\u00f3n, Spain."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2659","DOI":"10.3390\/make6040128","article-title":"Enhancing Hierarchical Sales Forecasting with Promotional Data: A Comparative Study Using ARIMA and Deep Neural Networks","volume":"6","author":"Teixeira","year":"2024","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Oliveira, J.M., and Ramos, P. (2023). Investigating the Accuracy of Autoregressive Recurrent Networks Using Hierarchical Aggregation Structure-Based Data Partitioning. Big Data Cogn. Comput., 7.","DOI":"10.20944\/preprints202304.0222.v1"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"122666","DOI":"10.1016\/j.eswa.2023.122666","article-title":"A comprehensive survey on applications of transformers for deep learning tasks","volume":"241","author":"Islam","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Oliveira, J.M., and Ramos, P. (2024). Evaluating the Effectiveness of Time Series Transformers for Demand Forecasting in Retail. 
Mathematics, 12.","DOI":"10.3390\/math12172728"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1089\/big.2020.0159","article-title":"Deep Learning for Time Series Forecasting: A Survey","volume":"9","author":"Torres","year":"2021","journal-title":"Big Data"},{"key":"ref_15","first-page":"462","article-title":"Sales Demand Forecast in E-commerce Using a Long Short-Term Memory Neural Network Methodology","volume":"Volume 11955","author":"Bandara","year":"2019","journal-title":"Proceedings of the Neural Information Processing, ICONIP 2019"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"108358","DOI":"10.1016\/j.compeleceng.2022.108358","article-title":"A hybrid deep learning framework with CNN and Bi-directional LSTM for store item demand forecasting","volume":"103","author":"Joseph","year":"2022","journal-title":"Comput. Electr. Eng."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"565","DOI":"10.3390\/forecast4020031","article-title":"Deep Learning for Demand Forecasting in the Fashion and Apparel Retail Industry","volume":"4","author":"Giri","year":"2022","journal-title":"Forecasting"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"537","DOI":"10.3390\/telecom5030028","article-title":"Bi-GRU-APSO: Bi-Directional Gated Recurrent Unit with Adaptive Particle Swarm Optimization Algorithm for Sales Forecasting in Multi-Channel Retail","volume":"5","author":"Kollu","year":"2024","journal-title":"Telecom"},{"key":"ref_19","unstructured":"Arai, K. (2024). Deep Learning Models for Inventory Decisions: A Comparative Analysis. Proceedings of the Intelligent Systems and Applications, Springer."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1278","DOI":"10.1002\/for.3073","article-title":"Hybrid convolutional long short-term memory models for sales forecasting in retail","volume":"43","author":"Yuan","year":"2024","journal-title":"J. 
Forecast."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"125066","DOI":"10.1016\/j.eswa.2024.125066","article-title":"Unveiling consumer preferences: A two-stage deep learning approach to enhance accuracy in multi-channel retail sales forecasting","volume":"257","author":"Wu","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"125313","DOI":"10.1016\/j.eswa.2024.125313","article-title":"Predicting demand for new products in fashion retailing using censored data","volume":"259","author":"Sousa","year":"2025","journal-title":"Expert Syst. Appl."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"738","DOI":"10.1016\/j.ejor.2014.02.022","article-title":"The value of competitive information in forecasting FMCG retail product sales and the variable selection problem","volume":"237","author":"Huang","year":"2014","journal-title":"Eur. J. Oper. Res."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.dss.2018.08.010","article-title":"Exploring the use of deep neural networks for sales forecasting in fashion retail","volume":"114","author":"Loureiro","year":"2018","journal-title":"Decis. Support Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"4964","DOI":"10.1080\/00207543.2020.1735666","article-title":"Deep learning with long short-term memory networks and random forests for demand forecasting in multi-channel retail","volume":"58","author":"Punia","year":"2020","journal-title":"Int. J. Prod. Res."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1748","DOI":"10.1016\/j.ijforecast.2021.03.012","article-title":"Temporal Fusion Transformers for interpretable multi-horizon time series forecasting","volume":"37","author":"Lim","year":"2021","journal-title":"Int. J. 
Forecast."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"107965","DOI":"10.1016\/j.cie.2022.107965","article-title":"Considering economic indicators and dynamic channel interactions to conduct sales forecasting for retail sectors","volume":"165","author":"Wang","year":"2022","journal-title":"Comput. Ind. Eng."},{"key":"ref_28","first-page":"2857850","article-title":"Deep Learning Based Purchase Forecasting for Food Producer-Retailer Team Merchandising","volume":"2022","author":"Kao","year":"2022","journal-title":"Sci. Program."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ramos, P., Oliveira, J.M., Kourentzes, N., and Fildes, R. (2023). Forecasting Seasonal Sales with Many Drivers: Shrinkage or Dimensionality Reduction?. Appl. Syst. Innov., 6.","DOI":"10.3390\/asi6010003"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"109956","DOI":"10.1016\/j.knosys.2022.109956","article-title":"Predictive analytics for demand forecasting: A deep learning-based decision support system","volume":"258","author":"Punia","year":"2022","journal-title":"Knowl.-Based Syst."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Nasseri, M., Falatouri, T., Brandtner, P., and Darbanian, F. (2023). Applying Machine Learning in Retail Demand Prediction\u2014A Comparison of Tree-Based Ensembles and Long Short-Term Memory-Based Deep Learning. Appl. Sci., 13.","DOI":"10.3390\/app131911112"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1016\/j.ejor.2023.10.039","article-title":"Simplifying tree-based methods for retail sales forecasting with explanatory variables","volume":"314","author":"Wellens","year":"2024","journal-title":"Eur. J. Oper. Res."},{"key":"ref_33","first-page":"735","article-title":"A Hybrid Deep Learning Based Deep Prophet Memory Neural Network Approach for Seasonal Items Demand Forecasting","volume":"15","author":"Praveena","year":"2024","journal-title":"J. Adv. Inf. 
Technol."},{"key":"ref_34","unstructured":"Wen, R., Torkkola, K., Narayanaswamy, B., and Madeka, D. (2018). A Multi-Horizon Quantile Recurrent Forecaster. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1181","DOI":"10.1016\/j.ijforecast.2019.07.001","article-title":"DeepAR: Probabilistic forecasting with autoregressive recurrent networks","volume":"36","author":"Salinas","year":"2020","journal-title":"Int. J. Forecast."},{"key":"ref_36","unstructured":"Rasul, K., Seward, C., Schuster, I., and Vollgraf, R. (2021, January 18\u201324). Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting. Proceedings of the 38th International Conference on Machine Learning, Online."},{"key":"ref_37","unstructured":"Rasul, K., Sheikh, A.S., Schuster, I., Bergmann, U., and Vollgraf, R. (2021). Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows. arXiv."},{"key":"ref_38","first-page":"6404","article-title":"Probabilistic Forecasting: A Level-Set Approach","volume":"Volume 34","author":"Ranzato","year":"2021","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"ref_39","unstructured":"Meila, M., and Zhang, T. (2021, January 18\u201324). End-to-End Learning of Coherent Probabilistic Forecasts for Hierarchical Time Series. Proceedings of the 38th International Conference on Machine Learning, Online. PMLR; Proceedings of Machine Learning Research."},{"key":"ref_40","unstructured":"Kan, K., Aubet, F.X., Januschowski, T., Park, Y., Benidis, K., Ruthotto, L., and Gasthaus, J. (2022, January 28\u201330). Multivariate Quantile Function Forecaster. Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, Virtual. PMLR; Proceedings of Machine Learning Research."},{"key":"ref_41","unstructured":"Shchur, O., Turkmen, C., Erickson, N., Shen, H., Shirkov, A., Hu, T., and Wang, Y. (2023, January 12\u201315). 
AutoGluon-TimeSeries: AutoML for Probabilistic Time Series Forecasting. Proceedings of the International Conference on Automated Machine Learning, Potsdam, Germany. PMLR."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"119410","DOI":"10.1016\/j.ins.2023.119410","article-title":"Enhancing time series forecasting: A hierarchical transformer with probabilistic decomposition representation","volume":"647","author":"Tong","year":"2023","journal-title":"Inf. Sci."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"332","DOI":"10.1016\/j.ijforecast.2021.11.011","article-title":"Parameter-efficient deep probabilistic forecasting","volume":"39","author":"Sprangers","year":"2023","journal-title":"Int. J. Forecast."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1016\/j.ijforecast.2023.04.007","article-title":"Probabilistic hierarchical forecasting with deep Poisson mixtures","volume":"40","author":"Olivares","year":"2024","journal-title":"Int. J. Forecast."},{"key":"ref_45","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L.u., and Polosukhin, I. (2017, January 4\u20139). Attention is All you Need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_46","first-page":"11106","article-title":"Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting","volume":"35","author":"Zhou","year":"2021","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_47","unstructured":"Wu, H., Xu, J., Wang, J., and Long, M. (2021, January 6\u201314). Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. Proceedings of the Advances in Neural Information Processing Systems, Online."},{"key":"ref_48","unstructured":"Woo, G., Liu, C., Sahoo, D., Kumar, A., and Hoi, S. (2022). ETSformer: Exponential Smoothing Transformers for Time-series Forecasting. 
arXiv."},{"key":"ref_49","unstructured":"Liu, Y., Wu, H., and Wang, J. (2022, November 28\u2013December 9). Non-stationary transformers: Exploring the stationarity in time series forecasting. Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA."},{"key":"ref_50","unstructured":"Kitaev, N., Kaiser, \u0141., and Levskaya, A. (2020). Reformer: The Efficient Transformer. arXiv."},{"key":"ref_51","unstructured":"Rasul, K. (2024, September 06). Time Series Transformer. Hugging Face. Available online: https:\/\/huggingface.co\/docs\/transformers\/en\/model_doc\/time_series_transformer."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Casolaro, A., Capone, V., Iannuzzo, G., and Camastra, F. (2023). Deep Learning for Time Series Forecasting: Advances and Open Problems. Information, 14.","DOI":"10.3390\/info14110598"},{"key":"ref_53","unstructured":"Ansari, A.F., Stella, L., Turkmen, C., Zhang, X., Mercado, P., Shen, H., Shchur, O., Rangapuram, S.S., Arango, S.P., and Kapoor, S. (2024). Chronos: Learning the Language of Time Series. arXiv."},{"key":"ref_54","unstructured":"Rasul, K., Ashok, A., Williams, A.R., Ghonia, H., Bhagwatkar, R., Khorasani, A., Bayazi, M.J.D., Adamopoulos, G., Riachi, R., and Hassen, N. (2024). Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. arXiv."},{"key":"ref_55","first-page":"4629","article-title":"GluonTS: Probabilistic and Neural Time Series Modeling in Python","volume":"21","author":"Alexandrov","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_56","unstructured":"Rasul, K. (2024, December 04). pytorch-transformer-ts. 
Available online: https:\/\/github.com\/kashif\/pytorch-transformer-ts."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1016\/j.ijforecast.2021.07.007","article-title":"The M5 competition: Background, organization, and implementation","volume":"38","author":"Makridakis","year":"2022","journal-title":"Int. J. Forecast."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1346","DOI":"10.1016\/j.ijforecast.2021.11.013","article-title":"M5 accuracy competition: Results, findings, and conclusions","volume":"38","author":"Makridakis","year":"2022","journal-title":"Int. J. Forecast."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M. (2019, January 4\u20138). Optuna: A Next-generation Hyperparameter Optimization Framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA.","DOI":"10.1145\/3292500.3330701"},{"key":"ref_60","unstructured":"Chaudhuri, K., and Sugiyama, M. (2019, January 16\u201318). Probabilistic Forecasting with Spline Quantile Function RNNs. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, Naha, Japan. PMLR; Proceedings of Machine Learning Research."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1016\/j.ijforecast.2006.03.001","article-title":"Another look at measures of forecast accuracy","volume":"22","author":"Hyndman","year":"2006","journal-title":"Int. J. Forecast."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1257\/jep.15.4.143","article-title":"Quantile Regression","volume":"15","author":"Koenker","year":"2001","journal-title":"J. Econ. 
Perspect."}],"container-title":["Mathematics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-7390\/13\/5\/814\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:44:46Z","timestamp":1760028286000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-7390\/13\/5\/814"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,28]]},"references-count":62,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["math13050814"],"URL":"https:\/\/doi.org\/10.3390\/math13050814","relation":{},"ISSN":["2227-7390"],"issn-type":[{"value":"2227-7390","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,28]]}}}