{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,8]],"date-time":"2026-06-08T01:53:49Z","timestamp":1780883629993,"version":"3.54.1"},"reference-count":59,"publisher":"MDPI AG","issue":"17","license":[{"start":{"date-parts":[[2024,8,25]],"date-time":"2024-08-25T00:00:00Z","timestamp":1724544000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Most current methods use spatial\u2013temporal graph neural networks (STGNNs) to analyze complex spatial\u2013temporal information from traffic data collected from hundreds of sensors. STGNNs combine graph neural networks (GNNs) and sequence models to create hybrid structures that allow for the two networks to collaborate. However, this collaboration has made the model increasingly complex. This study proposes a framework that relies solely on original Transformer architecture and carefully designs embeddings to efficiently extract spatial\u2013temporal dependencies in traffic flow. Additionally, we used pre-trained language models to enhance forecasting performance. We compared our new framework with current state-of-the-art STGNNs and Transformer-based models using four real-world traffic datasets: PEMS04, PEMS08, METR-LA, and PEMS-BAY. 
The experimental results demonstrate that our framework outperforms the other models in most metrics.<\/jats:p>","DOI":"10.3390\/s24175502","type":"journal-article","created":{"date-parts":[[2024,8,26]],"date-time":"2024-08-26T03:32:01Z","timestamp":1724643121000},"page":"5502","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":34,"title":["Spatial\u2013Temporal Transformer Networks for Traffic Flow Forecasting Using a Pre-Trained Language Model"],"prefix":"10.3390","volume":"24","author":[{"given":"Ju","family":"Ma","sequence":"first","affiliation":[{"name":"School of Mechanical Engineering and Electronic Information, China University of Geosciences, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5770-8523","authenticated-orcid":false,"given":"Juan","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering and Electronic Information, China University of Geosciences, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yao","family":"Hou","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering and Electronic Information, China University of Geosciences, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Xia, Y., Jian, X., Yan, B., and Su, D. (2019). Infrastructure safety oriented traffic load monitoring using multi-sensor and single camera for short and medium span bridges. 
Remote Sens., 11.","DOI":"10.3390\/rs11222651"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1699","DOI":"10.1111\/mice.13154","article-title":"A hybrid virtual\u2013real traffic simulation approach to reproducing the spatiotemporal distribution of bridge loads","volume":"39","author":"Zhou","year":"2024","journal-title":"Comput. Aided Civ. Infrastruct. Eng."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Zhao, Z., Chen, W., Yue, H., and Zhong, L. (2016, January 28\u201330). A novel short-term traffic forecast model based on travel distance estimation and ARIMA[C]. Proceedings of the Chinese Control and Decision Conference (CCDC), Yinchuan, China.","DOI":"10.1109\/CCDC.2016.7532126"},{"key":"ref_4","first-page":"ii-429","article-title":"Switching ARIMA model based forecasting for traffic flow","volume":"Volume 2","author":"Yu","year":"2004","journal-title":"Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Chikkakrishna, N.K., Hardik, C., Deepika, K., and Sparsha, N. (2019, January 13\u201315). Short-term traffic prediction using sarima and FbPROPHET. Proceedings of the 2019 IEEE 16th INDIA Council International Conference (INDICON), Rajkot, India.","DOI":"10.1109\/INDICON47234.2019.9028937"},{"key":"ref_6","first-page":"418303","article-title":"Accurate multisteps traffic flow prediction based on SVM","volume":"1","author":"Mingheng","year":"2013","journal-title":"Math. Probl. Eng."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2001","DOI":"10.1109\/TITS.2018.2854913","article-title":"Adaptive multi-kernel SVM with spatial\u2013temporal correlation for short-term traffic flow prediction","volume":"20","author":"Feng","year":"2018","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Duan, M. (2018, January 25\u201326). 
Short-time prediction of traffic flow based on PSO optimized SVM. Proceedings of the 2018 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Xiamen, China.","DOI":"10.1109\/ICITBS.2018.00018"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Dong, X., Lei, T., Jin, S., and Hou, Z. (2018, January 25\u201327). Short-term traffic flow prediction based on XGBoost. Proceedings of the 2018 IEEE 7th Data Driven Control and Learning Systems Conference (DDCLS), Enshi, China.","DOI":"10.1109\/DDCLS.2018.8516114"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1016\/j.neucom.2017.03.049","article-title":"\u03b4-agree AdaBoost stacked autoencoder for short-term traffic flow forecasting","volume":"247","author":"Zhou","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_11","first-page":"1","article-title":"Traffic flow prediction using adaboost algorithm with random forests as a weak learner","volume":"1","author":"Leshem","year":"2007","journal-title":"Int. J. Math. Comput. Sci."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2673","DOI":"10.1109\/78.650093","article-title":"Bidirectional recurrent neural networks","volume":"45","author":"Schuster","year":"1997","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_13","unstructured":"Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014, January 13). Empirical evaluation of gated recurrent neural networks on sequence modeling. Proceedings of the NIPS 2014 Workshop on Deep Learning, Montreal, QC, Canada. 
Available online: https:\/\/nyuscholars.nyu.edu\/en\/publications\/empirical-evaluation-of-gated-recurrent-neural-networks-on-sequen."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"16453","DOI":"10.1007\/s00500-020-04954-0","article-title":"Temporal convolutional neural (TCN) network for an effective weather forecasting using time-series data from the local weather station","volume":"24","author":"Hewage","year":"2020","journal-title":"Soft Comput."},{"key":"ref_16","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 12). Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA. Available online: https:\/\/api.semanticscholar.org\/CorpusID:13756489."},{"key":"ref_17","first-page":"3104","article-title":"Sequence to sequence learning with neural networks","volume":"Volume 2","author":"Sutskever","year":"2014","journal-title":"Proceedings of the 27th Conference on Neural Information Processing Systems"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"736","DOI":"10.1111\/tgis.12644","article-title":"Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting","volume":"24","author":"Cai","year":"2020","journal-title":"Trans. GIS"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/TNN.2008.2005605","article-title":"The graph neural network model","volume":"20","author":"Scarselli","year":"2009","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Grover, A., and Leskovec, J. (2016, January 13\u201317). 
node2vec: Scalable Feature Learning for Networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939754"},{"key":"ref_21","unstructured":"Hamilton, W.L., Ying, R., and Leskovec, J. (2017, January 4\u20139). Inductive representation learning on large graphs. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA. Available online: https:\/\/api.semanticscholar.org\/CorpusID:4755450."},{"key":"ref_22","unstructured":"Kipf, T.N., and Welling, M. (2017, January 24\u201326). Semi-supervised classification with graph convolutional networks. Proceedings of the 5th International Conference on Learning Representations (ICLR), Toulon, France. Available online: https:\/\/openreview.net\/forum?id=SJU4ayYgl."},{"key":"ref_23","unstructured":"Li, Y., Yu, R., Shahabi, C., and Liu, Y. (May, January 30). Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. Proceedings of the 5th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada. Available online: https:\/\/www.khoury.northeastern.edu\/published_research\/diffusion-convolutional-recurrent-neural-network-data-driven-traffic-forecasting\/."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wu, Z., Pan, S., Long, G., Jiang, J., and Zhang, C. (2019, January 10\u201316). Graph wavenet for deep spatial-temporal graph modeling. Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macao, China. Available online: https:\/\/dl.acm.org\/doi\/abs\/10.5555\/3367243.3367303.","DOI":"10.24963\/ijcai.2019\/264"},{"key":"ref_25","unstructured":"Bai, L., Yao, L., Li, C., Wang, X., and Wang, C. (2020, January 6\u201312). Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. 
Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS), Vancouver, BC, Canada. Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3495724.3497218."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wu, Z., Pan, S., Long, G., Jiang, J., Chang, X., and Zhang, C. (2020, January 6\u201310). Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), Virtual Event. Available online: https:\/\/dl.acm.org\/doi\/10.1145\/3394486.3403118.","DOI":"10.1145\/3394486.3403118"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"120281","DOI":"10.1016\/j.eswa.2023.120281","article-title":"Spatio-temporal graph mixformer for traffic forecasting","volume":"288","author":"Lablack","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ruan, H., Feng, X., and Zheng, H. (2021, January 10\u201313). Graph transformer attention networks for traffic flow prediction. Proceedings of the 2021 7th International Conference on Computer and Communications (ICCC), Chengdu, China.","DOI":"10.1109\/ICCC54389.2021.9674238"},{"key":"ref_29","first-page":"28877","article-title":"Do Transformers Really Perform Bad for Graph Representation?","volume":"34","author":"Ying","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Liu, H., Dong, Z., Jiang, R., Deng, J., Chen, Q., and Song, X. (2023, January 21\u201325). STAEformer: Spatio-Temporal Adaptive Embedding Makes Vanilla Transformers SOTA for Traffic Forecasting. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM), Birmingham, UK.","DOI":"10.1145\/3583780.3615160"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Gao, H., Jiang, R., Dong, Z., Deng, J., and Song, X. (2023). 
Spatio-Temporal-Decoupled Masked Pre-training for Traffic Forecasting. arXiv.","DOI":"10.24963\/ijcai.2024\/442"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"102241","DOI":"10.1016\/j.inffus.2024.102241","article-title":"PreSTNet: Pretrained Spatio-Temporal Network for traffic forecasting","volume":"106","author":"Fang","year":"2024","journal-title":"Inf. Fusion"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"107518","DOI":"10.1016\/j.knosys.2021.107518","article-title":"Ensembles of localised models for time series forecasting","volume":"233","author":"Godahewa","year":"2020","journal-title":"Knowl. Based Syst."},{"key":"ref_34","unstructured":"Zhou, T., Niu, P., Wang, X., Sun, L., and Jin, R. (2023, January 10\u201316). One Fits All: Power General Time Series Analysis by Pretrained LM. Proceedings of the Neural Information Processing Systems, New Orleans, LA, USA. Available online: https:\/\/www.semanticscholar.org\/paper\/One-Fits-All%3A-Power-General-Time-Series-Analysis-by-Zhou-Niu\/5b7f5488c380cf5085a5dd93e993ad293b225eee."},{"key":"ref_35","unstructured":"Chen, Y., Wang, X., and Xu, G. (2023). Gatgpt: A pre-trained large language model with graph attention network for spatiotemporal imputation. arXiv."},{"key":"ref_36","unstructured":"Jin, M., Wang, S., Ma, L., Chu, Z., Zhang, J.Y., Shi, X.L., Chen, P.-Y., Liang, Y.-F., Pan, S., and Wen, Q. (2023). TimeLLM: Time Series Forecasting by Reprogramming Large Language Models. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Liu, C., Yang, S., Xu, Q., Li, Z., Long, C., Li, Z., and Zhao, R. (2024). Spatial-Temporal Large Language Model for Traffic Prediction. arXiv.","DOI":"10.1109\/MDM61037.2024.00025"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Li, Z., Xia, L., Tang, J., Xu, Y., Shi, L., Xia, L., Yin, D., and Huang, C. (2024). Urbangpt: Spatio-temporal large language models. 
arXiv.","DOI":"10.1145\/3637528.3671578"},{"key":"ref_39","unstructured":"Chen, D., O\u2019Bray, L., and Borgwardt, K.M. (2022, January 17\u201323). Structure-Aware Transformer for Graph Representation Learning. Proceedings of the International Conference on Machine Learning (ICML), Baltimore, MD, USA. Available online: https:\/\/proceedings.mlr.press\/v162\/chen22r.html."},{"key":"ref_40","unstructured":"Jiang, J., Han, C., Zhao, W.X., and Wang, J. (2024, January 20\u201327). PDFormer: Propagation Delay-aware Dynamic Long-range Transformer for Traffic Flow Prediction. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Vancouver, BC, Canada."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Shao, Z., Zhang, Z., Wang, F., Wei, W., and Xu, Y. (2022, January 17\u201321). Spatial-Temporal Identity: A Simple yet Effective Baseline for Multivariate Time Series Forecasting. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM), Atlanta, GA, USA.","DOI":"10.1145\/3511808.3557702"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1109\/TPAMI.1983.4767370","article-title":"A Maximum Likelihood Approach to Continuous Speech Recognition","volume":"PAMI-5","author":"Bahl","year":"1983","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1006\/csla.1999.0128","article-title":"An empirical study of smoothing techniques for language modeling","volume":"13","author":"Chen","year":"1999","journal-title":"Comput. Speech Lang."},{"key":"ref_44","unstructured":"Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of deep bidirectional transformers for language understanding. 
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, W., Shi, Y., and Zhao, J. (2021, January 13\u201315). A robustly optimized BERT pre-training approach with post-training. Proceedings of the China National Conference on Chinese Computational Linguistics, Hohhot, China.","DOI":"10.1007\/978-3-030-84186-7_31"},{"key":"ref_46","unstructured":"Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training, OpenAI. Available online: https:\/\/cdn.openai.com\/research-covers\/language-unsupervised\/language_understanding_paper.pdf."},{"key":"ref_47","unstructured":"Jiang, A.Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D.S., Casas, D.D.L., Sayed, W.E., Lavril, T., Wang, T., and Lacroix, T. (2023). Mistral 7B. arXiv."},{"key":"ref_48","unstructured":"Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozi\u00e8re, B., Goyal, N., Hambro, E., and Azhar, F. (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"763","DOI":"10.1017\/S1351324921000322","article-title":"Emerging trends: A gentle introduction to fine-tuning","volume":"27","author":"Church","year":"2021","journal-title":"Nat. Lang. Eng."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Huang, J., and Chang, K.C.C. (2023, January 14). Towards reasoning in large language models: A survey. 
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.findings-acl.67"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1038\/s42256-023-00626-4","article-title":"Parameter-efficient fine-tuning of large-scale pre-trained language models","volume":"5","author":"Ding","year":"2023","journal-title":"Nat. Mach. Intell."},{"key":"ref_52","first-page":"12799","article-title":"On the effectiveness of parameter-efficient fine-tuning","volume":"37","author":"Fu","year":"2023","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_53","unstructured":"Lv, K., Yang, Y., Liu, T., Gao, Q., Guo, Q., and Qiu, X. (2023). Full parameter fine-tuning for large language models with limited resources. arXiv."},{"key":"ref_54","unstructured":"Hu, J.E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., and Chen, W. (2022, January 25\u201329). LoRA: Low-Rank Adaptation of Large Language Models. Proceedings of the Tenth International Conference on Learning Representations (ICLR), Virtual Event. Available online: https:\/\/openreview.net\/forum?id=nZeVKeeFYf9."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Liu, X., Ji, K., Fu, Y., Tam, W., Du, Z., Yang, Z., and Tang, J. (2022, January 22\u201327). P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland. (Short Papers).","DOI":"10.18653\/v1\/2022.acl-short.8"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Rubin, O., Herzig, J., and Berant, J. Learning To Retrieve Prompts for In-Context Learning. 
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA.","DOI":"10.18653\/v1\/2022.naacl-main.191"},{"key":"ref_57","unstructured":"Zhang, S., Dong, L., Li, X., Zhang, S., Sun, X., Wang, S., Li, J., Hu, R., Zhang, T., and Wang, G. (2023). Instruction tuning for large language models: A survey. arXiv."},{"key":"ref_58","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_59","first-page":"914","article-title":"Spatial-Temporal Synchronous Graph Convolutional Networks: A New Framework for Spatial-Temporal Network Data Forecasting","volume":"34","author":"Song","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/17\/5502\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:42:22Z","timestamp":1760110942000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/17\/5502"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,25]]},"references-count":59,"journal-issue":{"issue":"17","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["s24175502"],"URL":"https:\/\/doi.org\/10.3390\/s24175502","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,25]]}}}