{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,21]],"date-time":"2025-12-21T05:25:23Z","timestamp":1766294723851,"version":"3.48.0"},"reference-count":33,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T00:00:00Z","timestamp":1766102400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U2344216"],"award-info":[{"award-number":["U2344216"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>Origin\u2013destination (OD) flow prediction is fundamental to intelligent transportation systems, yet existing diffusion-based models face two critical limitations. First, they inadequately exploit spatial semantics, focusing primarily on temporal dependencies or topological correlations while neglecting urban functional heterogeneity encoded in Points of Interest (POIs). Second, static embedding fusion cannot dynamically capture semantic importance variations during denoising\u2014particularly during traffic surges in POI-dense areas. To address these gaps, we propose the Cross-Attention Diffusion Model (CADM), a semantically conditioned framework for short-term OD flow forecasting. CADM integrates POI embeddings as spatial semantic priors and employs cross-attention to enable semantic-guided denoising, facilitating dynamic spatiotemporal feature fusion. This design adaptively reweights regional representations throughout reverse diffusion, enhancing the model\u2019s capacity to capture complex mobility patterns. Experiments on real-world datasets demonstrate that CADM achieves balanced performance across multiple metrics. 
At the 30 min horizon, CADM attains the lowest RMSE of 5.77, outperforming iTransformer by 1.9%, while maintaining competitive performance at the 15 min horizon. Ablation studies confirm that removing POI features increases prediction errors by 15\u201320%, validating the critical role of semantic conditioning. These findings advance semantic-aware generative modeling for spatiotemporal prediction and provide practical insights for intelligent transportation systems, particularly for newly established transportation hubs or functional zone reconfigurations where semantic understanding is essential.<\/jats:p>","DOI":"10.3390\/ijgi15010002","type":"journal-article","created":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T12:59:57Z","timestamp":1766149197000},"page":"2","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Cross-Attention Diffusion Model for Semantic-Aware Short-Term Urban OD Flow Prediction"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-8573-4892","authenticated-orcid":false,"given":"Hongxiang","family":"Li","sequence":"first","affiliation":[{"name":"College of Computer Science, Beijing University of Technology, Beijing 100124, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9677-4152","authenticated-orcid":false,"given":"Zhiming","family":"Gui","sequence":"additional","affiliation":[{"name":"College of Computer Science, Beijing University of Technology, Beijing 100124, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9106-039X","authenticated-orcid":false,"given":"Zhenji","family":"Gao","sequence":"additional","affiliation":[{"name":"Integrated Natural Resources Survey Center, CGS, No. 
55 Yard, Honglian South Road, Xicheng District, Beijing 100055, China"},{"name":"Technology Innovation Center of Geological Information Engineering of Ministry of Natural Resources, Beijing 100055, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,19]]},"reference":[{"key":"ref_1","first-page":"8780","article-title":"Diffusion Models Beat GANs on Image Synthesis","volume":"34","author":"Dhariwal","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"unstructured":"Ho, J., Saharia, C., Chan, W., Fleet, D.J., Norouzi, M., and Salimans, T. (2021). Cascaded Diffusion Models for High Fidelity Image Generation. arXiv.","key":"ref_2"},{"doi-asserted-by":"crossref","unstructured":"Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. arXiv.","key":"ref_3","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"ref_4","first-page":"17981","article-title":"Structured Denoising Diffusion Models in Discrete State-Spaces","volume":"34","author":"Austin","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"unstructured":"Gong, S., Li, M., Feng, J., Wu, Z., and Kong, L. (2023). DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models. arXiv.","key":"ref_5"},{"unstructured":"Li, X.L., Thickstun, J., Gulrajani, I., Liang, P., and Hashimoto, T.B. (2022). Diffusion-LM Improves Controllable Text Generation. arXiv.","key":"ref_6"},{"unstructured":"Yu, P., Ravula, A., Yang, Z., Chen, Y., and Liu, J. (2023). Latent Diffusion Energy-Based Model for Interpretable Text Modeling. arXiv.","key":"ref_7"},{"unstructured":"Kong, Z., Ping, W., Huang, J., Zhao, K., and Catanzaro, B. (2020). DiffWave: A Versatile Diffusion Model for Audio Synthesis. 
arXiv.","key":"ref_8"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1720","DOI":"10.1109\/TASLP.2023.3268730","article-title":"DiffSound: Discrete Diffusion Model for Text-to-Sound Generation","volume":"31","author":"Yang","year":"2023","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"doi-asserted-by":"crossref","unstructured":"Yang, R., Srivastava, P., and Mandt, S. (2022). Diffusion Probabilistic Modeling for Video Generation. arXiv.","key":"ref_10","DOI":"10.3390\/e25101469"},{"unstructured":"Harvey, W., Naderiparizi, S., Masrani, V., Weilbach, C., and Wood, F. (2022). Flexible Diffusion Modeling of Long Videos. arXiv.","key":"ref_11"},{"unstructured":"Ho, J., Saharia, C., Chowdhery, A., Niki, P., Jain, A., Fleet, D.J., Salimans, T., Chen, M., and Norouzi, M. (2022). Imagen Video: High Definition Video Generation with Diffusion Models. arXiv.","key":"ref_12"},{"unstructured":"Ho, J., Salimans, T., Gritsenko, A., Chan, W., Norouzi, M., and Fleet, D.J. (2022). Video Diffusion Models. arXiv.","key":"ref_13"},{"key":"ref_14","first-page":"6840","article-title":"Denoising Diffusion Probabilistic Models","volume":"33","author":"Ho","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"unstructured":"Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., and Poole, B. (2021). Score-Based Generative Modeling through Stochastic Differential Equations. arXiv.","key":"ref_15"},{"unstructured":"Rasul, K., Seward, C., Schuster, I., and Vollgraf, R. (2021). Autoregressive Denoising Diffusion Models for Multivariate Probabilistic Time Series Forecasting. arXiv.","key":"ref_16"},{"unstructured":"Yan, T., Zhang, H., Zhou, T., Zhan, Y., and Xia, Y. (2021). ScoreGrad: Multivariate Probabilistic Time Series Forecasting with Continuous Energy-Based Generative Models. 
arXiv.","key":"ref_17"},{"key":"ref_18","first-page":"24804","article-title":"CSDI: Conditional Score-Based Diffusion Models for Probabilistic Time Series Imputation","volume":"34","author":"Tashiro","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"unstructured":"Bilo\u0161, M., Rasul, K., Schneider, A., Nevmyvaka, Y., and G\u00fcnnemann, S. (2022). Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion. arXiv.","key":"ref_19"},{"unstructured":"Shen, L., and Kwok, J. (2023). Non-Autoregressive Conditional Diffusion Models for Time Series Prediction. arXiv.","key":"ref_20"},{"unstructured":"Kollovieh, M., Ansari, A.F., Bohlke-Schneider, M., Zschiegner, J., Wang, H., and Wang, Y. (2023). Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting. arXiv.","key":"ref_21"},{"key":"ref_22","first-page":"1","article-title":"A summary of traffic flow forecasting methods","volume":"2004","author":"Liu","year":"2004","journal-title":"Transp. Res. Circ. E-C026 Traffic Flow Theory"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1023\/B:STCO.0000035301.49549.88","article-title":"A tutorial on support vector regression","volume":"14","author":"Smola","year":"2004","journal-title":"Stat. Comput."},{"unstructured":"Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv.","key":"ref_24"},{"unstructured":"Bai, S., Kolter, J.Z., and Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv.","key":"ref_25"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"3848","DOI":"10.1109\/TITS.2019.2935152","article-title":"T-GCN: A temporal graph convolutional network for traffic prediction","volume":"21","author":"Zhao","year":"2020","journal-title":"IEEE Trans. Intell. Transp. 
Syst."},{"doi-asserted-by":"crossref","unstructured":"Yu, B., Yin, H., and Zhu, Z. (2017). Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv.","key":"ref_27","DOI":"10.24963\/ijcai.2018\/505"},{"unstructured":"Guo, S., Lin, Y., Feng, N., Song, C., and Wan, H. (2019, January 27\u2013February 1). Attention-based spatial-temporal graph convolutional networks for traffic flow forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA.","key":"ref_28"},{"unstructured":"Li, Y., Yu, R., Shahabi, C., and Liu, Y. (2018, April 30\u2013May 3). Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada.","key":"ref_29"},{"doi-asserted-by":"crossref","unstructured":"Wu, Z., Pan, S., Long, G., Jiang, J., Chang, X., and Zhang, C. (2019, August 10\u201316). Graph WaveNet for deep spatial-temporal graph modeling. Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macao, China.","key":"ref_30","DOI":"10.24963\/ijcai.2019\/264"},{"unstructured":"Zou, H., Chen, C., Zheng, C., Shen, Y., and Cui, P. (2023, December 10\u201316). Spatial-Temporal Graph Informer Networks for Long-Sequence Traffic Forecasting. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA.","key":"ref_31"},{"unstructured":"Liu, Y., Zhang, Y., Liu, Z., Wang, H., and Chen, G. (2024). iTransformer: Inverted Transformers are Effective for Time Series Forecasting. arXiv.","key":"ref_32"},{"unstructured":"Wang, Z., Qin, T., Liu, T., and Zhang, X. (2024). TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting. 
arXiv.","key":"ref_33"}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/15\/1\/2\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,21]],"date-time":"2025-12-21T05:22:40Z","timestamp":1766294560000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/15\/1\/2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,19]]},"references-count":33,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,1]]}},"alternative-id":["ijgi15010002"],"URL":"https:\/\/doi.org\/10.3390\/ijgi15010002","relation":{},"ISSN":["2220-9964"],"issn-type":[{"type":"electronic","value":"2220-9964"}],"subject":[],"published":{"date-parts":[[2025,12,19]]}}}