{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:40:19Z","timestamp":1756485619035,"version":"3.44.0"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:p>\n            Spatio-Temporal Prediction (STP) is crucial for various smart city applications, such as traffic management and resource allocation. However, training samples can be scarce in data-constrained scenarios, which often degrades the predictive capability of existing deep STP models. Although recent STP foundation models excel in few-shot and zero-shot learning through extensive pre-training on large-scale, multi-domain spatio-temporal data, they often rely on large parameter scale to achieve enhanced performance, resulting in high computational demands that hinder practical deployment. In response, we develop CompactST, an efficient, compact, and versatile pre-trained model for STP in data-scarce settings. Recognizing the complexities posed by large-scale, heterogeneous pre-training datasets, CompactST integrates three specialized components: (1) a mixture-of-normalizers module to address domain and spatial heterogeneity, (2) a multi-scale spatio-temporal mixer that captures diverse patterns from datasets with varying spatio-temporal resolutions, and (3) an adaptive dataset-oriented tuning module that transfers the handling of dataset-specific parameters from pre-training to fine-tuning stage. These tailored designs enable CompactST to maximize generalizability across diverse datasets while maintaining a compact model size (\n            <jats:italic toggle=\"yes\">i.e.<\/jats:italic>\n            , only 300K parameters). 
To validate its effectiveness, we pre-train CompactST on a substantial corpus of public spatio-temporal datasets spanning over 10 domains and encompassing 300 million data points. Extensive experimental results on ten real-world datasets demonstrate CompactST's significantly improved prediction accuracy and efficiency in data-scarce scenarios.\n          <\/jats:p>","DOI":"10.14778\/3734839.3734851","type":"journal-article","created":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:01:06Z","timestamp":1756483266000},"page":"2149-2158","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Scalable Pre-Training of Compact Urban Spatio-Temporal Predictive Models on Large-Scale Multi-Domain Data"],"prefix":"10.14778","volume":"18","author":[{"given":"Jindong","family":"Han","sequence":"first","affiliation":[{"name":"Shandong University"}]},{"given":"Hao","family":"Wang","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou)"}]},{"given":"Hui","family":"Xiong","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou) and The Hong Kong University of Science and Technology"}]},{"given":"Hao","family":"Liu","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou) and The Hong Kong University of Science and Technology"}]}],"member":"320","published-online":{"date-parts":[[2025,8,29]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i6.25880"},{"key":"e_1_2_1_2_1","volume-title":"Graph deep learning for time series forecasting. arXiv preprint arXiv:2310.15978","author":"Cini Andrea","year":"2023","unstructured":"Andrea Cini, Ivan Marisca, Daniele Zambon, and Cesare Alippi. 2023. Graph deep learning for time series forecasting. 
arXiv preprint arXiv:2310.15978 (2023)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2019.2950416"},{"key":"e_1_2_1_4_1","volume-title":"Forty-first International Conference on Machine Learning.","author":"Das Abhimanyu","unstructured":"Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. [n.d.]. A decoder-only foundation model for time-series forecasting. In Forty-first International Conference on Machine Learning."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467330"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2015.03.121"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599533"},{"key":"e_1_2_1_8_1","volume-title":"TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series. arXiv preprint arXiv:2401.03955","author":"Ekambaram Vijay","year":"2024","unstructured":"Vijay Ekambaram, Arindam Jati, Nam H Nguyen, Pankaj Dayama, Chandra Reddy, Wesley M Gifford, and Jayant Kalagnanam. 2024. TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series. arXiv preprint arXiv:2401.03955 (2024)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3457390.3457394"},{"key":"e_1_2_1_10_1","volume-title":"MOMENT: A Family of Open Time-series Foundation Models. In Forty-first International Conference on Machine Learning.","author":"Goswami Mononito","year":"2024","unstructured":"Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. 2024. MOMENT: A Family of Open Time-series Foundation Models. 
In Forty-first International Conference on Machine Learning."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.3301922"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3641204.3641217"},{"key":"e_1_2_1_13_1","volume-title":"Machine learning for urban air quality analytics: A survey. arXiv preprint arXiv:2310.09620","author":"Han Jindong","year":"2023","unstructured":"Jindong Han, Weijia Zhang, Hao Liu, and Hui Xiong. 2023. Machine learning for urban air quality analytics: A survey. arXiv preprint arXiv:2310.09620 (2023)."},{"key":"e_1_2_1_14_1","volume-title":"Spatio-temporal graph neural networks for predictive learning in urban computing: A survey","author":"Jin Guangyin","year":"2023","unstructured":"Guangyin Jin, Yuxuan Liang, Yuchen Fang, Zezhi Shao, Jincai Huang, Junbo Zhang, and Yu Zheng. 2023. Spatio-temporal graph neural networks for predictive learning in urban computing: A survey. IEEE Transactions on Knowledge and Data Engineering (2023)."},{"key":"e_1_2_1_15_1","volume-title":"The Twelfth International Conference on Learning Representations.","author":"Jin Ming","year":"2024","unstructured":"Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, et al. 2024. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539250"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599529"},{"key":"e_1_2_1_18_1","volume-title":"International conference on machine learning. PMLR, 5156\u20135165","author":"Katharopoulos Angelos","year":"2020","unstructured":"Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and Fran\u00e7ois Fleuret. 2020. Transformers are rnns: Fast autoregressive transformers with linear attention. 
In International conference on machine learning. PMLR, 5156\u20135165."},{"key":"e_1_2_1_19_1","volume-title":"International Conference on Learning Representations.","author":"Kim Taesung","year":"2021","unstructured":"Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. 2021. Reversible instance normalization for accurate time-series forecasting against distribution shift. In International Conference on Learning Representations."},{"key":"e_1_2_1_20_1","volume-title":"Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations.","author":"Li Yaguang","year":"2018","unstructured":"Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations."},{"key":"e_1_2_1_21_1","volume-title":"Mts-mixers: Multivariate time series forecasting via factorized temporal and channel mixing. arXiv preprint arXiv:2302.04501","author":"Li Zhe","year":"2023","unstructured":"Zhe Li, Zhongwen Rao, Lujia Pan, and Zenglin Xu. 2023. Mts-mixers: Multivariate time series forecasting via factorized temporal and channel mixing. arXiv preprint arXiv:2302.04501 (2023)."},{"key":"e_1_2_1_22_1","volume-title":"OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction. arXiv preprint arXiv:2408.10269","author":"Li Zhonghang","year":"2024","unstructured":"Zhonghang Li, Long Xia, Lei Shi, Yong Xu, Dawei Yin, and Chao Huang. 2024. OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction. arXiv preprint arXiv:2408.10269 (2024)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3637528.3671578"},{"key":"e_1_2_1_24_1","volume-title":"GPT-ST: generative pre-training of spatio-temporal graph neural networks. 
Advances in Neural Information Processing Systems 36","author":"Li Zhonghang","year":"2024","unstructured":"Zhonghang Li, Lianghao Xia, Yong Xu, and Chao Huang. 2024. GPT-ST: generative pre-training of spatio-temporal graph neural networks. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_1_25_1","volume-title":"Spatial-temporal large language model for traffic prediction. arXiv preprint arXiv:2401.10134","author":"Liu Chenxi","year":"2024","unstructured":"Chenxi Liu, Sun Yang, Qianxiong Xu, Zhishuai Li, Cheng Long, Ziyue Li, and Rui Zhao. 2024. Spatial-temporal large language model for traffic prediction. arXiv preprint arXiv:2401.10134 (2024)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3615160"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2020.3036057"},{"key":"e_1_2_1_28_1","volume-title":"How can large language models understand spatial-temporal data? arXiv preprint arXiv:2401.14192","author":"Liu Lei","year":"2024","unstructured":"Lei Liu, Shuo Yu, Runze Wang, Zhenxun Ma, and Yanming Shen. 2024. How can large language models understand spatial-temporal data? arXiv preprint arXiv:2401.14192 (2024)."},{"key":"e_1_2_1_29_1","volume-title":"Reinventing Node-centric Traffic Forecasting for Improved Accuracy and Efficiency. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 21\u201338","author":"Liu Xu","year":"2024","unstructured":"Xu Liu, Yuxuan Liang, Chao Huang, Hengchang Hu, Yushi Cao, Bryan Hooi, and Roger Zimmermann. 2024. Reinventing Node-centric Traffic Forecasting for Improved Accuracy and Efficiency. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 21\u201338."},{"key":"e_1_2_1_30_1","volume-title":"Largest: A benchmark dataset for large-scale traffic forecasting. 
Advances in Neural Information Processing Systems 36","author":"Liu Xu","year":"2024","unstructured":"Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu, Bryan Hooi, and Roger Zimmermann. 2024. Largest: A benchmark dataset for large-scale traffic forecasting. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_1_31_1","volume-title":"The Twelfth International Conference on Learning Representations.","author":"Liu Yong","unstructured":"Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. [n.d.]. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_2_1_32_1","unstructured":"Yong Liu Haoran Zhang Chenyu Li Xiangdong Huang Jianmin Wang and Mingsheng Long. [n.d.]. Timer: Generative Pre-trained Transformers Are Large Time Series Models. In Forty-first International Conference on Machine Learning."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539281"},{"key":"e_1_2_1_34_1","volume-title":"The Eleventh International Conference on Learning Representations.","author":"Nie Yuqi","unstructured":"Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. [n.d.]. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330884"},{"key":"e_1_2_1_36_1","volume-title":"Tfb: Towards comprehensive and fair benchmarking of time series forecasting methods. arXiv preprint arXiv:2403.20150","author":"Qiu Xiangfei","year":"2024","unstructured":"Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S Jensen, Zhenli Sheng, et al. 2024. Tfb: Towards comprehensive and fair benchmarking of time series forecasting methods. 
arXiv preprint arXiv:2403.20150 (2024)."},{"key":"e_1_2_1_37_1","unstructured":"Zezhi Shao Fei Wang Yongjun Xu Wei Wei Chengqing Yu Zhao Zhang Di Yao Guangyin Jin Xin Cao Gao Cong et al. 2023. Exploring progress in multi-variate time series forecasting: Comprehensive benchmarking and heterogeneity analysis. arXiv preprint arXiv:2310.06119 (2023)."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557702"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539396"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551827"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5438"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3627673.3679749"},{"key":"e_1_2_1_43_1","volume-title":"Mlp-mixer: An all-mlp architecture for vision. Advances in neural information processing systems 34","author":"Tolstikhin Ilya O","year":"2021","unstructured":"Ilya O Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, et al. 2021. Mlp-mixer: An all-mlp architecture for vision. Advances in neural information processing systems 34 (2021), 24261\u201324272."},{"key":"e_1_2_1_44_1","volume-title":"Instance normalization: The missing ingredient for fast stylization. CoRR abs\/1607.08022","author":"Ulyanov Dmitry","year":"2016","unstructured":"Dmitry Ulyanov, Andrea Vedaldi, and Victor S Lempitsky. 2016. Instance normalization: The missing ingredient for fast stylization. CoRR abs\/1607.08022 (2016). arXiv preprint arXiv:1607.08022 (2016)."},{"key":"e_1_2_1_45_1","volume-title":"Attention is all you need. Advances in Neural Information Processing Systems","author":"Vaswani A","year":"2017","unstructured":"A Vaswani. 2017. Attention is all you need. 
Advances in Neural Information Processing Systems (2017)."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474717.3483923"},{"key":"e_1_2_1_47_1","unstructured":"Shiyu Wang Haixu Wu Xiaoming Shi Tengge Hu Huakun Luo Lintao Ma James Y Zhang and JUN ZHOU. [n.d.]. TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_2_1_48_1","volume-title":"Unified Training of Universal Time Series Forecasting Transformers. In Forty-first International Conference on Machine Learning.","author":"Woo Gerald","year":"2024","unstructured":"Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. 2024. Unified Training of Universal Time Series Forecasting Transformers. In Forty-first International Conference on Machine Learning."},{"key":"e_1_2_1_49_1","volume-title":"zero-shot joint neural architecture and hyperparameter search for correlated time series forecasting. The VLDB Journal","author":"Wu Xinle","year":"2024","unstructured":"Xinle Wu, Xingjian Wu, Bin Yang, Lekui Zhou, Chenjuan Guo, Xiangfei Qiu, Jilin Hu, Zhenli Sheng, and Christian S Jensen. 2024. AutoCTS++: zero-shot joint neural architecture and hyperparameter search for correlated time series forecasting. 
The VLDB Journal (2024), 1\u201328."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/3503585.3503604"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588951"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2978386"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403118"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/264"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33015668"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/3654621.3654628"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3637528.3671662"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i9.26317"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2020.3000761"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10735"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3614969"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.14778\/3636218.3636230"},{"key":"e_1_2_1_63_1","volume-title":"T-GCN: A temporal graph convolutional network for traffic prediction","author":"Zhao Ling","year":"2019","unstructured":"Ling Zhao, Yujiao Song, Chao Zhang, Yu Liu, Pu Wang, Tao Lin, Min Deng, and Haifeng Li. 2019. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE transactions on intelligent transportation systems 21, 9 (2019), 3848\u20133858."},{"key":"e_1_2_1_64_1","unstructured":"Wayne Xin Zhao Kun Zhou Junyi Li Tianyi Tang Xiaolei Wang Yupeng Hou Yingqian Min Beichen Zhang Junjie Zhang Zican Dong et al. 2023. A survey of large language models. 
arXiv preprint arXiv:2303.18223 (2023)."},{"key":"e_1_2_1_65_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2629592","article-title":"Urban computing: concepts, methodologies, and applications","volume":"5","author":"Zheng Yu","year":"2014","unstructured":"Yu Zheng, Licia Capra, Ouri Wolfson, and Hai Yang. 2014. Urban computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 5, 3 (2014), 1\u201355.","journal-title":"ACM Transactions on Intelligent Systems and Technology (TIST)"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.14778\/3654621.3654637"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i12.17325"},{"key":"e_1_2_1_68_1","unstructured":"Tian Zhou Peisong Niu Liang Sun Rong Jin et al. 2023. One fits all: Power general time series analysis by pretrained lm. Advances in neural information processing systems 36 (2023) 43322\u201343355."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3734839.3734851","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:01:17Z","timestamp":1756483277000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3734839.3734851"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3]]},"references-count":68,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["10.14778\/3734839.3734851"],"URL":"https:\/\/doi.org\/10.14778\/3734839.3734851","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,3]]},"assertion":[{"value":"2025-08-29","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}