{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:14:58Z","timestamp":1760145298399,"version":"build-2065373602"},"reference-count":36,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2024,7,10]],"date-time":"2024-07-10T00:00:00Z","timestamp":1720569600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100011787","name":"Key R&amp;D Program Projects of Heilongjiang Province","doi-asserted-by":"publisher","award":["2022ZX01A15","23-1-5-yqpy-11-qy"],"award-info":[{"award-number":["2022ZX01A15","23-1-5-yqpy-11-qy"]}],"id":[{"id":"10.13039\/501100011787","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Cultivation Project of Qingdao Science and Technology Plan Park","award":["2022ZX01A15","23-1-5-yqpy-11-qy"],"award-info":[{"award-number":["2022ZX01A15","23-1-5-yqpy-11-qy"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>With the continuous advancement of sensing technology, applying large amounts of sensor data to practical prediction processes using artificial intelligence methods has become a developmental direction. In sensing images and remote sensing meteorological data, the dynamic changes in the prediction targets relative to their background information often exhibit more significant dynamic characteristics. Previous prediction methods did not specifically analyze and study the dynamic change information of prediction targets at spatiotemporal multi-scale. Therefore, this paper proposes a neural prediction network based on perceptual multi-scale spatiotemporal dynamic changes (PMSTD-Net). 
By designing Multi-Scale Space Motion Change Attention Unit (MCAU) to perceive the local situation and spatial displacement dynamic features of prediction targets at different scales, attention is ensured on capturing the dynamic information in their spatial dimensions adequately. On this basis, this paper proposes Multi-Scale Spatiotemporal Evolution Attention (MSEA) unit, which further integrates the spatial change features perceived by MCAU units in higher channel dimensions, and learns the spatiotemporal evolution characteristics at different scales, effectively predicting the dynamic characteristics and regularities of targets in sensor information. Through experiments on spatiotemporal prediction standard datasets such as Moving MNIST, video prediction dataset KTH, and Human3.6m, PMSTD-Net demonstrates prediction performance surpassing previous methods. We construct the GPM satellite remote sensing precipitation dataset, demonstrating the network\u2019s advantages in perceiving multi-scale spatiotemporal dynamic changes in remote sensing meteorological data. 
Finally, through extensive ablation experiments, the performance of each module in PMSTD-Net is thoroughly validated.<\/jats:p>","DOI":"10.3390\/s24144467","type":"journal-article","created":{"date-parts":[[2024,7,10]],"date-time":"2024-07-10T15:22:05Z","timestamp":1720624925000},"page":"4467","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["PMSTD-Net: A Neural Prediction Network for Perceiving Multi-Scale Spatiotemporal Dynamics"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8417-8428","authenticated-orcid":false,"given":"Feng","family":"Gao","sequence":"first","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China"},{"name":"Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266400, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-4381-9328","authenticated-orcid":false,"given":"Sen","family":"Li","sequence":"additional","affiliation":[{"name":"Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266400, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9524-0071","authenticated-orcid":false,"given":"Yuankang","family":"Ye","sequence":"additional","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China"}]},{"given":"Chang","family":"Liu","sequence":"additional","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China"},{"name":"Qingdao Hatran Ocean Intelligence Technology Co., Ltd., Qingdao 266400, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,7,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"103898","DOI":"10.1016\/j.imavis.2020.103898","article-title":"A review on object pose recovery: From 3D bounding box 
detectors to full 6D pose estimators","volume":"96","author":"Sahin","year":"2020","journal-title":"Image Vis. Comput."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1016\/j.neucom.2021.03.091","article-title":"A review on the attention mechanism of deep learning","volume":"452","author":"Niu","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1167\/7.11.5","article-title":"Attention changes perceived size of moving visual patterns","volume":"7","author":"Henrich","year":"2007","journal-title":"J. Vis."},{"key":"ref_4","unstructured":"Srivastava, N., Mansimov, E., and Salakhudinov, R. (2015, January 6\u201311). Unsupervised learning of video representations using lstms. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Schuldt, C., Laptev, I., and Caputo, B. (2004, January 26). Recognizing human actions: A local SVM approach. Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004, Cambridge, UK.","DOI":"10.1109\/ICPR.2004.1334462"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1109\/TPAMI.2013.248","article-title":"Human3. 6m: Large scale datasets and predictive methods for 3d human sensing in natural environments","volume":"36","author":"Ionescu","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","first-page":"30","article-title":"NASA global precipitation measurement (GPM) integrated multi-satellite retrievals for GPM (IMERG)","volume":"4","author":"Huffman","year":"2015","journal-title":"Algorithm Theor. Basis Doc. Version"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Gao, Z., Tan, C., Wu, L., and Li, S.Z. (2022, January 18\u201324). Simvp: Simpler yet better video prediction. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00317"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.4249\/scholarpedia.1888","article-title":"Recurrent neural networks","volume":"8","author":"Grossberg","year":"2013","journal-title":"Scholarpedia"},{"key":"ref_10","unstructured":"Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., and Woo, W.C. (2015, January 7\u201312). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, Montreal, QC, USA."},{"key":"ref_11","unstructured":"Wang, Y., Long, M., Wang, J., Gao, Z., and Yu, P.S. (2017, January 4\u20139). Predrnn: Recurrent neural networks for predictive learning using spatiotemporal lstms. Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_12","unstructured":"Wang, Y., Gao, Z., Long, M., Wang, J., and Philip, S.Y. (2018, January 10\u201315). Predrnn++: Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_13","unstructured":"Wang, Y., Jiang, L., Yang, M.H., Li, L.J., Long, M., and Fei-Fei, L. (May, January 30). Eidetic 3D LSTM: A model for video prediction and beyond. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, Y., Zhang, J., Zhu, H., Long, M., Wang, J., and Yu, P.S. (2019, January 15\u201320). Memory in memory: A predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00937"},{"key":"ref_15","first-page":"26950","article-title":"Mau: A motion-aware unit for video prediction and beyond","volume":"34","author":"Chang","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wu, H., Yao, Z., Wang, J., and Long, M. (2021, January 20\u201325). MotionRNN: A flexible model for video prediction with spacetime-varying motions. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01518"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tang, S., Li, C., Zhang, P., and Tang, R. (2023, January 1\u20136). Swinlstm: Improving spatiotemporal prediction accuracy using swin transformer and lstm. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.01239"},{"key":"ref_18","unstructured":"Jia, X., De Brabandere, B., Tuytelaars, T., and Gool, L.V. (2016). Dynamic filter networks. Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems, Barcelona, Spain, 5\u201310 December 2016, Curran Associates, Inc."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Xu, Z., Wang, Y., Long, M., Wang, J., and KLiss, M. (2018, January 13\u201319). PredCNN: Predictive Learning with Cascade Convolutions. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden.","DOI":"10.24963\/ijcai.2018\/408"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhong, Y., Liang, L., Zharkov, I., and Neumann, U. (2023, January 1\u20136). Mmvp: Motion-matrix-based video prediction. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.00394"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Tan, C., Gao, Z., Wu, L., Xu, Y., Xia, J., Li, S., and Li, S.Z. (2023, January 17\u201324). Temporal attention unit: Towards efficient spatiotemporal predictive learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01800"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chen, S., and Yang, N. (2023). STMP-Net: A Spatiotemporal Prediction Network Integrating Motion Perception. Sensors, 23.","DOI":"10.3390\/s23115133"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ye, Y., Gao, F., Cheng, W., Liu, C., and Zhang, S. (2023). MSSTNet: A Multi-Scale Spatiotemporal Prediction Neural Network for Precipitation Nowcasting. Remote Sens., 15.","DOI":"10.3390\/rs15010137"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Naz, F., She, L., Sinan, M., and Shao, J. (2024). Enhancing Radar Echo Extrapolation by ConvLSTM2D for Precipitation Nowcasting. Sensors, 24.","DOI":"10.3390\/s24020459"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Tang, R., Zhang, P., Wu, J., Chen, Y., Dong, L., Tang, S., and Li, C. (2023). Pred-SF: A Precipitation Prediction Model Based on Deep Neural Networks. Sensors, 23.","DOI":"10.3390\/s23052609"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2208","DOI":"10.1109\/TPAMI.2022.3165153","article-title":"Predrnn: A recurrent neural network for spatiotemporal predictive learning","volume":"45","author":"Wang","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yu, W., Luo, M., Zhou, P., Si, C., Zhou, Y., Wang, X., Feng, J., and Yan, S. (2023, January 17\u201324). Metaformer is actually what you need for vision. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01055"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1007\/s41095-023-0364-2","article-title":"Visual attention network","volume":"9","author":"Guo","year":"2023","journal-title":"Comput. Vis. Media"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ding, X., Zhang, X., Han, J., and Ding, G. (2022, January 18\u201324). Scaling up your kernels to 31x31: Revisiting large kernel design in cnns. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01166"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Oliu, M., Selva, J., and Escalera, S. (2018, January 8\u201314). Folded recurrent neural networks for future video prediction. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_44"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lee, S., Kim, H.G., Choi, D.H., Kim, H.I., and Ro, Y.M. (2021, January 20\u201325). Video prediction recalling long-term motion context via memory alignment learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00307"},{"key":"ref_32","unstructured":"Guen, V.L., and Thome, N. (2020, January 13\u201319). Disentangling physical dynamics from unknown factors for unsupervised video prediction. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA."},{"key":"ref_33","unstructured":"Yu, W., Lu, Y., Easterbrook, S., and Fidler, S. (2020, January 26\u201330). Efficient and information-preserving future frame prediction and beyond. 
Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia."},{"key":"ref_34","unstructured":"Villegas, R., Yang, J., Hong, S., Lin, X., and Lee, H. (2017). Decomposing motion and content for natural video sequence prediction. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"7519","DOI":"10.1080\/01431161.2021.1963003","article-title":"SAR ship detection in complex background based on multi-feature fusion and non-local channel attention mechanism","volume":"42","author":"Wang","year":"2021","journal-title":"Int. J. Remote Sens."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1745","DOI":"10.1109\/LGRS.2017.2733548","article-title":"Prediction of sea surface temperature using long short-term memory","volume":"14","author":"Zhang","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/14\/4467\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:12:56Z","timestamp":1760109176000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/14\/4467"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,10]]},"references-count":36,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2024,7]]}},"alternative-id":["s24144467"],"URL":"https:\/\/doi.org\/10.3390\/s24144467","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2024,7,10]]}}}