{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:22:36Z","timestamp":1772907756294,"version":"3.50.1"},"reference-count":47,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2021,3,29]],"date-time":"2021-03-29T00:00:00Z","timestamp":1616976000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"publisher","award":["No.2020M680109"],"award-info":[{"award-number":["No.2020M680109"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["10.13039\/501100012226"],"award-info":[{"award-number":["10.13039\/501100012226"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Satellite video single object tracking has attracted wide attention. The development of remote sensing platforms for earth observation technologies makes it increasingly convenient to acquire high-resolution satellite videos, which greatly accelerates ground target tracking. However, overlarge images with small object size, high similarity among multiple moving targets, and poor distinguishability between the objects and the background make this task most challenging. To solve these problems, a deep Siamese network (DSN) incorporating an interframe difference centroid inertia motion (ID-CIM) model is proposed in this paper. 
In object tracking tasks, the DSN inherently includes a template branch and a search branch; it extracts the features from these two branches and employs a Siamese region proposal network to obtain the position of the target in the search branch. The ID-CIM mechanism was proposed to alleviate model drift. These two modules build the ID-DSN framework and mutually reinforce the final tracking results. In addition, we also adopted existing object detection datasets for remotely sensed images to generate training datasets suitable for satellite video single object tracking. Ablation experiments were performed on six high-resolution satellite videos acquired from the International Space Station and \u201cJilin-1\u201d satellites. We compared the proposed ID-DSN results with other 11 state-of-the-art trackers, including different networks and backbones. The comparison results show that our ID-DSN obtained a precision criterion of 0.927 and a success criterion of 0.694 with a frames per second (FPS) value of 32.117 implemented on a single NVIDIA GTX1070Ti GPU.<\/jats:p>","DOI":"10.3390\/rs13071298","type":"journal-article","created":{"date-parts":[[2021,3,29]],"date-time":"2021-03-29T16:01:57Z","timestamp":1617033717000},"page":"1298","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":34,"title":["Single Object Tracking in Satellite Videos: Deep Siamese Network Incorporating an Interframe Difference Centroid Inertia Motion Model"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6279-9215","authenticated-orcid":false,"given":"Kun","family":"Zhu","sequence":"first","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China"}]},{"given":"Xiaodong","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying Mapping and 
Remote Sensing, Wuhan University, Wuhan 430079, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0733-9122","authenticated-orcid":false,"given":"Guanzhou","family":"Chen","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China"}]},{"given":"Xiaoliang","family":"Tan","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China"}]},{"given":"Puyun","family":"Liao","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Information Engineering in Surveying Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China"}]},{"given":"Hongyu","family":"Wu","sequence":"additional","affiliation":[{"name":"Satellite Image Geometric Correction Processing of Chang Guang Satellite Technology Co., Ltd., Changchun 130051, China"}]},{"given":"Xiujuan","family":"Cui","sequence":"additional","affiliation":[{"name":"School of Geoscience, Yangtze University, Wuhan 430100, China"}]},{"given":"Yinan","family":"Zuo","sequence":"additional","affiliation":[{"name":"School of Geoscience, Yangtze University, Wuhan 430100, China"}]},{"given":"Zhiyong","family":"Lv","sequence":"additional","affiliation":[{"name":"School of Computer and Engineering, Xi\u2019an University of Technology, No. 5 Jin Hua South Road, Xi\u2019an 710048, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,3,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1429","DOI":"10.1109\/TMM.2015.2455418","article-title":"On-Road Pedestrian Tracking Across Multiple Driving Recorders","volume":"17","author":"Lee","year":"2015","journal-title":"IEEE Trans. Multimed."},{"key":"ref_2","unstructured":"Liu, L., Xing, J., Ai, H., and Ruan, X. (2012, January 11\u201315). Hand posture recognition using finger geometric feature. 
Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Tang, S., Andriluka, M., Andres, B., and Schiele, B. (2017, January 21\u201326). Multiple People Tracking by Lifted Multicut and Person Re-identification. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.394"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhang, G., and Vela, P.A. (2015, January 7\u201312). Good features to track for visual SLAM. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298743"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"3043","DOI":"10.1109\/JSTARS.2019.2917703","article-title":"Object Tracking in Satellite Videos Based on a Multiframe Optical Flow Tracker","volume":"12","author":"Du","year":"2019","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Li, B., Yan, J., Wu, W., Zhu, Z., and Hu, X. (2018, January 18\u201323). High Performance Visual Tracking With Siamese Region Proposal Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00935"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Wu, W., Zou, W., and Yan, J. (2018, January 18\u201323). End-to-End Flow Correlation Tracking with Spatial-Temporal Attention. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00064"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1834","DOI":"10.1109\/TPAMI.2014.2388226","article-title":"Object Tracking Benchmark","volume":"37","author":"Wu","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"3538","DOI":"10.1109\/JSTARS.2019.2933488","article-title":"Object Tracking on Satellite Videos: A Correlation Filter-Based Tracking Method With Trajectory Correction by Kalman Filter","volume":"12","author":"Guo","year":"2019","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Bolme, D.S., Beveridge, J.R., Draper, B.A., and Lui, Y.M. (2010, January 13\u201318). Visual object tracking using adaptive correlation filters. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539960"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1561","DOI":"10.1109\/TPAMI.2016.2609928","article-title":"Discriminative Scale Space Tracking","volume":"39","author":"Danelljan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/TPAMI.2014.2345390","article-title":"High-Speed Tracking with Kernelized Correlation Filters","volume":"37","author":"Henriques","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Khan, F.S., Felsberg, M., and van de Weijer, J. (2014, January 23\u201328). Adaptive Color Attributes for Real-Time Visual Tracking. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.143"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Robinson, A., Khan, F.S., and Felsberg, M. (2016). Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. 
arXiv.","DOI":"10.1007\/978-3-319-46454-1_29"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Khan, F.S., and Felsberg, M. (2017, January 21\u201326). ECO: Efficient Convolution Operators for Tracking. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.733"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., and Yan, J. (2019, January 16\u201320). SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00441"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H.S. (2016, January 8\u201316). Fully-Convolutional Siamese Networks for Object Tracking. Proceedings of the Computer Vision\u2014ECCV 2016 Workshops, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., and Hu, W. (2018). Distractor-aware Siamese Networks for Visual Object Tracking. arXiv.","DOI":"10.1007\/978-3-030-01240-3_7"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep Residual Learning for Image Recognition. 
arXiv.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1080\/10095020.2017.1329314","article-title":"Earth observation brain (EOB): An intelligent earth observation system","volume":"20","author":"Li","year":"2017","journal-title":"Geo-Spat. Inf. Sci."},{"key":"ref_22","first-page":"95","article-title":"Evaluation of Skybox Video and Still Image products","volume":"XL1","author":"Kuschk","year":"2014","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1194","DOI":"10.3390\/s18041194","article-title":"Super-Resolution for \u201cJilin-1\u201d Satellite Video Imagery via a Convolutional Network","volume":"18","author":"Aoran","year":"2018","journal-title":"Sensors"},{"key":"ref_24","first-page":"1135","article-title":"Satellite Video Point-target Tracking in Combination with Motion Smoothness Constraint and Grayscale Feature","volume":"46","author":"Jiaqi","year":"2017","journal-title":"Acta Geod. Cartogr. Sin."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"783","DOI":"10.1109\/JSTARS.2020.2971657","article-title":"Object Tracking in Satellite Videos Based on Convolutional Regression Network With Appearance and Motion Features","volume":"13","author":"Hu","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Xia, G., Bai, X., Ding, J., Zhu, Z., Belongie, S.J., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18\u201322). DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. 
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.isprsjprs.2019.11.023","article-title":"Object detection in optical remote sensing images: A survey and a new benchmark","volume":"159","author":"Li","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Taigman, Y., Yang, M., Ranzato, M., and Wolf, L. (2014, January 23\u201328). DeepFace: Closing the Gap to Human-Level Performance in Face Verification. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.220"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zagoruyko, S., and Komodakis, N. (2015, January 7\u201312). Learning to compare image patches via convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299064"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"7860","DOI":"10.1109\/TGRS.2019.2916953","article-title":"Tracking Objects From Satellite Videos: A Velocity Feature Based Correlation Filter","volume":"57","author":"Shao","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"8719","DOI":"10.1109\/TGRS.2019.2922648","article-title":"Can We Track Targets From Space? A Hybrid Kernel Correlation Filter Tracker for Satellite Video","volume":"57","author":"Shao","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Han, X., Zhong, Y., and Zhang, L. (2017). An Efficient and Robust Integrated Geospatial Object Detection Framework for High Spatial Resolution Remote Sensing Imagery. 
Remote Sens., 9.","DOI":"10.3390\/rs9070666"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/j.isprsjprs.2017.07.009","article-title":"Visual object tracking by correlation filters and online learning","volume":"140","author":"Zhang","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"2054","DOI":"10.1109\/TIE.2018.2835390","article-title":"Real-Time Event-Triggered Object Tracking in the Presence of Model Drift and Occlusion","volume":"66","author":"Guan","year":"2019","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"185857","DOI":"10.1109\/ACCESS.2019.2959406","article-title":"An Anti-Drift Background-Aware Correlation Filter for Visual Tracking in Complex Scenes","volume":"7","author":"Luo","year":"2019","journal-title":"IEEE Access"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"159199","DOI":"10.1109\/ACCESS.2019.2951056","article-title":"Bidirectional Tracking Scheme for Visual Object Tracking Based on Recursive Orthogonal Least Squares","volume":"7","author":"Huang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"8385","DOI":"10.1109\/ACCESS.2017.2697072","article-title":"Kalman Filter with Dynamical Setting of Optimal Process Noise Covariance","volume":"5","author":"Basso","year":"2017","journal-title":"IEEE Access"},{"key":"ref_38","unstructured":"Yu, F., and Koltun, V. (2016, January 2\u20134). Multi-Scale Context Aggregation by Dilated Convolutions. Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhu, K., Chen, G., Tan, X., Zhang, L., Dai, F., Liao, P., and Gong, Y. (2019). Geospatial Object Detection on High Resolution Remote Sensing Imagery Based on Double Multi-Scale Feature Pyramid Network. Remote Sens., 11.","DOI":"10.3390\/rs11070755"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft COCO: Common Objects in Context. arXiv.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_43","unstructured":"(2019, October 09). 2016 IEEE GRSS Data Fusion Contest. Available online: http:\/\/www.grss-ieee.org\/community\/technical-committees\/data-fusion."},{"key":"ref_44","unstructured":"Wang, Q., Gao, J., Xing, J., Zhang, M., and Hu, W. (2017). DCFNet: Discriminant Correlation Filters Network for Visual Tracking. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Peng, H. (2019, January 15\u201320). Deeper and Wider Siamese Networks for Real-Time Visual Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00472"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Khan, F.S., and Felsberg, M. (2018). ATOM: Accurate Tracking by Overlap Maximization. 
arXiv.","DOI":"10.1109\/CVPR.2019.00479"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Bhat, G., Danelljan, M., Gool, L.V., and Timofte, R. (November, January 27). Learning Discriminative Model Prediction for Tracking. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00628"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/7\/1298\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:10:20Z","timestamp":1760364620000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/7\/1298"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,29]]},"references-count":47,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2021,4]]}},"alternative-id":["rs13071298"],"URL":"https:\/\/doi.org\/10.3390\/rs13071298","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,29]]}}}