{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:15:36Z","timestamp":1760145336492,"version":"build-2065373602"},"reference-count":38,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2024,7,13]],"date-time":"2024-07-13T00:00:00Z","timestamp":1720828800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62073279","22567619H"],"award-info":[{"award-number":["62073279","22567619H"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Hebei innovation capability improvement plan project","award":["62073279","22567619H"],"award-info":[{"award-number":["62073279","22567619H"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Establishing an accurate and robust feature fusion mechanism is key to enhancing the tracking performance of single-object trackers based on a Siamese network. However, the output features of the depth-wise cross-correlation feature fusion module in fully convolutional trackers based on Siamese networks cannot establish global dependencies on the feature maps of a search area. This paper proposes a dynamic cascade feature fusion (DCFF) module by introducing a local feature guidance (LFG) module and dynamic attention modules (DAMs) after the depth-wise cross-correlation module to enhance the global dependency modeling capability during the feature fusion process. In this paper, a set of verification experiments is designed to investigate whether establishing global dependencies for the features output by the depth-wise cross-correlation operation can significantly improve the performance of fully convolutional trackers based on a Siamese network, providing experimental support for rational design of the structure of a dynamic cascade feature fusion module. Secondly, we integrate the dynamic cascade feature fusion module into the tracking framework based on a Siamese network, propose SiamDCFF, and evaluate it using public datasets. Compared with the baseline model, SiamDCFF demonstrated significant improvements.<\/jats:p>","DOI":"10.3390\/s24144545","type":"journal-article","created":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T14:15:49Z","timestamp":1721052949000},"page":"4545","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["SiamDCFF: Dynamic Cascade Feature Fusion for Vision Tracking"],"prefix":"10.3390","volume":"24","author":[{"given":"Jinbo","family":"Lu","sequence":"first","affiliation":[{"name":"School of Electrical Engineering, Yanshan University, Qinhuangdao 066000, China"}]},{"given":"Na","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering, Yanshan University, Qinhuangdao 066000, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4741-7578","authenticated-orcid":false,"given":"Shuo","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering, Yanshan University, Qinhuangdao 066000, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,7,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., and Yan, J. (2019, January 15\u201320). Siamrpn++: Evolution of siamese visual tracking with very deep networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00441"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Chen, Z., Zhong, B., Li, G., Zhang, S., and Ji, R. (2020, January 13\u201319). Siamese box adaptive network for visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00670"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Guo, D., Wang, J., Cui, Y., Wang, Z., and Chen, S. (2020, January 13\u201319). SiamCAR: Siamese fully convolutional classification and regression for visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00630"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H.S. (15\u201316, January 8\u201310). Fully-convolutional siamese networks for object tracking. Proceedings of the Computer Vision\u2013ECCV 2016 Workshops, Amsterdam, The Netherlands. Proceedings, Part II 14.","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Li, B., Yan, J., Wu, W., Zhu, Z., and Hu, X. (2018, January 18\u201323). High performance visual tracking with siamese region proposal network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00935"},{"key":"ref_6","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). Imagenet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems 25 (NIPS 2012): 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, NV, USA."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_8","unstructured":"Chopra, S., Hadsell, R., and LeCun, Y. (2005, January 20\u201325). Learning a similarity metric discriminatively, with application to face verification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ye, B., Chang, H., Ma, B., Shan, S., and Chen, X. (2022, January 23\u201327). Joint feature learning and relation modeling for tracking: A one-stream framework. Proceedings of the European Conference Computer Vision (ECCV), Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20047-2_20"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Cui, Y., Jiang, C., Wang, L., and Wu, G. (2022, January 18\u201324). Mixformer: End-to-end tracking with iterative mixed attention. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01324"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Xia, Z., Pan, X., Song, S., Li, L.E., and Huang, G. (2022, January 18\u201324). Vision transformer with deformable attention. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00475"},{"key":"ref_12","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_13","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Guo, D., Shao, Y., Cui, Y., Wang, Z., Zhang, L., and Shen, C. (2021, January 20\u201325). Graph attention tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00942"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yu, Y., Xiong, Y., Huang, W., and Scott, M.R. (2020, January 13\u201319). Deformable siamese attention networks for visual object tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00676"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wang, N., Zhou, W., Wang, J., and Li, H. (2021, January 20\u201325). Transformer meets tracker: Exploiting temporal context for robust visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00162"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland. Proceedings, Part V 13.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Real, E., Shlens, J., Mazzocchi, S., Pan, X., and Vanhoucke, V. (2017, January 21\u201326). Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.789"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Mueller, M., Smith, N., and Ghanem, B. (2016, January 11\u201314). A benchmark and simulator for uav tracking. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Proceedings, Part I 14.","DOI":"10.1007\/978-3-319-46448-0_27"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1562","DOI":"10.1109\/TPAMI.2019.2957464","article-title":"GOT-10k: A large high-diversity benchmark for generic object tracking in the wild","volume":"43","author":"Huang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1834","DOI":"10.1109\/TPAMI.2014.2388226","article-title":"Object Tracking Benchmark","volume":"37","author":"Wu","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","unstructured":"Zhao, M., Okada, K., and Inaba, M. (2021). Trtr: Visual tracking with transformer. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Peng, H., Fu, J., Li, B., and Hu, W. (2020, January 23\u201328). Ocean: Object-aware anchor-free tracking. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK. Proceedings, Part XXI 16.","DOI":"10.1007\/978-3-030-58589-1_46"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Peng, H. (2019, January 15\u201320). Deeper and wider siamese networks for real-time visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00472"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Cao, Z., Fu, C., Ye, J., Li, B., and Li, Y. (2021, January 11\u201317). HiFT: Hierarchical feature transformer for aerial tracking. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01517"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Du, F., Liu, P., Zhao, W., and Tang, X. (2020, January 13\u201319). Correlation-guided attention for corner detection based visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00687"},{"key":"ref_28","unstructured":"Bhat, G., Danelljan, M., Gool, L.V., and Timofte, R. (November, January 27). Learning discriminative model prediction for tracking. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_29","unstructured":"Zheng, L., Tang, M., Wang, J., and Lu, H. (2019). Learning features with differentiable closed-form solver for tracking. arXiv."},{"key":"ref_30","unstructured":"Nam, G., Oh, S.W., Lee, J.Y., and Kim, S.J. (2020). DMV: Visual object tracking via part-level dense memory and voting-based retrieval. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., and Hu, W. (2018, January 8\u201314). Distractor-aware siamese networks for visual object tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_7"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wang, G., Luo, C., Xiong, Z., and Zeng, W. (2019, January 15\u201320). SPM-Tracker: Series-parallel matching for real-time visual object tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00376"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Xu, Y., Wang, Z., Li, Z., Yuan, Y., and Yu, G. (2020, January 7\u201312). Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6944"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Khan, F.S., and Felsberg, M. (2019, January 15\u201320). Atom: Accurate tracking by overlap maximization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00479"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1451","DOI":"10.1109\/TCSS.2023.3235649","article-title":"Flexible Dual-Branch Siamese Network: Learning Location Quality Estimation and Regression Distribution for Visual Tracking","volume":"11","author":"Hu","year":"2023","journal-title":"IEEE Trans. Comput. Soc. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gao, J., Zhang, T., and Xu, C. (2019, January 15\u201320). Graph convolutional tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00478"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Khan, F.S., and Felsberg, M. (2017, January 21\u201326). Eco: Efficient convolution operators for tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.733"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Gool, L.V., and Timofte, R. (2020, January 13\u201319). Probabilistic regression for visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00721"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/14\/4545\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:16:19Z","timestamp":1760109379000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/14\/4545"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,13]]},"references-count":38,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2024,7]]}},"alternative-id":["s24144545"],"URL":"https:\/\/doi.org\/10.3390\/s24144545","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2024,7,13]]}}}