{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T16:18:48Z","timestamp":1781713128153,"version":"3.54.5"},"reference-count":69,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,1,6]],"date-time":"2022-01-06T00:00:00Z","timestamp":1641427200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,6]],"date-time":"2022-01-06T00:00:00Z","timestamp":1641427200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62102364"],"award-info":[{"award-number":["62102364"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62002325"],"award-info":[{"award-number":["62002325"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61802348"],"award-info":[{"award-number":["61802348"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61772268"],"award-info":[{"award-number":["61772268"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004608","name":"Natural Science Foundation of Jiangsu 
Province","doi-asserted-by":"publisher","award":["BK20190065"],"award-info":[{"award-number":["BK20190065"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2022,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Visual tracking of generic objects is one of the fundamental but challenging problems in computer vision. Here, we propose a novel fully convolutional Siamese network to solve visual tracking by directly predicting the target bounding box in an end-to-end manner. We first reformulate the visual tracking task as two subproblems: a classification problem for pixel category prediction and a regression task for object status estimation at this pixel. With this decomposition, we design a simple yet effective Siamese architecture based classification and regression framework, termed SiamCAR, which consists of two subnetworks: a Siamese subnetwork for feature extraction and a classification-regression subnetwork for direct bounding box prediction. Since the proposed framework is both proposal- and anchor-free, SiamCAR can avoid the tedious hyper-parameter tuning of anchors, considerably simplifying the training. To demonstrate that a much simpler tracking framework can achieve superior tracking results, we conduct extensive experiments and comparisons with state-of-the-art trackers on a few challenging benchmarks. Without bells and whistles, SiamCAR achieves leading performance with a real-time speed. Furthermore, the ablation study validates that the proposed framework is effective with various backbone networks, and can benefit from deeper networks. 
Code is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/ohhhyeahhh\/SiamCAR\">https:\/\/github.com\/ohhhyeahhh\/SiamCAR<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s11263-021-01559-4","type":"journal-article","created":{"date-parts":[[2022,1,6]],"date-time":"2022-01-06T20:02:21Z","timestamp":1641499341000},"page":"550-566","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":58,"title":["Joint Classification and Regression for Visual Tracking with Fully Convolutional Siamese Networks"],"prefix":"10.1007","volume":"130","author":[{"given":"Ying","family":"Cui","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9811-4828","authenticated-orcid":false,"given":"Dongyan","family":"Guo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yanyan","family":"Shao","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhenhua","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chunhua","family":"Shen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Liyan","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shengyong","family":"Chen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2022,1,6]]},"reference":[{"key":"1559_CR1","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J., Vedaldi, A., & Torr, P. (2016). Fully-convolutional siamese networks for object tracking. 
In Proceedings of European conference on computer vision.","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"1559_CR2","doi-asserted-by":"crossref","unstructured":"Bhat, G., Danelljan, M., Gool, L., & Timofte, R. (2019a). Learning discriminative model prediction for tracking. In Proceedings of IEEE international conference on computer vision (pp. 6182\u20136191).","DOI":"10.1109\/ICCV.2019.00628"},{"key":"1559_CR3","doi-asserted-by":"crossref","unstructured":"Bhat, G., Danelljan, M., Gool, L. V., & Timofte, R. (2019b). Learning discriminative model prediction for tracking. In Proceedings of the IEEE\/CVF international conference on computer vision.","DOI":"10.1109\/ICCV.2019.00628"},{"key":"1559_CR4","doi-asserted-by":"crossref","unstructured":"Bolme, D., Beveridge, J., Draper, B., & Lui, Y. (2010). Visual object tracking using adaptive correlation filters. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2010.5539960"},{"key":"1559_CR5","unstructured":"Dai, L., Li, Y., He, K., & Sun, J. (2016). R-fcn: Object detection via region-based fully convolutional networks. In Proceedings of advances in neural information processing systems (pp. 379\u2013387)."},{"key":"1559_CR6","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Khan, F., & Felsberg, M. (2017). Eco: Efficient convolution operators for tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2017.733"},{"key":"1559_CR7","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Khan, F., & Felsberg, M. (2019). Accurate tracking by overlap maximization. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2019.00479"},{"key":"1559_CR8","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Hager, G., & Khan, F. (2014). Accurate scale estimation for robust visual tracking. 
In Proceedings of British machine vision conference.","DOI":"10.5244\/C.28.65"},{"key":"1559_CR9","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Hager, G., Fahad, K., & Felsberg, M. (2016a). Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence.","DOI":"10.1109\/TPAMI.2016.2609928"},{"key":"1559_CR10","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Hager, G., Fahad, K., & Felsberg, M. (2015). Learning spatially regularized correlation filters for visual tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/ICCV.2015.490"},{"key":"1559_CR11","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Robinson, A., Khan, F., & Felsberg, M. (2016b). Beyond correlation filters: learning continuous convolution operators for visual tracking. In Proceedings of European conference on computer vision.","DOI":"10.1007\/978-3-319-46454-1_29"},{"key":"1559_CR12","doi-asserted-by":"crossref","unstructured":"Dong, X., & Shen, J. (2018). Triplet loss in siamese network for object tracking. In Proceedings of European conference on computer vision.","DOI":"10.1007\/978-3-030-01261-8_28"},{"key":"1559_CR13","doi-asserted-by":"crossref","unstructured":"Dong, X., Shen, J., Shao, L., & Porikli, F. (2020). Clnet: A compact latent network for fast adjusting siamese trackers. In Proceedings of the European conference on computer vision (pp. 378\u2013395).","DOI":"10.1007\/978-3-030-58565-5_23"},{"key":"1559_CR14","doi-asserted-by":"crossref","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., & Tian, Q. (2019). Centernet: Keypoint triplets for object detection. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 6568\u20136577).","DOI":"10.1109\/ICCV.2019.00667"},{"key":"1559_CR15","doi-asserted-by":"crossref","unstructured":"Fan, H., & Ling, H. (2019). Siamese cascaded region proposal networks for real-time visual tracking. 
In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2019.00814"},{"key":"1559_CR16","doi-asserted-by":"crossref","unstructured":"Fan, H., Lin, L., Yang, F., Chu, P., Deng, G., Yu, S., Bai, H., Xu, Y., Liao, C., & Ling, H. (2019). Lasot: A high-quality benchmark for large-scale single object tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2019.00552"},{"key":"1559_CR17","doi-asserted-by":"crossref","unstructured":"Gao, J., Zhang, T., & Xu, C. (2019). Graph convolutional tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2019.00478"},{"key":"1559_CR18","doi-asserted-by":"crossref","unstructured":"Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., & Wang, S. (2017). Learning dynamic siamese network for visual object tracking. In Proceedings of IEEE international conference on computer vision.","DOI":"10.1109\/ICCV.2017.196"},{"key":"1559_CR19","doi-asserted-by":"crossref","unstructured":"Guo, D., Wang, J., Cui, Y., Wang, ZH., & Chen, S. (2020). Siamcar: Siamese fully convolutional classification and regression for visual tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR42600.2020.00630"},{"key":"1559_CR20","doi-asserted-by":"crossref","unstructured":"He, A., Luo, C., Tian, X., & Zeng, W. (2018). A twofold siamese network for real-time object tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2018.00508"},{"key":"1559_CR21","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2016.90"},{"key":"1559_CR22","doi-asserted-by":"crossref","unstructured":"Held, D., Thrun, S., & Savarese, S. (2016). 
Learning to track at 100 fps with deep regression networks. In Proceedings of European conference on computer vision (pp. 749\u2013765). Springer.","DOI":"10.1007\/978-3-319-46448-0_45"},{"key":"1559_CR23","doi-asserted-by":"crossref","unstructured":"Henriques, J., Caseiro, R., Pedro, M., & Jorge, B. (2014). High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence.","DOI":"10.1109\/TPAMI.2014.2345390"},{"key":"1559_CR24","unstructured":"Horst, P., Thomas, M., & Horst, B. (2015). In defense of color-based model-free tracking. In Proceedings of IEEE conference on computer vision and pattern recognition."},{"key":"1559_CR25","unstructured":"Huan, L., Yang, Y., Deng, Y., & Yu, Y. (2015). Densebox: Unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874."},{"key":"1559_CR26","doi-asserted-by":"crossref","unstructured":"Huang, D., Luo, L., Chen, Z., Wen, M., & Zhang, C. (2016). Applying detection proposals to visual tracking for scale and aspect ratio adaptability. International Journal of Computer Vision, 524\u2013541.","DOI":"10.1007\/s11263-016-0974-6"},{"key":"1559_CR27","unstructured":"Huang, L., Zhao, X., & Huang, K. (2018). Got-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence."},{"key":"1559_CR28","unstructured":"Kiani, G., Hamed, Ashton, F., & Simon, L. (2017). Learning background-aware correlation filters for visual tracking. In Proceedings of European conference on computer vision."},{"key":"1559_CR29","doi-asserted-by":"crossref","unstructured":"Law, H., & Deng, J. (2018). Cornernet: Detecting objects as paired keypoints. In Proceedings of European conference on computer vision (pp. 6568\u20136577).","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"1559_CR30","unstructured":"Li, Y., & Zhu, J. (2014). 
A scale adaptive kernel correlation filter tracker with feature integration. In Proceedings of European conference on computer vision."},{"key":"1559_CR31","doi-asserted-by":"crossref","unstructured":"Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., & Yan, J. (2019). Siamrpn++: Evolution of siamese visual tracking with very deep networks. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2019.00441"},{"key":"1559_CR32","doi-asserted-by":"crossref","unstructured":"Li, B., Yan, J., Wu, W., Zhu, Z., & Hu, X. (2018). High performance visual tracking with siamese region proposal network. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2018.00935"},{"key":"1559_CR33","doi-asserted-by":"crossref","unstructured":"Li, F., Yao, Y., Li, P., Zhang, D., Zuo, W., & Yang, M. (2017). Integrating boundary and center correlation filters for visual tracking with aspect ratio variation. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/ICCVW.2017.234"},{"key":"1559_CR34","doi-asserted-by":"crossref","unstructured":"Lianghua\u00a0Huang, K. H., Zhao, X. (2020). Globaltrack: A simple and strong baseline for long-term tracking.","DOI":"10.1609\/aaai.v34i07.6758"},{"key":"1559_CR35","doi-asserted-by":"crossref","unstructured":"Lin, T., Michael, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., & Zitnick, C. (2014). Microsoft coco: Common objects in context. In Proceedings of European conference on computer vision.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"1559_CR36","doi-asserted-by":"crossref","unstructured":"Liu, T., Wang, G., Yang, Q., & Wang, L. (2016). Part-based tracking via discriminative correlation filters. IEEE Transactions on Circuits and Systems for Video Technology.","DOI":"10.1109\/TCSVT.2016.2637798"},{"key":"1559_CR37","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., & Darrell, T. (2015). 
Fully convolutional networks for semantic segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3431\u20133440).","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"1559_CR38","unstructured":"Luca, B., Jack, V., Stuart, G., Ondrej, M., & HS, TP. (2016). Staple: Complementary learners for real-time tracking. In Proceedings of IEEE conference on computer vision and pattern recognition."},{"key":"1559_CR39","doi-asserted-by":"crossref","unstructured":"Lukezic, A., Matas, J., & Kristan, M. (2020). D3s\u2014A discriminative single shot segmentation tracker. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR42600.2020.00716"},{"key":"1559_CR40","doi-asserted-by":"publisher","first-page":"671","DOI":"10.1007\/s11263-017-1061-3","volume":"126","author":"A Luke\u017ei\u010d","year":"2018","unstructured":"Luke\u017ei\u010d, A., Voj\u00ed\u0159, T., \u010cehovin Zajc, L., Matas, J., & Kristan, M. (2018). Discriminative correlation filter tracker with channel and spatial reliability. International Journal of Computer Vision, 126, 671\u2013688.","journal-title":"International Journal of Computer Vision"},{"key":"1559_CR41","doi-asserted-by":"crossref","unstructured":"Ma, C., Huang, J., Yang, X., & Yang, M. (2018a). Robust visual tracking via hierarchical convolutional features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 2709\u20132723.","DOI":"10.1109\/TPAMI.2018.2865311"},{"key":"1559_CR42","doi-asserted-by":"crossref","unstructured":"Ma, C., Huang, J. B., Yang, X., & Yang, M. H. (2018b). Adaptive correlation filters with long-term and short-term memory for object tracking. International Journal of Computer Vision, 126, 771\u2013796.","DOI":"10.1007\/s11263-018-1076-4"},{"key":"1559_CR43","unstructured":"Matej, K., Ales, L., Jiri, M., Michael, F., Roman, P., & Joni-Kristian, K. (2021). The eighth visual object tracking vot2020 challenge results. 
In Proceedings of the European conference on computer vision (pp. 547\u2013601)."},{"key":"1559_CR44","doi-asserted-by":"crossref","unstructured":"Muller, M., Bibi, A., Giancola, S., Alsubaihi, S., & Ghanem, B. (2018). Trackingnet: A large-scale dataset and benchmark for object tracking in the wild. In Proceedings of the European conference on computer vision.","DOI":"10.1007\/978-3-030-01246-5_19"},{"key":"1559_CR45","doi-asserted-by":"crossref","unstructured":"Muller, M., Smith, N., & Ghanem, B. (2016). A benchmark and simulator for UAV tracking. In Proceedings of European conference on computer vision.","DOI":"10.1007\/978-3-319-46448-0_27"},{"key":"1559_CR46","doi-asserted-by":"crossref","unstructured":"Nam, H., & Han, B. (2016). Learning multi-domain convolutional neural networks for visual tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2016.465"},{"key":"1559_CR47","unstructured":"Nam, H., Baek, M., & Han, B. (2016). Modeling and propagating cnns in a tree structure for visual tracking. arXiv Computer Vision and Pattern Recognition."},{"key":"1559_CR48","unstructured":"Pu, S., Song, Y., & Ma, C. (2018). Deep attentive tracking via reciprocative learning. In Proceedings of advances in neural information processing systems."},{"key":"1559_CR49","doi-asserted-by":"crossref","unstructured":"Real, E., Shlens, J., Mazzocchi, S., Pan, X., & Vanhoucke, V. (2017). Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2017.789"},{"key":"1559_CR50","unstructured":"Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. 
In Proceedings of advances in neural information processing systems."},{"key":"1559_CR51","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"6","author":"S Ren","year":"2017","unstructured":"Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 1137\u20131149.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"1559_CR52","doi-asserted-by":"crossref","unstructured":"Ross, D., Lim, J., Lin, R. S., & Yang, M. H. (2008). Incremental learning for robust visual tracking. International Journal of Computer Vision, 77, 125\u2013141.","DOI":"10.1007\/s11263-007-0075-7"},{"key":"1559_CR53","doi-asserted-by":"crossref","unstructured":"Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115, 211\u2013252.","DOI":"10.1007\/s11263-015-0816-y"},{"issue":"4","key":"1559_CR54","doi-asserted-by":"publisher","first-page":"640","DOI":"10.1109\/TPAMI.2016.2572683","volume":"39","author":"E Shelhamer","year":"2017","unstructured":"Shelhamer, E., Long, J., & Darrell, T. (2017). Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 640.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"1559_CR55","doi-asserted-by":"publisher","first-page":"1084","DOI":"10.1007\/s11263-019-01156-6","volume":"127","author":"Y Sui","year":"2019","unstructured":"Sui, Y., Zhang, Z., Wang, G., Tang, Y., & Zhang, L. (2019). Exploiting the anisotropy of correlation filter learning for visual tracking. 
International Journal of Computer Vision, 127, 1084\u20131105.","journal-title":"International Journal of Computer Vision"},{"key":"1559_CR56","doi-asserted-by":"crossref","unstructured":"Tian, Z., He, T., Shen, C., & Yan, Y. (2019a). Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3126\u20133135).","DOI":"10.1109\/CVPR.2019.00324"},{"key":"1559_CR57","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., & He, T. (2019b). Fcos: Fully convolutional one-stage object detection. In Proceedings of IEEE international conference on computer vision.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"1559_CR58","doi-asserted-by":"crossref","unstructured":"Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., & Torr, P. (2017). End-to-end representation learning for correlation filter based tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2017.531"},{"key":"1559_CR59","doi-asserted-by":"crossref","unstructured":"Wang, G., Luo, C., Xiong, Z., & Zeng, W. (2019). Spm-tracker: Series-parallel matching for real-time visual object tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2019.00376"},{"key":"1559_CR60","doi-asserted-by":"crossref","unstructured":"Wang, Q., Zhu, T., Xing, J., Gao, J., Hu, W., & Maybank, S. (2018). Learning attentions: Residual attentional siamese network for high performance online visual tracking. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 4854\u20134863).","DOI":"10.1109\/CVPR.2018.00510"},{"issue":"9","key":"1559_CR61","doi-asserted-by":"publisher","first-page":"1834","DOI":"10.1109\/TPAMI.2014.2388226","volume":"37","author":"Y Wu","year":"2015","unstructured":"Wu, Y., Lim, J., & Yang, M. (2015). Object tracking benchmark. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1834.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"1559_CR62","doi-asserted-by":"crossref","unstructured":"Yang, T., Xu, P., Hu, R., Chai, H., & Chan, A. B. (2020). Roam: Recurrently optimizing tracking model. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR42600.2020.00675"},{"key":"1559_CR63","doi-asserted-by":"crossref","unstructured":"Yu, J., Jiang, Y., Wang, Z., Cao, Z., & Huang, T. (2016). Unitbox: An advanced object detection network. In ACM international conference on multimedia.","DOI":"10.1145\/2964284.2967274"},{"key":"1559_CR64","doi-asserted-by":"crossref","unstructured":"Zhang, L., Jagannadan, V., Ponnuthurai, N., Narendra, A., & Pierre, M. (2017a). Robust visual tracking using oblique random forests. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2017.617"},{"key":"1559_CR65","doi-asserted-by":"crossref","unstructured":"Zhang, J., Ma, S., & Stan, S. (2014). Meem: Robust tracking via multiple experts using entropy minimization. In Proceedings of European conference on computer vision.","DOI":"10.1007\/978-3-319-10599-4_13"},{"key":"1559_CR66","doi-asserted-by":"crossref","unstructured":"Zhang, T., Xu, C., & Yang, M. (2017b). Multi-task correlation particle filter for robust object tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2017.512"},{"key":"1559_CR67","doi-asserted-by":"crossref","unstructured":"Zhou, X., Koltun, V., & Kr\u00e4henb\u00fchl, P. (2020). Tracking objects as points. In Proceedings of European conference on computer vision (pp. 474\u2013490).","DOI":"10.1007\/978-3-030-58548-8_28"},{"key":"1559_CR68","unstructured":"Zhou, X., Wang, D., & Kr\u00e4henb\u00fchl, P. (2019). Objects as points. 
arXiv preprint arXiv:1904.07850."},{"key":"1559_CR69","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., & Hu, W. (2018). Distractor-aware siamese networks for visual object tracking. In Proceedings of European conference on computer vision.","DOI":"10.1007\/978-3-030-01240-3_7"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-021-01559-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-021-01559-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-021-01559-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,16]],"date-time":"2022-02-16T10:25:32Z","timestamp":1645007132000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-021-01559-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,6]]},"references-count":69,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,2]]}},"alternative-id":["1559"],"URL":"https:\/\/doi.org\/10.1007\/s11263-021-01559-4","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,6]]},"assertion":[{"value":"11 April 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 November 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 January 2022","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}