{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T11:17:32Z","timestamp":1768475852175,"version":"3.49.0"},"reference-count":50,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2019,9,12]],"date-time":"2019-09-12T00:00:00Z","timestamp":1568246400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we propose a fast and accurate deep network-based object tracking method, which combines feature representation, template tracking and foreground detection into a single framework for robust tracking. The proposed framework consists of a backbone network, which feeds into two parallel networks, TmpNet for template tracking and FgNet for foreground detection. The backbone network is a pre-trained modified VGG network, in which a few parameters need to be fine-tuned for adapting to the tracked object. FgNet is a fully convolutional network to distinguish the foreground from background in a pixel-to-pixel manner. The parameter in TmpNet is the learned channel-wise target template, which initializes in the first frame and performs fast template tracking in the test frames. To enable each component to work closely with each other, we use a multi-task loss to end-to-end train the proposed framework. In online tracking, we combine the score maps from TmpNet and FgNet to find the optimal tracking results. Experimental results on object tracking benchmarks demonstrate that our approach achieves favorable tracking accuracy against the state-of-the-art trackers while running at a real-time speed of 38 fps.<\/jats:p>","DOI":"10.3390\/s19183945","type":"journal-article","created":{"date-parts":[[2019,9,12]],"date-time":"2019-09-12T10:56:06Z","timestamp":1568285766000},"page":"3945","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Real-Time Object Tracking with Template Tracking and Foreground Detection Network"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3591-9247","authenticated-orcid":false,"given":"Kaiheng","family":"Dai","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7046-7587","authenticated-orcid":false,"given":"Yuehuan","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China"},{"name":"National Key Lab of Science and Technology on Multi-spectral Information Processing, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qiong","family":"Song","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,9,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H.S. (2016). Fully-Convolutional Siamese Networks for Object Tracking. ECCV Workshops, Springer.","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yang, T., and Chan, A.B. (2018, January 8\u201314). Learning Dynamic Memory Networks for Object Tracking. Proceedings of the ECCV, Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_10"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., and Wang, S. (2017, January 22\u201329). Learning Dynamic Siamese Network for Visual Object Tracking. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.196"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Li, B., Yan, J., Wu, W., Zhu, Z., and Hu, X. (2018, January 18\u201322). High Performance Visual Tracking with Siamese Region Proposal Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00935"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Song, Y., Ma, C., Gong, L., Zhang, J., Lau, R.W.H., and Yang, M.H. (2017, January 22\u201329). CREST: Convolutional Residual Learning for Visual Tracking. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.279"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Choi, J., Chang, H.J., Fischer, T., Yun, S., Lee, K., Jeong, J., Demiris, Y., and Choi, J.Y. (2018, January 18\u201322). Context-Aware Deep Feature Compression for High-Speed Visual Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00057"},{"key":"ref_7","unstructured":"Nam, H., and Han, B. (July, January 26). Learning Multi-domain Convolutional Neural Networks for Visual Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wang, L., Ouyang, W., Wang, X., and Lu, H. (2015, January 7\u201313). Visual Tracking with Fully Convolutional Networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.357"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ma, C., Huang, J.B., Yang, X., and Yang, M.H. (2015, January 7\u201313). Hierarchical Convolutional Features for Visual Tracking. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.352"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Robinson, A., Khan, F.S., and Felsberg, M. (2016, January 11\u201314). Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. Proceedings of the ECCV, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46454-1_29"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Khan, F.S., and Felsberg, M. (2017, January 21\u201326). ECO: Efficient Convolution Operators for Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.733"},{"key":"ref_12","unstructured":"Qi, Y., Zhang, S., Qin, L., Yao, H., Huang, Q., Lim, J., and Yang, M.H. (July, January 26). Hedged Deep Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Bolme, D.S., Beveridge, J.R., Draper, B.A., and Lui, Y.M. (2010, January 13\u201318). Visual object tracking using adaptive correlation filters. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539960"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Henriques, J.F., Caseiro, R., Martins, P., and Batista, J. (2012, January 7\u201313). Exploiting the Circulant Structure of Tracking-by-Detection with Kernels. Proceedings of the ECCV, Florence, Italy.","DOI":"10.1007\/978-3-642-33765-9_50"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Galoogahi, H.K., Fagg, A., and Lucey, S. (2017, January 22\u201329). Learning Background-Aware Correlation Filters for Visual Tracking. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.129"},{"key":"ref_16","unstructured":"Wang, Q., Gao, J., Xing, J., Zhang, M., and Hu, W. (2017). DCFNet: Discriminant Correlation Filters Network for Visual Tracking. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Valmadre, J., Bertinetto, L., Henriques, J.F., Vedaldi, A., and Torr, P.H.S. (2017, January 21\u201326). End-to-End Representation Learning for Correlation Filter Based Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.531"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"He, A., Luo, C., Tian, X., and Zeng, W. (2018, January 18\u201322). A Twofold Siamese Network for Real-Time Object Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00508"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Wang, L., Qi, J., Wang, D., Feng, M., and Lu, H. (2018, January 8\u201314). Structured Siamese Network for Real-Time Visual Tracking. Proceedings of the ECCV, Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_22"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Dai, K., Wang, Y., Yan, X., and Huo, Y. (2018, January 7\u201310). Fusion of Template Matching and Foreground Detection for Robust Visual Tracking. Proceedings of the IEEE International Conference on Image Processing (ICIP), Athens, Greece.","DOI":"10.1109\/ICIP.2018.8451332"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wu, Y., Lim, J., and Yang, M.H. (2013, January 23\u201328). Online Object Tracking: A Benchmark. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.312"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1834","DOI":"10.1109\/TPAMI.2014.2388226","article-title":"Object Tracking Benchmark","volume":"37","author":"Wu","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"5630","DOI":"10.1109\/TIP.2015.2482905","article-title":"Encoding color information for visual tracking: Algorithms and benchmark","volume":"24","author":"Liang","year":"2015","journal-title":"IEEE Trans. Image Process."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Mueller, M., Smith, N., and Ghanem, B. (2016, January 11\u201314). A Benchmark and Simulator for UAV Tracking. Proceedings of the ECCV, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_27"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1561","DOI":"10.1109\/TPAMI.2016.2609928","article-title":"Discriminative Scale Space Tracking","volume":"39","author":"Danelljan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Ma, C., Yang, X., Zhang, C., and Yang, M.H. (2015, January 7\u201312). Long-term correlation tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299177"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Li, F., Tian, C., Zuo, W., Zhang, L., and Yang, M.H. (2018, January 18\u201322). Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00515"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Choi, J., Chang, H.J., Yun, S., Fischer, T., Demiris, Y., and Choi, J.Y. (2017, January 21\u201326). Attentional Correlation Filter Network for Adaptive Visual Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.513"},{"key":"ref_29","unstructured":"Choi, J., Chang, H.J., Jeong, J., Demiris, Y., and Choi, J.Y. (July, January 26). Visual Tracking Using Attention-Modulated Disintegration and Integration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Liu, Y., Sui, X., Kuang, X., Liu, C., Gu, G., and Chen, Q. (2019). Object Tracking Based on Vector Convolutional Network and Discriminant Correlation Filters. Sensors, 19.","DOI":"10.3390\/s19081818"},{"key":"ref_31","unstructured":"Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., and Torr, P.H.S. (July, January 26). Staple: Complementary Learners for Real-Time Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Held, D., Thrun, S., and Savarese, S. (2016, January 11\u201314). Learning to Track at 100 FPS with Deep Regression Networks. Proceedings of the ECCV, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_45"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Huang, C., Lucey, S., and Ramanan, D. (2017, January 22\u201329). Learning Policies for Adaptive Tracking with Deep Feature Cascades. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.21"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Chang, S., Li, W., Zhang, Y., and Feng, Z. (2019). Online Siamese Network for Visual Object Tracking. Sensors, 19.","DOI":"10.3390\/s19081858"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhou, L., and Zhang, J. (2019). Combined Kalman Filter and Multifeature Fusion Siamese Network for Real-Time Visual Tracking. Sensors, 19.","DOI":"10.3390\/s19092201"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wang, Q., Teng, Z., Xing, J., Gao, J., Hu, W., and Maybank, S.J. (2018, January 18\u201322). Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00510"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Wang, Q., Li, B.Q., Wu, W., Yan, J., and Hu, W. (2018, January 8\u201314). Distractor-Aware Siamese Networks for Visual Object Tracking. Proceedings of the ECCV, Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_7"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Huang, G., Zou, W., Du, D., and Huang, C. (2017, January 22\u201329). UCT: Learning Unified Convolutional Networks for Real-Time Visual Tracking. Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy.","DOI":"10.1109\/ICCVW.2017.231"},{"key":"ref_39","unstructured":"Shelhamer, E., Long, J., and Darrell, T. (2015, January 7\u201312). Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1016\/j.patrec.2016.09.014","article-title":"Interactive deep learning method for segmenting moving objects","volume":"96","author":"Wang","year":"2017","journal-title":"Pattern Recognit. Lett."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1016\/j.patrec.2018.08.002","article-title":"Foreground segmentation using convolutional neural networks for multiscale feature encoding","volume":"112","author":"Lim","year":"2018","journal-title":"Pattern Recognit. Lett."},{"key":"ref_42","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Fan, H., and Ling, H. (2017, January 21\u201326). SANet: Structure-Aware Network for Visual Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.275"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Song, Y., Ma, C., Wu, X., Gong, L., Bao, L., Zuo, W., Shen, C., Lau, R.W.H., and Yang, M.H. (2018, January 18\u201322). VITAL: VIsual Tracking via Adversarial Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00937"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Sun, C., Lu, H., and Yang, M.H. (2018, January 18\u201322). Learning Spatial-Aware Regressions for Visual Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00934"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"4130","DOI":"10.1109\/TIP.2019.2904789","article-title":"Parallel Tracking and Verifying","volume":"28","author":"Fan","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_47","first-page":"263","article-title":"Struck: Structured Output Tracking with Kernels","volume":"38","author":"Hare","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"1409","DOI":"10.1109\/TPAMI.2011.239","article-title":"Tracking-Learning-Detection","volume":"34","author":"Kalal","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Danelljan, M., H\u00e4ger, G., Khan, F.S., and Felsberg, M. (2015, January 7\u201313). Convolutional Features for Correlation Filter Based Visual Tracking. Proceedings of the IEEE International Conference on Computer Vision Workshop (ICCVW), Santiago, Chile.","DOI":"10.1109\/ICCVW.2015.84"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Danelljan, M., H\u00e4ger, G., Khan, F.S., and Felsberg, M. (2015, January 7\u201313). Learning Spatially Regularized Correlation Filters for Visual Tracking. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.490"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/18\/3945\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:19:30Z","timestamp":1760188770000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/18\/3945"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,12]]},"references-count":50,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2019,9]]}},"alternative-id":["s19183945"],"URL":"https:\/\/doi.org\/10.3390\/s19183945","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,9,12]]}}}