{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:49:19Z","timestamp":1760240959929,"version":"build-2065373602"},"reference-count":37,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2019,11,9]],"date-time":"2019-11-09T00:00:00Z","timestamp":1573257600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Korea Government","award":["2014-0-00077","2017-0-00250"],"award-info":[{"award-number":["2014-0-00077","2017-0-00250"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Online training frameworks based on discriminative correlation filters for visual tracking have recently shown significant improvement in both accuracy and speed. However, correlation filter-based discriminative approaches have a common problem of tracking performance degradation when the local structure of a target is distorted by the boundary effect. The shape distortion of the target is mainly caused by the circulant structure in the Fourier domain processing, and it makes the correlation filter learn distorted training samples. In this paper, we present a structure\u2013attention network to preserve the target structure from the structure distortion caused by the boundary effect. More specifically, we adopt a variational auto-encoder as a structure\u2013attention network to make various and representative target structures. We also propose two denoising criteria using a novel reconstruction loss for the variational auto-encoding framework to capture more robust structures even under the boundary condition. Through the proposed structure\u2013attention framework, discriminative correlation filters can learn robust structure information of targets during online training with an enhanced discriminating performance and adaptability. 
Experimental results on major visual tracking benchmark datasets show that the proposed method produces a better or comparable performance compared with the state-of-the-art tracking methods with a real-time processing speed of more than 80 frames per second.<\/jats:p>","DOI":"10.3390\/s19224904","type":"journal-article","created":{"date-parts":[[2019,11,12]],"date-time":"2019-11-12T04:07:07Z","timestamp":1573531627000},"page":"4904","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Real-Time Visual Tracking with Variational Structure Attention Network"],"prefix":"10.3390","volume":"19","author":[{"given":"Yeongbin","family":"Kim","sequence":"first","affiliation":[{"name":"Department of Image, Chung-Ang University, Seoul 06974, Korea"}]},{"given":"Joongchol","family":"Shin","sequence":"additional","affiliation":[{"name":"Department of Image, Chung-Ang University, Seoul 06974, Korea"}]},{"given":"Hasil","family":"Park","sequence":"additional","affiliation":[{"name":"Department of Image, Chung-Ang University, Seoul 06974, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8593-7155","authenticated-orcid":false,"given":"Joonki","family":"Paik","sequence":"additional","affiliation":[{"name":"Department of Image, Chung-Ang University, Seoul 06974, Korea"}]}],"member":"1968","published-online":{"date-parts":[[2019,11,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Bolme, D.S., Beveridge, J.R., Draper, B.A., and Lui, Y.M. (2010, January 13\u201318). Visual object tracking using adaptive correlation filters. 
Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539960"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/TPAMI.2014.2345390","article-title":"High-speed tracking with kernelized correlation filters","volume":"37","author":"Henriques","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Danelljan, M., H\u00e4ger, G., Khan, F., and Felsberg, M. (2014, January 1\u20135). Accurate scale estimation for robust visual tracking. Proceedings of the British Machine Vision Conference, Nottingham, UK.","DOI":"10.5244\/C.28.65"},{"key":"ref_4","unstructured":"Choi, J., Jin Chang, H., Jeong, J., Demiris, Y., and Young Choi, J. (July, January 26). Visual tracking using attention-modulated disintegration and integration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Choi, J., Jin Chang, H., Yun, S., Fischer, T., Demiris, Y., and Young Choi, J. (2017, January 21\u201326). Attentional correlation filter network for adaptive visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.513"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kiani Galoogahi, H., Sim, T., and Lucey, S. (2015, January 7\u201312). Correlation filters with limited boundaries. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299094"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Shahbaz Khan, F., and Felsberg, M. (2017, January 21\u201326). Eco: Efficient convolution operators for tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.733"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Robinson, A., Khan, F.S., and Felsberg, M. (2016, January 8\u201316). Beyond correlation filters: Learning continuous convolution operators for visual tracking. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46454-1_29"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ma, C., Huang, J.B., Yang, X., and Yang, M.H. (2015, January 7\u201313). Hierarchical convolutional features for visual tracking. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.352"},{"key":"ref_11","unstructured":"Qi, Y., Zhang, S., Qin, L., Yao, H., Huang, Q., Lim, J., and Yang, M.H. (July, January 26). Hedged deep tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_12","unstructured":"Nam, H., and Han, B. (July, January 26). Learning multi-domain convolutional neural networks for visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_13","unstructured":"Pu, S., Song, Y., Ma, C., Zhang, H., and Yang, M.H. (2018). Deep attentive tracking via reciprocative learning. Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Song, Y., Ma, C., Wu, X., Gong, L., Bao, L., Zuo, W., Shen, C., Lau, R.W., and Yang, M.H. (2018, January 18\u201323). Vital: Visual tracking via adversarial learning. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00937"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H. (2016, January 8\u201316). Fully-convolutional siamese networks for object tracking. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., and Torr, P.H. (2017, January 21\u201326). End-to-end representation learning for correlation filter based tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.531"},{"key":"ref_17","unstructured":"Wang, Q., Gao, J., Xing, J., Zhang, M., and Hu, W. (2017). Dcfnet: Discriminant correlation filters network for visual tracking. arXiv."},{"key":"ref_18","unstructured":"Kingma, D.P., and Welling, M. (2013). Auto-encoding variational bayes. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Henriques, J.F., Caseiro, R., Martins, P., and Batista, J. (2012, January 7\u201313). Exploiting the circulant structure of tracking-by-detection with kernels. Proceedings of the European Conference on Computer Vision, Florence, Italy.","DOI":"10.1007\/978-3-642-33765-9_50"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Shahbaz Khan, F., Felsberg, M., and Van de Weijer, J. (2014, January 23\u201328). Adaptive color attributes for real-time visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.143"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Hager, G., Shahbaz Khan, F., and Felsberg, M. (2015, January 7\u201313). 
Learning spatially regularized correlation filters for visual tracking. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.490"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3706","DOI":"10.1109\/TCYB.2016.2577718","article-title":"Dynamically modulated mask sparse tracking","volume":"47","author":"Chen","year":"2016","journal-title":"IEEE Trans. Cybern."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Hager, G., Shahbaz Khan, F., and Felsberg, M. (2015, January 7\u201313). Convolutional features for correlation filter based visual tracking. Proceedings of the IEEE International Conference on Computer Vision Workshops, Santiago, Chile.","DOI":"10.1109\/ICCVW.2015.84"},{"key":"ref_24","unstructured":"Hong, S., You, T., Kwak, S., and Han, B. (2015, January 6\u201311). Online tracking by learning discriminative saliency map with convolutional neural network. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhang, T., Xu, C., and Yang, M.H. (2017, January 21\u201326). Multi-task correlation particle filter for robust object tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.512"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Im, D.I.J., Ahn, S., Memisevic, R., and Bengio, Y. (2017, January 4\u20139). Denoising criterion for variational auto-encoding framework. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.10777"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Huang, G., Zou, W., Du, D., and Huang, C. (2017, January 21\u201326). Uct: Learning unified convolutional networks for real-time visual tracking. 
Proceedings of the IEEE International Conference on Computer Vision, Honolulu, HI, USA.","DOI":"10.1109\/ICCVW.2017.231"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wu, Y., Lim, J., and Yang, M.H. (2013, January 23\u201328). Online object tracking: A benchmark. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.312"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1834","DOI":"10.1109\/TPAMI.2014.2388226","article-title":"Object tracking benchmark","volume":"37","author":"Wu","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"5630","DOI":"10.1109\/TIP.2015.2482905","article-title":"Encoding color information for visual tracking: Algorithms and benchmark","volume":"24","author":"Liang","year":"2015","journal-title":"IEEE Trans. Image Process."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Kim, Y., Shin, J., Park, H., and Paik, J. (2019, November 08). Real-Time Visual Tracking with Variational Structure Attention Network Results Description. Available online: https:\/\/github.com\/0binkim92\/Real-Time-Visual-Tracking.","DOI":"10.3390\/s19224904"},{"key":"ref_32","unstructured":"Griffin, G., Holub, A., and Perona, P. (2007). Caltech-256 Object Category Dataset, California Institute of Technology."},{"key":"ref_33","unstructured":"Danelljan, M., Hager, G., Shahbaz Khan, F., and Felsberg, M. (July, January 26). Adaptive decontamination of the training set: A unified formulation for discriminative visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zhang, J., Ma, S., and Sclaroff, S. (2014, January 6\u201312). MEEM: Robust tracking via multiple experts using entropy minimization. 
Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10599-4_13"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2096","DOI":"10.1109\/TPAMI.2015.2509974","article-title":"Struck: Structured output tracking with kernels","volume":"38","author":"Hare","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Yun, S., Choi, J., Yoo, Y., Yun, K., and Young Choi, J. (2017, January 21\u201326). Action-decision networks for visual tracking with deep reinforcement learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.148"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Choi, J., Jin Chang, H., Fischer, T., Yun, S., Lee, K., Jeong, J., Demiris, Y., and Young Choi, J. (2018, January 18\u201322). Context-aware deep feature compression for high-speed visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00057"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/22\/4904\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:33:18Z","timestamp":1760189598000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/22\/4904"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,9]]},"references-count":37,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2019,11]]}},"alternative-id":["s19224904"],"URL":"https:\/\/doi.org\/10.3390\/s19224904","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2019,11,9]]}}}