{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:25:04Z","timestamp":1760239504207,"version":"build-2065373602"},"reference-count":52,"publisher":"MDPI AG","issue":"23","license":[{"start":{"date-parts":[[2020,11,25]],"date-time":"2020-11-25T00:00:00Z","timestamp":1606262400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Intelligent security systems are widely deployed to ensure the safety of people in shopping malls, banks, train stations, and other indoor buildings. Multi-Object Tracking (MOT), an important component of these systems, has received much attention from researchers in recent years. However, existing multi-object tracking algorithms still suffer from trajectory drift and interruption in crowded scenes, so they cannot provide valuable data for managers. To solve these problems, this paper proposes a Multi-Object Tracking algorithm for RGB-D images based on Asymmetric Dual Siamese networks (ADSiamMOT-RGBD). This algorithm combines appearance information from RGB images with target contour information from depth images. Furthermore, an attention module is applied to suppress redundant information in the combined features and overcome the trajectory drift problem. We also propose a trajectory analysis module, which uses time-context information to check whether each head movement trajectory is correct. This reduces the number of erroneous human trajectories. 
The experimental results show that the proposed method achieves better tracking quality on the MICC, EPFL, and UM datasets than previous work.<\/jats:p>","DOI":"10.3390\/s20236745","type":"journal-article","created":{"date-parts":[[2020,11,25]],"date-time":"2020-11-25T21:55:06Z","timestamp":1606341306000},"page":"6745","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Multi-Object Tracking Algorithm for RGB-D Images Based on Asymmetric Dual Siamese Networks"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3151-5755","authenticated-orcid":false,"given":"Wen-Li","family":"Zhang","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4707-9102","authenticated-orcid":false,"given":"Kun","family":"Yang","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"}]},{"given":"Yi-Tao","family":"Xin","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"}]},{"given":"Ting-Song","family":"Zhao","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,11,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Xu, R., Nikouei, S.Y., Chen, Y., Polunchenko, A., Song, S., Deng, C., and Faughnan, T.R. (2018, January 20\u201324). Real-time human objects tracking for smart surveillance at the edge. 
Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA.","DOI":"10.1109\/ICC.2018.8422970"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yang, C.-J., Chou, T., Chang, F.A., Chang, S.-Y., and Guo, J.-I. (2016, January 25\u201327). A smart surveillance system with multiple people detection, tracking, and behavior analysis. Proceedings of the 2016 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Hsinchu, Taiwan.","DOI":"10.1109\/VLSI-DAT.2016.7482569"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1016\/j.engappai.2017.10.001","article-title":"Deep convolutional framework for abnormal behavior detection in a smart surveillance system","volume":"67","author":"Ko","year":"2018","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Shehzed, A., Jalal, A., and Kim, K. (2019, January 27\u201329). Multi-Person Tracking in Smart Surveillance System for Crowd Counting and Normal\/Abnormal Events Detection. Proceedings of the 2019 International Conference on Applied and Engineering Mathematics (ICAEM), Taxila, Pakistan.","DOI":"10.1109\/ICAEM.2019.8853756"},{"key":"ref_5","unstructured":"Milan, A., Leal-Taix\u00e9, L., Reid, I., Roth, S., and Schindler, K. (2016). MOT16: A benchmark for multi-object tracking. arXiv."},{"key":"ref_6","first-page":"1","article-title":"Eye tracking based control system for natural human-computer interaction","volume":"2017","author":"Zhang","year":"2017","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1016\/j.cviu.2016.07.003","article-title":"Online multi-object tracking via robust collaborative model and sample selection","volume":"154","author":"Naiel","year":"2017","journal-title":"Comput. Vis. 
Image Underst."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Eiselein, V., Arp, D., P\u00e4tzold, M., and Sikora, T. (2012, January 18\u201321). Real-time multi-human tracking using a probability hypothesis density filter and multiple detectors. Proceedings of the 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance, Beijing, China.","DOI":"10.1109\/AVSS.2012.59"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Bewley, A., Ge, Z., Ott, L., Ramos, F., and Upcroft, B. (2016, January 25\u201328). Simple online and realtime tracking. Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7533003"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wojke, N., Bewley, A., and Paulus, D. (2008, January 17\u201320). Simple online and realtime tracking with a deep association metric. Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8296962"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Bochinski, E., Eiselein, V., and Sikora, T. (September, January 29). High-speed tracking-by-detection without using image information. Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy.","DOI":"10.1109\/AVSS.2017.8078516"},{"key":"ref_12","first-page":"3269","article-title":"Heterogeneous association graph fusion for target association in multiple object tracking","volume":"11","author":"Sheng","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_13","unstructured":"Comaniciu, D., Ramesh, V., and Meer, P. (2000, January 13\u201315). Real-time tracking of non-rigid objects using mean shift. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2000 (Cat. No. 
PR00662), Hilton Head Island, SC, USA."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1049\/ip-rsn:19951757","article-title":"Stochastic simulation Bayesian approach to multitarget tracking","volume":"2","author":"Avitzour","year":"1995","journal-title":"JIEE Proc. Radar, Sonar Navig."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1109\/7.570826","article-title":"A hybrid bootstrap filter for target tracking in clutter","volume":"1","author":"Gordon","year":"1997","journal-title":"IEEE Trans. Aerosp. Electron. Syst."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Danescu, R., Oniga, F., Nedevschi, S., and Meinecke, M.-M. (2009, January 3\u20135). Tracking multiple objects using particle filters and digital elevation maps. Proceedings of the 2009 IEEE Intelligent Vehicles Symposium, Xi\u2019an, China.","DOI":"10.1109\/IVS.2009.5164258"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yin, J., Wang, W., Meng, Q., Yang, R., and Shen, J. (2020, January 13\u201319). A Unified Object Motion and Affinity Model for Online Multi-Object Tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00680"},{"key":"ref_18","unstructured":"Feng, W., Hu, Z., Wu, W., Yan, J., and Ouyang, W. (2019). Multi-object tracking with multiple cues and switcher-aware classification. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chu, P., and Ling, H. (2019, January 27\u201328). Famnet: Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00627"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhu, J., Yang, H., Liu, N., Kim, M., Zhang, W., and Yang, M.-H. (2018, January 8\u201314). 
Online multi-object tracking with dual matching attention networks. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_23"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Rasoulidanesh, M.S., Yadav, S., Herath, S., Vaghei, Y., and Payandeh, S. (2019). Deep Attention Models for Human Tracking Using RGBD. Sensors, 19.","DOI":"10.3390\/s19040750"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Xie, Y., Lu, Y., and Gu, S. (2019, January 13\u201316). RGB-D Object Tracking with Occlusion Detection. Proceedings of the 2019 15th International Conference on Computational Intelligence and Security (CIS), Macao, China.","DOI":"10.1109\/CIS.2019.00011"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Gai, W., Qi, M., Ma, M., Wang, L., Yang, C., Liu, J., Bian, Y., De Melo, G., Liu, S., and Meng, X. (2020). Employing Shadows for Multi-Person Tracking Based on a Single RGB-D Camera. Sensors, 4.","DOI":"10.3390\/s20041056"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Fu, K., Fan, D.-P., Ji, G.-P., Zhao, Q., Shen, J., and Zhu, C. (2020). Siamese Network for RGB-D Salient Object Detection and Beyond. arXiv.","DOI":"10.1109\/TPAMI.2021.3073689"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"107630","DOI":"10.1016\/j.patcog.2020.107630","article-title":"Context-aware network for RGB-D salient object detection","volume":"111","author":"Liang","year":"2021","journal-title":"Pattern Recognit."},{"key":"ref_26","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). 
Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1063","DOI":"10.1007\/s11263-018-01147-z","article-title":"Rank-1 tensor approximation for high-order association in multi-target tracking","volume":"127","author":"Shi","year":"2019","journal-title":"Int. J. Comput. Vis."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Shahbaz, K.F., and Felsberg, M. (2017, January 21\u201326). Eco: Efficient convolution operators for tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.733"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Sadeghian, A., Alahi, A., and Savarese, S. (2017, January 22\u201329). Tracking the untrackable: Learning to track multiple cues with long-term dependencies. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.41"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Chrapek, D., Beran, V., and Zemcik, P. (2015, January 26\u201329). Depth-based filtration for tracking boost. Proceedings of the International Conference on Advanced Concepts for Intelligent Vision Systems, Catania, Italy.","DOI":"10.1007\/978-3-319-25903-1_19"},{"key":"ref_32","first-page":"1409","article-title":"Tracking-learning-detection","volume":"7","author":"Kalal","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/j.neucom.2013.10.021","article-title":"Multi-Cue Based Tracking","volume":"131","author":"Wang","year":"2014","journal-title":"Neurocomputing"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Liu, J., Liu, Y., Cui, Y., and Chen, Y.Q. (2013, January 15\u201318). Real-time human detection and tracking in complex environments using single RGBD camera. Proceedings of the 2013 IEEE International Conference on Image Processing, Melbourne, Australia.","DOI":"10.1109\/ICIP.2013.6738636"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.patrec.2014.09.013","article-title":"Detecting and tracking people in real time with RGB-D camera","volume":"53","author":"Liu","year":"2015","journal-title":"Pattern Recognit. Lett."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Liu, J., Zhang, G., Liu, Y., Tian, L., and Chen, Y.Q. (2015). An ultra-fast human detection method for color-depth camera. J. Vis. Commun. Image Represent., 177\u2013185.","DOI":"10.1016\/j.jvcir.2015.06.014"},{"key":"ref_37","unstructured":"Ma, A.J., Yuen, P.C., and Saria, S. (2015). Deformable distributed multiple detector fusion for multi-person tracking. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object detection with discriminatively trained part-based models","volume":"32","author":"Felzenszwalb","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Milan, A., Schindler, K., and Roth, S. (2015). Multi-target tracking by discrete-continuous energy minimization. IEEE Trans. Pattern Anal. Mach. 
Intell., 2054\u20132068.","DOI":"10.1109\/TPAMI.2015.2505309"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"2531","DOI":"10.1109\/TMM.2019.2908350","article-title":"Joint Deep and Depth for Object-Level Segmentation and Stereo Tracking in Crowds","volume":"21","author":"Li","year":"2019","journal-title":"IEEE Trans. Multimed."},{"key":"ref_41","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H.S. (2016, January 8\u201316). Fully-convolutional siamese networks for object tracking. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Peng, H. (2019, January 16\u201320). Deeper and wider siamese networks for real-time visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00472"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.inffus.2018.09.014","article-title":"Hierarchical multi-modal fusion FCN with attention model for RGB-D tracking","volume":"50","author":"Jiang","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and So, K. (2018, January 8\u201314). Cbam: Convolutional block attention module. 
Proceedings of the European conference on computer vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1002\/nav.3800020109","article-title":"The Hungarian method for the assignment problem","volume":"1\u20132","author":"Kuhn","year":"1955","journal-title":"Nav. Res. Logist. Q."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"He, M., Luo, H., Hui, B., and Chang, Z. (2019). Pedestrian flow tracking and statistics of monocular camera based on convolutional neural network and Kalman filter. Appl. Sci., 8.","DOI":"10.3390\/app9081624"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Bagautdinov, T., Fleuret, F., and Fua, P. (2015, January 7\u201312). Probability occupancy maps for occluded depth images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298900"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1577","DOI":"10.1109\/TPAMI.2012.248","article-title":"A general framework for tracking multiple people from a moving camera","volume":"35","author":"Choi","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_51","unstructured":"Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers, D., Reid, I., Roth, S., Schindler, K., and Leal-Taix\u00e9, L. (2020). Mot20: A benchmark for multi object tracking in crowded scenes. arXiv."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Sun, S., Akhtar, N., Song, H., Mian, A.S., and Shah, M. (2019). Deep affinity network for multiple object tracking. 
arXiv.","DOI":"10.1109\/TPAMI.2019.2929520"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/23\/6745\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:37:21Z","timestamp":1760179041000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/23\/6745"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,25]]},"references-count":52,"journal-issue":{"issue":"23","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["s20236745"],"URL":"https:\/\/doi.org\/10.3390\/s20236745","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2020,11,25]]}}}