{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T20:49:47Z","timestamp":1761598187692,"version":"build-2065373602"},"reference-count":42,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2020,3,29]],"date-time":"2020-03-29T00:00:00Z","timestamp":1585440000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Based on tracking-by-detection, we propose a hierarchical-matching-based online and real-time multi-object tracking approach with deep appearance features, which can effectively reduce the false positives (FP) in tracking. For the purpose of increasing the accuracy rate of data association, we define the trajectory confidence using its position information, appearance information, and the information of historical relevant detections, after which we can classify the trajectories into different levels. In order to obtain discriminative appearance features, we developed a deep convolutional neural network to extract the appearance features of objects and trained it on a large-scale pedestrian re-identification dataset. Last but not least, we used the proposed diverse and hierarchical matching strategy to associate detection and trajectory sets. 
Experimental results on the MOT benchmark dataset show that our proposed approach performs well against other online methods, especially for the metrics of FP and frames per second (FPS).<\/jats:p>","DOI":"10.3390\/a13040080","type":"journal-article","created":{"date-parts":[[2020,3,31]],"date-time":"2020-03-31T13:27:19Z","timestamp":1585661239000},"page":"80","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Hierarchical-Matching-Based Online and Real-Time Multi-Object Tracking with Deep Appearance Features"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1207-2410","authenticated-orcid":false,"given":"Qingge","family":"Ji","sequence":"first","affiliation":[{"name":"School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China"},{"name":"Guangdong Key Laboratory of Big Data Analysis and Processing, Guangdong 510006, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9135-627X","authenticated-orcid":false,"given":"Haoqiang","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China"},{"name":"Guangdong Key Laboratory of Big Data Analysis and Processing, Guangdong 510006, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiao","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China"},{"name":"Guangdong Key Laboratory of Big Data Analysis and Processing, Guangdong 510006, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,3,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.patrec.2012.07.005","article-title":"Intelligent multi-camera video surveillance: A 
review","volume":"34","author":"Wang","year":"2013","journal-title":"Pattern Recognit. Lett."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1109\/TITS.2009.2030963","article-title":"Understanding transit scenes: A survey on human behavior-recognition algorithms","volume":"11","author":"Candamo","year":"2010","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_3","unstructured":"Uchiyama, H., and Marchand, E. (2020, March 29). Object Detection and Pose Tracking for Augmented Reality: Recent Approaches. Available online: https:\/\/hal.inria.fr\/hal-00751704\/document."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"894","DOI":"10.1016\/j.semcdb.2009.07.004","article-title":"Tracking in cell and developmental biology","volume":"20","author":"Meijering","year":"2009","journal-title":"Semin. Cell Dev. Biol."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., Jitendra, M., Berkeley, U.C., and ICSI (2014, January 24\u201327). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Ali, F. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Christian, S., Scott, R., Fu, C.-Y., and Alexander, C.B. (2016). SSD: Single Shot MultiBox Detector. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_8","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of Oriented Gradients for Human Detection. 
Proceedings of the Computer Vision and Pattern Recognition, San Diego, CA, USA."},{"key":"ref_9","unstructured":"Sadeghian, A., Alahi, A., and Savarese, S. (, January 22\u201329). Tracking The Untrackable: Learning To Track Multiple Cues with Long-Term Dependencies. Proceedings of the ICCV, Venice, Italy."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.patcog.2019.04.018","article-title":"Instance-Aware Representation Learning and Association for Online Multi-Person Tracking","volume":"94","author":"Wu","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"38060","DOI":"10.1109\/ACCESS.2020.2975912","article-title":"OneShotDA: Online Multi-Object Tracker with One-Shot-Learning-Based Data Association","volume":"8","author":"Yoon","year":"2020","journal-title":"IEEE Access"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Baisa, N.L. (2019, January 2\u20135). Online multi-object visual tracking using a GM-PHD filter with deep appearance learning. Proceedings of the 2019 22nd International Conference on Information Fusion (FUSION), Shaw Center, OT, Canada.","DOI":"10.23919\/FUSION43075.2019.9011441"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ristani, E., Solera, F., Zou, R., Rita, C., and Carlo, T. (2016, January 8\u201316). Performance measures and a data set for multi-target, multi-camera tracking. Proceedings of the European Conference on Computer Vision Springer, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_2"},{"key":"ref_14","unstructured":"Leal-Taix\u00e9, L., Milan, A., Reid, I., Roth, S., and Schindler, K. (2015). MOTChallenge 2015: Towards a Benchmark for Multi-object Tracking. ArXiv, Available online: https:\/\/arxiv.org\/abs\/1504.01942."},{"key":"ref_15","unstructured":"Milan, A., Leal-Taix\u00e9, L., Reid, I., Roth, S., and Schindler, K. (2016). MOT16: A Benchmark for Multi-Object Tracking. 
ArXiv, Available online: https:\/\/arxiv.org\/pdf\/1603.00831.pdf."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1115\/1.3662552","article-title":"A New Approach to Linear Filtering and Prediction Problems","volume":"82","author":"Kalman","year":"1960","journal-title":"J. Basic Eng."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Varior, R.R., Shuai, B., Lu, J., Xu, D., and Wang, G. (2016). A Siamese Long Short-Term Memory Architecture for Human Re-Identification. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46478-7_9"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"He, L., Liang, J., Li, H., and Sun, Z. (2018, January 18\u201323). Deep Spatial Feature Reconstruction for Partial Person Re-Identification: Alignment-Free Approach. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00739"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1002\/nav.3800020109","article-title":"The Hungarian method for the assignment problem","volume":"2","author":"Kuhn","year":"1955","journal-title":"Nav. Res. Logist. Q."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yang, F., Choi, W., and Lin, Y. (2016, January 27\u201330). Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.234"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object detection with discriminatively trained partbased models","volume":"32","author":"Felzenszwalb","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., and Tian, Q. (2015, January 7\u201313). Scalable Person Re-identification: A Benchmark. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.133"},{"key":"ref_24","first-page":"1","article-title":"Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics","volume":"1","author":"Bernardin","year":"2008","journal-title":"Image Video Process."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Li, Y., Huang, C., and Nevatia, R. (2009, January 20\u201326). Learning to associate: Hybrid Boosted multi-object tracker for crowded scene. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206735"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zagoruyko, S., and Komodakis, N. (2016, January 19\u201322). Wide residual networks. Proceedings of the BMVC, York, UK.","DOI":"10.5244\/C.30.87"},{"key":"ref_27","unstructured":"Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., and Devin, M. (2016). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. Arxiv, Available online: https:\/\/arxiv.org\/abs\/1603.04467."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wojke, N., and Bewley, A. (2018, January 12\u201315). Deep Cosine Metric Learning for Person Re-identification. 
Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00087"},{"key":"ref_29","unstructured":"Kingma, D.P., and Ba, J. (2015, January 7\u20139). Adam: A method for stochastic optimization. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_30","unstructured":"Zheng, L., Yang, Y., and Hauptmann, A.G. (2016). Person Re-identification: Past, Present and Future. ArXiv, Available online: https:\/\/arxiv.org\/abs\/1610.02984."},{"key":"ref_31","first-page":"886","article-title":"Histograms of oriented gradients for human detection[C]. International Conference on computer vision & Pattern Recognition (CVPR\u201905)","volume":"1","author":"Dalal","year":"2005","journal-title":"IEEE Comput. Soc."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Song, Y., and Jeon, M. (2016, January 26\u201328). Online multiple object tracking with the hierarchically adopted gm-phd filter using motion and appearance. Proceedings of the Consumer Electronics-Asia (ICCE-Asia), Seoul, Korea.","DOI":"10.1109\/ICCE-Asia.2016.7804800"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1049\/iet-cvi.2016.0068","article-title":"Online multi-person tracking with two-stage data association and online appearance model learning","volume":"11","author":"Ju","year":"2017","journal-title":"Iet Comput. Vis."},{"key":"ref_34","unstructured":"Anh, N.T.L., Khan, F.M., Negin, F., and Francois, B. (2017, January 4\u20138). Multi-Object tracking using multi-channel part appearance representation, Advanced Video and Signal Based Surveillance (AVSS). Proceedings of the 2017 14th IEEE International Conference on San Francisco, San Francisco, CA, USA."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Sanchez-Matilla, R., Poiesi, F., and Cavallaro, A. (2016, January 8\u201316). 
Online multi-target tracking with strong and weak detections. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_7"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Xiang, Y., Alahi, A., and Savarese, S. (2015, January 11\u201318). Learning to Track: Online Multi-Object Tracking by Decision Making. Proceedings of the International Conference on Computer Vision (ICCV), Araucano Park, Chile.","DOI":"10.1109\/ICCV.2015.534"},{"key":"ref_37","unstructured":"Chul, Y., Song, Y.-M., Yoon, K., and Jeon, M. (2018). Online Multi-Object Tracking Using Selective Deep Appearance Matching, IEEE."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Bergmann, P., Meinhardt, T., and Leal-Taixe, L. (2019, January 27\u201328). Tracking without bells and whistles. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00103"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Eiselein, V., Arp, D., Patzold, M., and Sikora, T. (2012). Real-Time Multi-human Tracking Using a Probability Hypothesis Density Filter and Multiple Detectors, IEEE Computer Society.","DOI":"10.1109\/AVSS.2012.59"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Kutschbach, T., Bochinski, E., Eiselein, V., and Sikora, T. (2017). Sequential Sensor Fusion Combining Probability Hypothesis Density and Kernelized Correlation Filters for Multi-Object Tracking in Video Data, IEEE.","DOI":"10.1109\/AVSS.2017.8078517"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1016\/j.jvcir.2019.01.026","article-title":"Development of a N-type GM-PHD filter for multiple target, multiple type visual tracking","volume":"59","author":"Baisa","year":"2019","journal-title":"J. Vis. Commun. 
Image Represent."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"8181","DOI":"10.1109\/ACCESS.2018.2889442","article-title":"Multiple Object Tracking via Feature Pyramid Siamese Networks","volume":"7","author":"Lee","year":"2019","journal-title":"IEEE Access"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/4\/80\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:13:08Z","timestamp":1760173988000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/4\/80"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,29]]},"references-count":42,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,4]]}},"alternative-id":["a13040080"],"URL":"https:\/\/doi.org\/10.3390\/a13040080","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2020,3,29]]}}}