{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T11:41:43Z","timestamp":1777030903448,"version":"3.51.4"},"reference-count":59,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2022,11,18]],"date-time":"2022-11-18T00:00:00Z","timestamp":1668729600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"GIST-LIG Nex1 collaboration research fund","award":["2014-3-00077-008"],"award-info":[{"award-number":["2014-3-00077-008"]}]},{"name":"Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT)","award":["2014-3-00077-008"],"award-info":[{"award-number":["2014-3-00077-008"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Propagation and association tasks in Multi-Object Tracking (MOT) play a pivotal role in accurately linking the trajectories of moving objects. Recently, modern deep learning models have been addressing these tasks by introducing fragmented solutions for each different problem such as appearance modeling, motion modeling, and object associations. To bring unification in the MOT task, we introduce a pixel-guided approach to efficiently build the joint-detection and tracking framework for multi-object tracking. Specifically, the up-sampled multi-scale features from consecutive frames are queued to detect the object locations by using a transformer\u2013decoder, and per-pixel distributions are utilized to compute the association matrix according to object queries. Additionally, we introduce a long-term appearance association on track features to learn the long-term association of tracks against detections to compute the similarity matrix. Finally, a similarity matrix is jointly integrated with the Byte-Tracker resulting in a state-of-the-art MOT performance. The experiments with the standard MOT15 and MOT17 benchmarks show that our approach achieves significant tracking performance.<\/jats:p>","DOI":"10.3390\/s22228922","type":"journal-article","created":{"date-parts":[[2022,11,18]],"date-time":"2022-11-18T06:22:28Z","timestamp":1668752548000},"page":"8922","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Pixel-Guided Association for Multi-Object Tracking"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0768-6423","authenticated-orcid":false,"given":"Abhijeet","family":"Boragule","sequence":"first","affiliation":[{"name":"School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5797-7264","authenticated-orcid":false,"given":"Hyunsung","family":"Jang","sequence":"additional","affiliation":[{"name":"LIG Nex1 Company Ltd., Yongin-si 16911, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Namkoo","family":"Ha","sequence":"additional","affiliation":[{"name":"LIG Nex1 Company Ltd., Yongin-si 16911, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2775-7789","authenticated-orcid":false,"given":"Moongu","family":"Jeon","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,11,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Sadeghian, A., Alahi, A., and Savarese, S. (2017, January 22\u201329). Tracking the untrackable: Learning to track multiple cues with long-term dependencies. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.41"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Rezatofighi, S.H., Milan, A., Zhang, Z., Shi, Q., Dick, A.R., and Reid, I.D. (2015, January 7\u201313). Joint Probabilistic Data Association Revisited. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.349"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Xiang, Y., Alahi, A., and Savarese, S. (2015, January 7\u201313). Learning to Track: Online Multi-object Tracking by Decision Making. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.534"},{"key":"ref_4","unstructured":"Daniel, S., and J\u00fcrgen, B. (2021, January 16\u201319). Multi-Pedestrian Tracking with Clusters. Proceedings of the 2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Washington, DC, USA."},{"key":"ref_5","unstructured":"Daniel, S., and J\u00fcrgen, B. (2021, January 20\u201325). Improving Multiple Pedestrian Tracking by Track Management and Occlusion Handling. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Jiangmiao, P., Linlu, Q., Xia, L., Haofeng, C., Qi, L., Trevor, D., and Fisher, Y. (2021, January 20\u201325). Quasi-Dense Similarity Learning for Multiple Object Tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00023"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Kim, C., Li, F., and Rehg, J.M. (2018, January 8\u201314). Multi-object Tracking with Neural Gating Using Bilinear LSTM. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01237-3_13"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Choi, W. (2015, January 7\u201313). Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.347"},{"key":"ref_9","unstructured":"Xing, J., Ai, H., and Lao, S. (2009, January 20\u201325). Multi-object tracking through occlusions by local tracklets filtering and global tracklets association with detection responses. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Miami, FL, USA."},{"key":"ref_10","unstructured":"Hornakova, A., Henschel, R., Rosenhahn, B., and Swoboda, P. (2020, January 13\u201318). Lifted Disjoint Paths with Application in Multiple Object Tracking. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_11","unstructured":"Zamir, A.R., Dehghan, A., and Shah, M. (2018, January 8\u201314). GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany."},{"key":"ref_12","unstructured":"Andrea, H., Timo, K., Paul, S., Michal, R., Bodo, R., and Roberto, H. (2021, January 10\u201317). Making Higher Order MOT Scalable: An Efficient Approximate Solver for Lifted Disjoint Paths. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada."},{"key":"ref_13","unstructured":"Xu, Y., Ban, Y., Delorme, G., Gan, C., Rus, D., and Alameda-Pineda, X. (2021). TransCenter: Transformers with Dense Queries for Multiple-Object Tracking. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3069","DOI":"10.1007\/s11263-021-01513-4","article-title":"Fairmot: On the fairness of detection and re-identification in multiple object tracking","volume":"129","author":"Yifu","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_15","unstructured":"Philipp, B., Tim, M., and Laura, L.T. (November, January 27). Tracking Without Bells and Whistles. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_16","unstructured":"Xingyi, Z., Vladlen, K., and Philipp, K. (2020, January 23\u201328). Tracking Objects as Points. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK."},{"key":"ref_17","unstructured":"Nicolai, W., Alex, B., and Dietrich, P. (2017, January 17\u201320). Simple online and realtime tracking with a deep association metric. Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China."},{"key":"ref_18","unstructured":"Nicolas, C., Francisco, M., Gabriel, S., Nicolas, U., Alexander, K., and Sergey, Z. (2020, January 23\u201328). End-to-End Object Detection with Transformers. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK."},{"key":"ref_19","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2021, January 3\u20137). Deformable {DETR}: Deformable Transformers for End-to-End Object Detection. Proceedings of the International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_20","unstructured":"Chu, P., Wang, J., You, Q., Ling, H., and Liu, Z. (2021). TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking. arXiv."},{"key":"ref_21","unstructured":"Fangao, Z., Bin, D., Yuang, Z., Tiancai, W., Xiangyu, Z., and Yichen, W. (2022, January 23\u201327). MOTR: End-to-End Multiple-Object Tracking with TRansformer. Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel."},{"key":"ref_22","unstructured":"Sun, P., Cao, J., Jiang, Y., Zhang, R., Xie, E., Yuan, Z., Wang, C., and Luo, P. (2020). TransTrack: Multiple-Object Tracking with Transformer. arXiv."},{"key":"ref_23","unstructured":"Yifu, Z., Peize, S., Yi, J., Dongdong, Y., Fucheng, W., Zehuan, Y., Ping, L., Wenyu, L., and Xinggang, W. (2022, January 23\u201327). ByteTrack: Multi-Object Tracking by Associating Every Detection Box. Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel."},{"key":"ref_24","unstructured":"Boragule, A., and Jeon, M. (September, January 29). Joint Cost Minimization for Multi-object Tracking. Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy."},{"key":"ref_25","unstructured":"Zhou, X., Jiang, P., Wei, Z., Dong, H., and Wang, F. (2018, January 3\u20136). Online Multi-Object Tracking with Structural Invariance Constraint. Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Dicle, C., Camps, O.I., and Sznaier, M. (2013, January 1\u20138). The Way They Move: Tracking Multiple Targets with Similar Appearance. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia.","DOI":"10.1109\/ICCV.2013.286"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1175","DOI":"10.1049\/iet-ipr.2017.1244","article-title":"Multiple hypothesis tracking algorithm for multi-target multi-camera tracking with disjoint views","volume":"12","author":"Yoon","year":"2018","journal-title":"IET Image Process."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Kim, C., Li, F., Ciptadi, A., and Insafutdinov, J.M.R. (2015, January 7\u201313). Multiple Hypothesis Tracking Revisited. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.533"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1016\/j.neucom.2022.01.008","article-title":"Online multi-object tracking with unsupervised re-identification learning and occlusion estimation","volume":"483","author":"Liu","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_30","unstructured":"Bastani, F., He, S., and Madden, S. (2021, January 6\u201314). Self-Supervised Multi-Object Tracking with Cross-input Consistency. Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Virtual."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yoon, J.H., Lee, C.R., Yang, M.H., and Yoon, K. (2016, January 27\u201330). Online multi-object tracking via structural constraint event aggregation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.155"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Yoon, J.H., Yang, M.H., Lim, J., and Yoon, K.J. (2015, January 6\u20139). Bayesian multi-object tracking using motion context from multiple objects. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2015.12"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yoon, Y.C., Boragule, A., Song, Y., Yoon, K., and Jeon, M. (2018, January 27\u201330). Online Multi-Object Tracking with Historical Appearance Matching and Scene Adaptive Detection Filtering. Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand.","DOI":"10.1109\/AVSS.2018.8639078"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1115\/1.3662552","article-title":"A new approach to linear filtering and prediction problems","volume":"82","author":"Kalman","year":"1960","journal-title":"Trans. ASME\u2013J. Basic Eng."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1385","DOI":"10.1109\/TAES.2012.6178069","article-title":"Multi-Sensor Joint Detection and Tracking with the Bernoulli Filter","volume":"48","author":"Vo","year":"2012","journal-title":"IEEE Trans. Aerosp. Electron. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1109\/TPAMI.2017.2691769","article-title":"Confidence-Based Data Association and Discriminative Deep Appearance Learning for Robust Online Multi-Object Tracking","volume":"40","author":"Bae","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","unstructured":"Zewen, L., Fan, L., Wenjie, Y., Shouheng, P., and Jun, Z. (2021). A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Netw. Learn. Syst., 1\u201321."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"(2020). Deep learning in video multi-object tracking: A survey. Neurocomputing, 381, 61\u201388.","DOI":"10.1016\/j.neucom.2019.11.023"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Wang, Y., Kitani, K., and Weng, X. (June, January 30). Joint Object Detection and Multi-Object Tracking with Graph Neural Networks. Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9561110"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Lu, Z., Rathod, V., Ronny, V., and Jonathan, H. (2020, January 13\u201319). RetinaTrack: Online Single Stage Joint Detection and Tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01468"},{"key":"ref_41","unstructured":"Qianyu, Z., Xiangtai, L., Lu, H., Yibo, Y., Guangliang, C., Yunhai, T., Lizhuang, M., and Dacheng, T. (2022). TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Meinhardt, T., Kirillov, A., Leal-Taixe, L., and Feichtenhofer, C. (2022, January 19\u201320). TrackFormer: Multi-Object Tracking with Transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00864"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhao, Z., Wu, Z., Zhuang, Y., Li, B., and Jia, J. (2022). Tracking Objects as Pixel-wise Distributions. arXiv.","DOI":"10.1007\/978-3-031-20047-2_5"},{"key":"ref_44","unstructured":"Kaiming, H., Xiangyu, Z., Shaoqing, R., and Jian, S. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., and Girdhar, R. (2022, January 19\u201320). Masked-attention Mask Transformer for Universal Image Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00135"},{"key":"ref_46","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). Attention is All you Need. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_47","unstructured":"Cheng, B., Schwing, A.G., and Kirillov, A. (2021). Per-Pixel Classification is Not All You Need for Semantic Segmentation. arXiv."},{"key":"ref_48","unstructured":"Milan, A., Leal-Taix\u00e9, L., Reid, I.D., Roth, S., and Schindler, K. (2016). MOT16: A Benchmark for Multi-Object Tracking. arXiv."},{"key":"ref_49","unstructured":"Loshchilov, I., and Hutter, F. (2019, January 6\u20139). Decoupled Weight Decay Regularization. Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA."},{"key":"ref_50","unstructured":"Bo, P., Yizhuo, L., Yifan, Z., Muchen, L., and Cewu, L. (2020, January 13\u201319). TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Fang, K., Xiang, Y., Li, X., and Savarese, S. (2018, January 12\u201315). Recurrent Autoregressive Networks for Online Multi-object Tracking. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00057"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"1268","DOI":"10.1007\/s10489-021-02457-5","article-title":"Online Multi-Object Tracking Using Multi-Function Integration and Tracking Simulation Training","volume":"52","author":"Jieming","year":"2022","journal-title":"Appl. Intell."},{"key":"ref_53","unstructured":"Ioannis, P., Abhijit, S., and Anuj, K. (2021, January 19\u201322). A Graph Convolutional Neural Network Based Approach for Traffic Monitoring Using Augmented Detections with Optical Flow. Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA."},{"key":"ref_54","unstructured":"Peng, C., Heng, F., Chiu, T., and Haibin, L. (2019, January 7\u201311). Online Multi-Object Tracking With Instance-Aware Tracker and Dynamic Model Refreshment. Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision, Waikoloa Village, HI, USA."},{"key":"ref_55","unstructured":"Yihong, X., Aljosa, O., Yutong, B., Radu, H., Laura, L.T., and Xavier, A.P. (2020, January 14\u201319). How To Train Your Deep Multi-Object Tracker. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA."},{"key":"ref_56","unstructured":"Pavel, T., Jie, L., Wolfram, B., and Adrien, G. (2021, January 10\u201317). Learning to Track with Object Permanence. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Wang, Q., Zheng, Y., Pan, P., and Xu, Y. (2021, January 20\u201325). Multiple Object Tracking With Correlation Learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00387"},{"key":"ref_58","unstructured":"Bing, S., Andrew, B., Xinyu, L., Davide, M., and Joseph, T. (2021, January 20\u201325). SiamMOT: Siamese Multi-Object Tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA."},{"key":"ref_59","unstructured":"Feng, W., Hu, Z., Wu, W., Yan, J., and Ouyang, W. (2019). Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/22\/8922\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:20:56Z","timestamp":1760145656000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/22\/8922"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,18]]},"references-count":59,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["s22228922"],"URL":"https:\/\/doi.org\/10.3390\/s22228922","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,18]]}}}