{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T06:22:30Z","timestamp":1763706150325,"version":"build-2065373602"},"reference-count":62,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2021,9,23]],"date-time":"2021-09-23T00:00:00Z","timestamp":1632355200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Multi-object tracking is a significant field in computer vision since it provides essential information for video surveillance and analysis. Several different deep learning-based approaches have been developed to improve the performance of multi-object tracking by applying the most accurate and efficient combinations of object detection models and appearance embedding extraction models. However, two-stage methods show a low inference speed since the embedding extraction can only be performed at the end of the object detection. To alleviate this problem, single-shot methods, which simultaneously perform object detection and embedding extraction, have been developed and have drastically improved the inference speed. However, there is a trade-off between accuracy and efficiency. Therefore, this study proposes an enhanced single-shot multi-object tracking system that displays improved accuracy while maintaining a high inference speed. With a strong feature extraction and fusion, the object detection of our model achieves an AP score of 69.93% on the UA-DETRAC dataset and outperforms previous state-of-the-art methods, such as FairMOT and JDE. 
Based on the improved object detection performance, our multi-object tracking system achieves a MOTA score of 68.5% and a PR-MOTA score of 24.5% on the same dataset, also surpassing the previous state-of-the-art trackers.<\/jats:p>","DOI":"10.3390\/s21196358","type":"journal-article","created":{"date-parts":[[2021,9,27]],"date-time":"2021-09-27T22:16:38Z","timestamp":1632780998000},"page":"6358","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Efficient Single-Shot Multi-Object Tracking for Vehicles in Traffic Scenarios"],"prefix":"10.3390","volume":"21","author":[{"given":"Youngkeun","family":"Lee","sequence":"first","affiliation":[{"name":"Department of Electronic Engineering, Kwangwoon University, Seoul 01897, Korea"}]},{"given":"Sang-ha","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, Kwangwoon University, Seoul 01897, Korea"}]},{"given":"Jisang","family":"Yoo","sequence":"additional","affiliation":[{"name":"Department of Electronic Engineering, Kwangwoon University, Seoul 01897, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6595-6415","authenticated-orcid":false,"given":"Soonchul","family":"Kwon","sequence":"additional","affiliation":[{"name":"Graduate School of Smart Convergence, Kwangwoon University, Seoul 01897, Korea"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Bewley, A., Ge, Z., Ott, L., Ramos, F., and Upcroft, B. (2016, January 25\u201328). Simple online and realtime tracking. Proceedings of the IEEE International Conference on Image Processing, Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7533003"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yu, F., Li, W., Li, Q., Liu, Y., Shi, X., and Yan, J. (2016, January 8\u201316). POI: Multiple object tracking with high performance detection and appearance feature. 
Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_3"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Wojke, N., Bewley, A., and Paulus, D. (2017, January 17\u201320). Simple online and realtime tracking with a deep association metric. Proceedings of the IEEE International Conference on Image Processing, Beijing, China.","DOI":"10.1109\/ICIP.2017.8296962"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wang, Z., Zheng, L., Liu, Y., Li, Y., and Wang, S. (2019). Towards real-time multi-object tracking. arXiv.","DOI":"10.1007\/978-3-030-58621-8_7"},{"key":"ref_5","unstructured":"Zhang, Y., Wang, C., Wang, X., Zeng, W., and Liu, W. (2020). FairMOT: On the fairness of detection and re-identification in multiple object tracking. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B.B.G., Geiger, A., and Leibe, B. (2019, January 15\u201320). MOTS: Multi-object tracking and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00813"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Sun, S., Akhtar, N., Song, X., Song, H., Mian, A., and Shah, M. (2020). Simultaneous detection and tracking with motion modelling for multiple object tracking. arXiv.","DOI":"10.1007\/978-3-030-58586-0_37"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Peri, N., Khorramshahi, P., Rambhatla, S.S., Shenoy, V., Rawat, S., Chen, J.C., and Chellappa, R. (2020, January 13\u201319). Towards real-time systems for vehicle re-identification, multi-camera tracking, and anomaly detection. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00319"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Chen, L., Ai, H., Zhuang, Z., and Shang, C. (2018, January 23\u201327). Real-time multiple people tracking with deeply learned candidate selection and person re-identification. Proceedings of the IEEE International Conference on Multimedia and Expo, San Diego, CA, USA.","DOI":"10.1109\/ICME.2018.8486597"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhao, D., Fu, H., Xiao, L., Wu, T., and Dai, B. (2018). Multi-object tracking with correlation filter for autonomous vehicle. Sensors, 18.","DOI":"10.3390\/s18072004"},{"key":"ref_11","unstructured":"Wen, L., Du, D., Cai, Z., Lei, Z., Chang, M.C., Qi, H., Lim, J., Yang, M.H., and Lyu, S. (2015). UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_13","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20138). ImageNet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_14","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_23","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). SSD: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_26","unstructured":"Welch, G., and Bishop, G. (1995). An Introduction to the Kalman Filter, Department of Computer Science University of North Carolina."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1002\/nav.3800020109","article-title":"The Hungarian method for the assignment problem","volume":"2","author":"Kuhn","year":"1955","journal-title":"Nav. Res. Logist. Q."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Yu, F., Wang, D., Shelhamer, E., and Darrell, T. (2018, January 18\u201322). Deep layer aggregation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00255"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object detection with discriminatively trained part-based models","volume":"32","author":"Felzenszwalb","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.1109\/TPAMI.2014.2300479","article-title":"Fast feature pyramids for object detection","volume":"36","author":"Appel","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Cai, Z., Saberian, M., and Vasconcelos, N. (2015, January 11\u201318). Learning complexity-aware cascades for deep pedestrian detection. Proceedings of the IEEE International Conference on Computer Vision, Las Condes, Chile.","DOI":"10.1109\/ICCV.2015.384"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Pirsiavash, H., Ramanan, D., and Fowlkes, C.C. (2011, January 20\u201325). Globally-optimal greedy algorithms for tracking a variable number of objects. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995604"},{"key":"ref_33","unstructured":"Andriyenko, A., and Schindler, K. (2011, January 20\u201325). Multi-target tracking by continuous energy minimization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Andriyenko, A., Schindler, K., and Roth, S. (2012, January 16\u201321). Discrete-continuous optimization for multi-target tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6247893"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Dicle, C., Sznaier, M., and Camps, O. (2013, January 1\u20138). The way they move: Tracking multiple targets with similar appearance. Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.286"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wen, L., Li, W., Yan, J., Lei, Z., Yi, D., and Li, S.Z. 
(2014, January 23\u201328). Multiple target tracking based on undirected hierarchical relation hypergraph. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.167"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Bae, S., and Yoon, K. (2014, January 23\u201328). Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.159"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Bochinski, E., Senst, T., and Sikora, T. (2018, January 27\u201330). Extending IOU based multi-object tracking by visual information. Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance, Auckland, New Zealand.","DOI":"10.1109\/AVSS.2018.8639144"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The KITTI vision benchmark suite. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_40","unstructured":"Milan, A., Leal-Taix\u00e9, L., Reid, I., Roth, S., and Schindler, K. (2016). MOT16: A benchmark for multi-object tracking. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Tan, M., Pang, R., and Le, Q.V. (2020, January 13\u201319). EfficientDet: Scalable and efficient object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"ref_42","first-page":"6105","article-title":"EfficientNet: Rethinking model scaling for convolutional neural networks","volume":"97","author":"Tan","year":"2019","journal-title":"Proc. Mach. Learn. 
Res."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Lin, T.Y., and Le, Q.V. (2019, January 15\u201320). NAS-FPN: Learning scalable feature pyramid architecture for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00720"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201322). MobileNetV2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q.V. (2019, January 15\u201320). MnasNet: Platform-aware neural architecture search for mobile. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00293"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201322). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Newell, A., Yang, K., and Deng, J. (2016, January 8\u201316). Stacked hourglass networks for human pose estimation. 
Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). CornerNet: Detecting objects as paired keypoints. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_50","unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2019). Objects as points. arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, January 15\u201320). Deep high-resolution representation learning for human pose estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017, January 22\u201329). Deformable convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.89"},{"key":"ref_53","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_54","unstructured":"Kendall, A., Gal, Y., and Cipolla, R. (2018, January 18\u201322). Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"246309","DOI":"10.1155\/2008\/246309","article-title":"Evaluating multiple object tracking performance: The CLEAR MOT metrics","volume":"2008","author":"Bernardin","year":"2008","journal-title":"EURASIP J. 
Image Video Process."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft COCO: Common objects in context. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_57","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The PASCAL visual object classes (VOC) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_59","unstructured":"Chu, P., and Ling, H. (2019, October 27\u2013November 2). FAMNet: Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Wang, L., Lu, Y., Wang, H., Zheng, Y., Ye, H., and Xue, X. (2017, January 10\u201314). Evolving boxes for fast vehicle detection. Proceedings of the IEEE International Conference on Multimedia and Expo, Hong Kong.","DOI":"10.1109\/ICME.2017.8019461"},{"key":"ref_61","unstructured":"Bochinski, E., Eiselein, V., and Sikora, T. (2017, August 29\u2013September 1). High-speed tracking-by-detection without using image information. Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance, Lecce, Italy."},{"key":"ref_62","first-page":"104","article-title":"Deep affinity network for multiple object tracking","volume":"43","author":"Sun","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/19\/6358\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:04:00Z","timestamp":1760166240000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/19\/6358"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,23]]},"references-count":62,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2021,10]]}},"alternative-id":["s21196358"],"URL":"https:\/\/doi.org\/10.3390\/s21196358","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,9,23]]}}}