{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T17:11:49Z","timestamp":1777655509597,"version":"3.51.4"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,12,23]],"date-time":"2024-12-23T00:00:00Z","timestamp":1734912000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62433003"],"award-info":[{"award-number":["62433003"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62476017"],"award-info":[{"award-number":["62476017"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,1,31]]},"abstract":"<jats:p>Multiple object tracking (MOT) has emerged as a crucial component of the rapidly developing computer vision. However, existing multi-object tracking methods often overlook the relationship between features and motion, hindering the ability to strike a performance balance between coupled motion and complex scenes. In this work, we propose a novel end-to-end multi-object tracking method that integrates motion and feature information. To achieve this, we introduce a motion prior generator that transforms motion information into attention masks. Additionally, we leverage prior-posterior fusion multi-head attention to combine the motion-derived priors and attention-based posteriors. Our proposed method is extensively evaluated on MOT17 and DanceTrack datasets through comprehensive experiments and ablation studies, demonstrating state-of-the-art performance in the feature-based method with reasonable speed.<\/jats:p>","DOI":"10.1145\/3700443","type":"journal-article","created":{"date-parts":[[2024,10,14]],"date-time":"2024-10-14T12:27:19Z","timestamp":1728908839000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["P2FTrack: Multi-Object Tracking with Motion Prior and Feature Posterior"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1282-3755","authenticated-orcid":false,"given":"Hong","family":"Zhang","sequence":"first","affiliation":[{"name":"BeiHang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8934-5198","authenticated-orcid":false,"given":"Jiaxu","family":"Wan","sequence":"additional","affiliation":[{"name":"BeiHang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3516-0111","authenticated-orcid":false,"given":"Jing","family":"Zhang","sequence":"additional","affiliation":[{"name":"BeiHang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8107-7218","authenticated-orcid":false,"given":"Ding","family":"Yuan","sequence":"additional","affiliation":[{"name":"BeiHang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6979-1822","authenticated-orcid":false,"given":"Xuliang","family":"Li","sequence":"additional","affiliation":[{"name":"BeiHang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4237-5874","authenticated-orcid":false,"given":"Yifan","family":"Yang","sequence":"additional","affiliation":[{"name":"BeiHang University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,12,23]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Nir Aharon Roy Orfaig and Ben-Zion Bobrovsky. 2022. BoT-SORT: Robust associations multi-pedestrian tracking. arXiv:2206.14651. Retrieved from https:\/\/arxiv.org\/abs\/2206.14651"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00103"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2016.7533003"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00934"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00908"},{"key":"e_1_3_1_7_2","unstructured":"Zheng Ge Songtao Liu Feng Wang Zeming Li and Jian Sun. 2021. Yolox: Exceeding yolo series in 2021. arXiv:2107.08430. Retrieved from https:\/\/arxiv.org\/abs\/2107.08430"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3565266"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01712"},{"key":"e_1_3_1_11_2","first-page":"1","article-title":"PANet: An end-to-end network based on relative motion for online multi-object tracking","author":"Li Rui","year":"2023","unstructured":"Rui Li, Baopeng Zhang, Wei Liu, Zhu Teng, and Jianping Fan. 2023. PANet: An end-to-end network based on relative motion for online multi-object tracking. ACM Transactions on Multimedia Computing, Communications and Applications (2023), 1\u201321.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3140929"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00914"},{"key":"e_1_3_1_15_2","unstructured":"Michel Meneses Leonardo Matos Bruno Prado Andr\u00e9 de Carvalho and Hendrik Macedo. 2020. Learning to associate detections for real-time multiple object tracking. arXiv:2007.06041. Retrieved from https:\/\/arxiv.org\/abs\/2007.06041"},{"key":"e_1_3_1_16_2","unstructured":"Anton Milan Laura Leal-Taix\u00e9 Ian Reid Stefan Roth and Konrad Schindler. 2016. MOT16: A benchmark for multi-object tracking. arXiv:1603.00831. Retrieved from https:\/\/arxiv.org\/abs\/1603.00831"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2023.118967"},{"key":"e_1_3_1_18_2","first-page":"91","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, 91\u201399.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00075"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01219"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.02032"},{"key":"e_1_3_1_22_2","unstructured":"Peize Sun Yi Jiang Rufeng Zhang Enze Xie Jinkun Cao Xinting Hu Tao Kong Zehuan Yuan Changhu Wang and Ping Luo. 2020. Transtrack: Multiple-object tracking with transformer. arXiv:2012.15460. Retrieved from https:\/\/arxiv.org\/abs\/2012.15460"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01068"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533253"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00387"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2017.8296962"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01217"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.02.072"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19803-8_43"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.121577"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.3390\/app13148010"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20047-2_20"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3635155"},{"issue":"3","key":"e_1_3_1_34_2","first-page":"1224","article-title":"Aligned spatial-temporal memory network for thermal infrared target tracking","volume":"70","author":"Yuan Di","year":"2022","unstructured":"Di Yuan, Xiu Shu, Qiao Liu, and Zhenyu He. 2022. Aligned spatial-temporal memory network for thermal infrared target tracking. IEEE Transactions on Circuits and Systems II: Express Briefs 70, 3 (2022), 1224\u20131228.","journal-title":"IEEE Transactions on Circuits and Systems II: Express Briefs"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19812-0_38"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2024.102455"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2022.3215920"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20047-2_1"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-021-01513-4"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02112"},{"key":"e_1_3_1_41_2","first-page":"759","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV \u201920)","author":"Zheng Linyu","year":"2020","unstructured":"Linyu Zheng, Ming Tang, Yingying Chen, Jinqiao Wang, and Hanqing Lu. 2020. Learning feature embeddings for discriminant model based tracking-supplementary material. In Proceedings of the European Conference on Computer Vision (ECCV \u201920) 759\u2013775."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00248"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58548-8_28"},{"key":"e_1_3_1_44_2","unstructured":"Xingyi Zhou Dequan Wang and Philipp Kr\u00e4henb\u00fchl. 2019. Objects as points. arXiv:1904.07850. Retrieved from https:\/\/arxiv.org\/abs\/1904.07850"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58545-7_6"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3700443","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3700443","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:38Z","timestamp":1750295858000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3700443"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,23]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1,31]]}},"alternative-id":["10.1145\/3700443"],"URL":"https:\/\/doi.org\/10.1145\/3700443","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,23]]},"assertion":[{"value":"2024-03-05","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-23","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}