{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T16:48:44Z","timestamp":1765039724047,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,12,6]],"date-time":"2023-12-06T00:00:00Z","timestamp":1701820800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,12,6]]},"DOI":"10.1145\/3595916.3626385","type":"proceedings-article","created":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T16:34:41Z","timestamp":1704126881000},"page":"1-7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Semantic-Aware Dynamic Feature Selection and Fusion for Object Detection in UAV Videos"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-1906-734X","authenticated-orcid":false,"given":"Jianping","family":"Zhong","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9196-9818","authenticated-orcid":false,"given":"Zhaobo","family":"Qi","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0042-7074","authenticated-orcid":false,"given":"Weigang","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7542-296X","authenticated-orcid":false,"given":"Qingming","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Chinese Academy of Sciences, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,1]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Zhaowei Cai Mohammad Saberian and Nuno Vasconcelos. 2015. Learning complexity-aware cascades for deep pedestrian detection. In ICCV. 3361\u20133369.","DOI":"10.1109\/ICCV.2015.384"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3444685.3446288"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2014.2300479"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"K. Duan S. Bai L. Xie H. Qi Q. Huang and Q. Tian. 2019. CenterNet: Keypoint Triplets for Object Detection. In ICCV. 6568\u20136577.","DOI":"10.1109\/ICCV.2019.00667"},{"key":"e_1_3_2_1_6_1","volume-title":"arXiv preprint arXiv:2204.08394","author":"Duan Kaiwen","year":"2022","unstructured":"Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. 2022. CenterNet++ for object detection. 
arXiv preprint arXiv:2204.08394 (2022)."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0275-4"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01855"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Ross Girshick Jeff Donahue Trevor Darrell and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR. 580\u2013587.","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_3_2_1_10_1","volume-title":"Ron: Reverse connection with objectness prior networks for object detection. In CVPR. 5936\u20135944.","author":"Kong Tao","year":"2017","unstructured":"Tao Kong, Fuchun Sun, Anbang Yao, Huaping Liu, Ming Lu, and Yurong Chen. 2017. Ron: Reverse connection with objectness prior networks for object detection. In CVPR. 5936\u20135944."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3444685.3446257"},{"key":"e_1_3_2_1_12_1","volume-title":"Cornernet: Detecting objects as paired keypoints. In ECCV. 734\u2013750.","author":"Law Hei","year":"2018","unstructured":"Hei Law and Jia Deng. 2018. Cornernet: Detecting objects as paired keypoints. In ECCV. 734\u2013750."},{"key":"e_1_3_2_1_13_1","volume-title":"Third International Workshop on Pattern Recognition, Vol.\u00a010828","author":"Li Suichan","year":"2018","unstructured":"Suichan Li and Feng Chen. 2018. 3D-DETNet: a single stage video-based vehicle detector. 
In Third International Workshop on Pattern Recognition, Vol.\u00a010828. International Society for Optics and Photonics, 108280A."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3444685.3446273"},{"key":"e_1_3_2_1_15_1","volume-title":"MidNet: An Anchor-and-Angle-Free Detector for Oriented Ship Detection in Aerial Images","author":"Liang Yuping","year":"2023","unstructured":"Yuping Liang, Jie Feng, Xiangrong Zhang, Junpeng Zhang, and Licheng Jiao. 2023. MidNet: An Anchor-and-Angle-Free Detector for Oriented Ship Detection in Aerial Images. IEEE Transactions on Geoscience and Remote Sensing (2023)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3551626.3564947"},{"key":"e_1_3_2_1_17_1","unstructured":"Tsung-Yi Lin Piotr Doll\u00e1r Ross Girshick Kaiming He Bharath Hariharan and Serge Belongie. 2017. Feature pyramid networks for object detection. In CVPR. 2117\u20132125."},{"key":"e_1_3_2_1_18_1","unstructured":"Tsung-Yi Lin Priya Goyal Ross Girshick Kaiming He and Piotr Doll\u00e1r. 2017. Focal loss for dense object detection. In ICCV. 
2980\u20132988."},{"key":"e_1_3_2_1_19_1","volume-title":"International Conference on Learning Representations.","author":"Liu Shilong","year":"2021","unstructured":"Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, and Lei Zhang. 2021. DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_20_1","volume-title":"Ssd: Single shot multibox detector","author":"Liu Wei","year":"2016","unstructured":"Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander\u00a0C Berg. 2016. Ssd: Single shot multibox detector. In ECCV. Springer, 21\u201337."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Jonathan Long Evan Shelhamer and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In CVPR. 3431\u20133440.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00363"},{"volume-title":"Stacked hourglass networks for human pose estimation","author":"Newell Alejandro","key":"e_1_3_2_1_23_1","unstructured":"Alejandro Newell, Kaiyu Yang, and Jia Deng. 2016. Stacked hourglass networks for human pose estimation. In ECCV. 
Springer, 483\u2013499."},{"key":"e_1_3_2_1_24_1","volume-title":"Road user detection in videos. arXiv preprint arXiv:1903.12049","author":"Perreault Hughes","year":"2019","unstructured":"Hughes Perreault, Guillaume-Alexandre Bilodeau, Nicolas Saunier, and Pierre Gravel. 2019. Road user detection in videos. arXiv preprint arXiv:1903.12049 (2019)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Joseph Redmon and Ali Farhadi. 2017. YOLO9000: better faster stronger. In CVPR. 7263\u20137271.","DOI":"10.1109\/CVPR.2017.690"},{"key":"e_1_3_2_1_26_1","volume-title":"Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767","author":"Redmon Joseph","year":"2018","unstructured":"Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_3_2_1_28_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 15888\u201315899","author":"Su Weijie","year":"2023","unstructured":"Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, and Jifeng Dai. 2023. 
Towards all-in-one pre-training via maximizing multi-modal mutual information. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 15888\u201315899."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01422"},{"key":"e_1_3_2_1_30_1","volume-title":"Fcos: Fully convolutional one-stage object detection. In ICCV. 9627\u20139636.","author":"Tian Zhi","year":"2019","unstructured":"Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. 2019. Fcos: Fully convolutional one-stage object detection. In ICCV. 9627\u20139636."},{"key":"e_1_3_2_1_31_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 14408\u201314419","author":"Wang Wenhai","year":"2023","unstructured":"Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, 2023. Internimage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 14408\u201314419."},{"key":"e_1_3_2_1_32_1","volume-title":"European Conference on Computer Vision. Springer, 18\u201334","author":"Wei Haoran","year":"2022","unstructured":"Haoran Wei, Xin Chen, Lingxi Xie, and Qi Tian. 2022. 
Cornerformer: Purifying instances for corner-based detectors. In European Conference on Computer Vision. Springer, 18\u201334."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2020.102907"},{"key":"e_1_3_2_1_34_1","volume-title":"Reppoints: Point set representation for object detection. In ICCV. 9657\u20139666.","author":"Yang Ze","year":"2019","unstructured":"Ze Yang, Shaohui Liu, Han Hu, Liwei Wang, and Stephen Lin. 2019. Reppoints: Point set representation for object detection. In ICCV. 9657\u20139666."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3444685.3446263"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01266-1"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00978"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Shifeng Zhang Longyin Wen Xiao Bian Zhen Lei and Stan\u00a0Z Li. 2018. Single-shot refinement neural network for object detection. In CVPR. 4203\u20134212.","DOI":"10.1109\/CVPR.2018.00442"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Xingyi Zhou Jiacheng Zhuo and Philipp Krahenbuhl. 2019. Bottom-up object detection by grouping extreme and center points. In CVPR. 850\u2013859.","DOI":"10.1109\/CVPR.2019.00094"},{"key":"e_1_3_2_1_40_1","volume-title":"Deformable DETR: Deformable Transformers for End-to-End Object Detection. 
In International Conference on Learning Representations.","author":"Zhu Xizhou","year":"2020","unstructured":"Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. 2020. Deformable DETR: Deformable Transformers for End-to-End Object Detection. In International Conference on Learning Representations."}],"event":{"name":"MMAsia '23: ACM Multimedia Asia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Tainan Taiwan","acronym":"MMAsia '23"},"container-title":["ACM Multimedia Asia 2023"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3595916.3626385","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3595916.3626385","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:48:40Z","timestamp":1750286920000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3595916.3626385"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,6]]},"references-count":40,"alternative-id":["10.1145\/3595916.3626385","10.1145\/3595916"],"URL":"https:\/\/doi.org\/10.1145\/3595916.3626385","relation":{},"subject":[],"published":{"date-parts":[[2023,12,6]]},"assertion":[{"value":"2024-01-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}