{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T00:58:06Z","timestamp":1777510686994,"version":"3.51.4"},"reference-count":60,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T00:00:00Z","timestamp":1738800000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Science and Technology Research Project of Education Department of Hubei Province","award":["B2023362"],"award-info":[{"award-number":["B2023362"]}]},{"name":"Science and Technology Research Project of Education Department of Hubei Province","award":["T2023045"],"award-info":[{"award-number":["T2023045"]}]},{"name":"Excellent Young and Middle aged Science and Technology Innovation Team Project for Higher Education Institutions of Hubei Province","award":["B2023362"],"award-info":[{"award-number":["B2023362"]}]},{"name":"Excellent Young and Middle aged Science and Technology Innovation Team Project for Higher Education Institutions of Hubei Province","award":["T2023045"],"award-info":[{"award-number":["T2023045"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Small object detection in aerial imagery remains challenging due to sparse feature representation, limited spatial resolution, and complex background interference. Current deep learning approaches enhance detection performance through multi-scale feature fusion, leveraging convolutional operations to expand the receptive field or self-attention mechanisms for global context modeling. However, these methods primarily rely on spatial-domain features, while self-attention introduces high computational costs, and conventional fusion strategies (e.g., concatenation or addition) often result in weak feature correlation or boundary misalignment. To address these challenges, we propose a unified spatial-frequency modeling and multi-scale alignment fusion framework, termed USF-DETR, for small object detection. The framework comprises three key modules: the Spatial-Frequency Interaction Backbone (SFIB), the Dual Alignment and Balance Fusion FPN (DABF-FPN), and the Efficient Attention-AIFI (EA-AIFI). The SFIB integrates the Scharr operator for spatial edge and detail extraction and FFT\/IFFT for capturing frequency-domain patterns, achieving a balanced fusion of global semantics and local details. The DABF-FPN employs bidirectional geometric alignment and adaptive attention to enhance the significance expression of the target area, suppress background noise, and improve feature asymmetry across scales. The EA-AIFI streamlines the Transformer attention mechanism by removing key-value interactions and encoding query relationships via linear projections, significantly boosting inference speed and contextual modeling. Experiments on the VisDrone and TinyPerson datasets demonstrate the effectiveness of USF-DETR, achieving improvements of 2.3% and 1.4% mAP over baselines, respectively, while balancing accuracy and computational efficiency. The framework outperforms state-of-the-art methods in small object detection.<\/jats:p>","DOI":"10.3390\/sym17020242","type":"journal-article","created":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T08:53:41Z","timestamp":1738832021000},"page":"242","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Unified Spatial-Frequency Modeling and Alignment for Multi-Scale Small Object Detection"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-5192-3653","authenticated-orcid":false,"given":"Jing","family":"Liu","sequence":"first","affiliation":[{"name":"Xi\u2019an Key Laboratory of Human-Machine Integration and Control Technology for Intelligent Rehabilitation, School of Computer Science, Xijing University, Xi\u2019an 710123, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-3655-9702","authenticated-orcid":false,"given":"Ying","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430072, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4735-0668","authenticated-orcid":false,"given":"Yanyan","family":"Cao","sequence":"additional","affiliation":[{"name":"Xi\u2019an Key Laboratory of Human-Machine Integration and Control Technology for Intelligent Rehabilitation, School of Computer Science, Xijing University, Xi\u2019an 710123, China"}]},{"given":"Chaoping","family":"Guo","sequence":"additional","affiliation":[{"name":"Xi\u2019an Key Laboratory of Human-Machine Integration and Control Technology for Intelligent Rehabilitation, School of Computer Science, Xijing University, Xi\u2019an 710123, China"}]},{"given":"Peijun","family":"Shi","sequence":"additional","affiliation":[{"name":"Xi\u2019an Key Laboratory of Human-Machine Integration and Control Technology for Intelligent Rehabilitation, School of Computer Science, Xijing University, Xi\u2019an 710123, China"}]},{"given":"Pan","family":"Li","sequence":"additional","affiliation":[{"name":"Xi\u2019an Key Laboratory of Human-Machine Integration and Control Technology for Intelligent Rehabilitation, School of Computer Science, Xijing University, Xi\u2019an 710123, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,6]]},"reference":[{"key":"ref_1","first-page":"13467","article-title":"Towards large-scale small object detection: Survey and benchmarks","volume":"45","author":"Cheng","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"104046","DOI":"10.1016\/j.imavis.2020.104046","article-title":"Deep learning-based object detection in low-altitude UAV datasets: A survey","volume":"104","author":"Mittal","year":"2020","journal-title":"Image Vis. Comput."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"6266","DOI":"10.1109\/TKDE.2024.3393512","article-title":"Global Meets Local: Dual Activation Hashing Network for Large-Scale Fine-Grained Image Retrieval","volume":"36","author":"Jiang","year":"2024","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_4","unstructured":"Vaswani, A. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Girshick, R.B., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Redmon, J. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_8","unstructured":"Farhadi, A., and Redmon, J. (2018, January 18\u201323). YOLOv3: An Incremental Improvement. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA."},{"key":"ref_9","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2023, January 17\u201324). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00721"},{"key":"ref_11","unstructured":"Wang, C.Y., Yeh, I.H., and Liao, H.Y.M. (October, January 29). Yolov9: Learning what you want to learn using programmable gradient information. Proceedings of the European Conference on Computer Vision, Milan, Italy."},{"key":"ref_12","unstructured":"Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., and Ding, G. (2024). Yolov10: Real-time end-to-end object detection. arXiv."},{"key":"ref_13","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable detr: Deformable transformers for end-to-end object detection. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Lv, W., Xu, S., Wei, J., Wang, G., Dang, Q., Liu, Y., and Chen, J. (2024, January 16\u201322). Detrs beat yolos on real-time object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01605"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"112487","DOI":"10.1016\/j.asoc.2024.112487","article-title":"Hierarchical Scale Awareness for object detection in Unmanned Aerial Vehicle Scenes","volume":"168","author":"Wang","year":"2024","journal-title":"Appl. Soft Comput."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"104471","DOI":"10.1016\/j.imavis.2022.104471","article-title":"Deep learning-based detection from the perspective of small or tiny objects: A survey","volume":"123","author":"Tong","year":"2022","journal-title":"Image Vis. Comput."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Proceedings; Part I 14.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tan, M., Pang, R., and Le, Q.V. (2020, January 13\u201319). Efficientdet: Scalable and efficient object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_21","unstructured":"Katharopoulos, A., Vyas, A., Pappas, N., and Fleuret, F. (2020, January 13\u201318). Transformers are rnns: Fast autoregressive transformers with linear attention. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Lin, T.Y., and Le, Q.V. (2019, January 15\u201320). Nas-fpn: Learning scalable feature pyramid architecture for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00720"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liu, S., Huang, D., and Wang, Y. (2018, January 8\u201314). Receptive field block net for accurate and fast object detection. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01252-6_24"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"5634819","DOI":"10.1109\/TGRS.2022.3224815","article-title":"Multiscale feature enhancement network for salient object detection in optical remote sensing images","volume":"60","author":"Wang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 11\u201317). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_28","unstructured":"Ma, T., Mao, M., Zheng, H., Gao, P., Wang, X., Han, S., Ding, E., Zhang, B., and Doermann, D. (2021). Oriented object detection with transformer. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Sun, P., Zhang, R., Jiang, Y., Kong, T., Xu, C., Zhan, W., Tomizuka, M., Li, L., Yuan, Z., and Wang, C. (2021, January 11\u201317). Sparse r-cnn: End-to-end object detection with learnable proposals. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Montreal, BC, Canada.","DOI":"10.1109\/CVPR46437.2021.01422"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Sun, Z., Cao, S., Yang, Y., and Kitani, K.M. (2021, January 11\u201317). Rethinking transformer-based set prediction for object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00359"},{"key":"ref_31","unstructured":"Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L.M., and Shum, H.Y. (2022). Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"4708011","DOI":"10.1109\/TGRS.2024.3457155","article-title":"A Lightweight Fusion Strategy with Enhanced Inter-layer Feature Correlation for Small Object Detection","volume":"62","author":"Xiao","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_33","first-page":"4704415","article-title":"A DeNoising FPN with Transformer R-CNN for Tiny Object Detection","volume":"62","author":"Liu","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_34","unstructured":"Du, Z., Hu, Z., Zhao, G., Jin, Y., and Ma, H. (2024). Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images. arXiv."},{"key":"ref_35","unstructured":"Yang, F., Fan, H., Chu, P., Blasch, E., and Ling, H. (November, January 27). Clustered object detection in aerial images. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, C., Yang, T., Zhu, S., Chen, C., and Guan, S. (2020, January 14\u201319). Density map guided object detection in aerial images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00103"},{"key":"ref_37","unstructured":"Huang, Y., Chen, J., and Huang, D. (March, January 28). UFPMP-Det: Toward accurate and efficient object detection on drone imagery. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Du, B., Huang, Y., Chen, J., and Huang, D. (2023, January 17\u201324). Adaptive sparse convolutional networks with global context enhancement for faster object detection on drone images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01291"},{"key":"ref_39","first-page":"5902516","article-title":"DTSSNet: Dynamic Training Sample Selection Network for UAV Object Detection","volume":"62","author":"Chen","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Zhang, Z. (2023). Drone-YOLO: An efficient neural network method for target detection in drone images. Drones, 7.","DOI":"10.3390\/drones7080526"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Liu, K., Fu, Z., Jin, S., Chen, Z., Zhou, F., Jiang, R., Chen, Y., and Ye, J. (2024). ESOD: Efficient Small Object Detection on High-Resolution Images. arXiv.","DOI":"10.1109\/TIP.2024.3501853"},{"key":"ref_42","first-page":"5611215","article-title":"FFCA-YOLO for small object detection in remote sensing images","volume":"62","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Khalili, B., and Smyth, A.W. (2024). SOD-YOLOv8\u2014Enhancing YOLOv8 for Small Object Detection in Aerial Imagery and Traffic Scenes. Sensors, 24.","DOI":"10.3390\/s24196209"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"13863","DOI":"10.1109\/TITS.2024.3386928","article-title":"YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images","volume":"25","author":"Liu","year":"2024","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Liu, S., Huang, S., Li, F., Zhang, H., Liang, Y., Su, H., Zhu, J., and Zhang, L. (2023, January 7\u201314). DQ-DETR: Dual query detection transformer for phrase extraction and grounding. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA.","DOI":"10.1609\/aaai.v37i2.25261"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Liu, J., Jing, D., Zhang, H., and Dong, C. (2024). SRFAD-Net: Scale-Robust Feature Aggregation and Diffusion Network for Object Detection in Remote Sensing Images. Electronics, 13.","DOI":"10.3390\/electronics13122358"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.Y., Hsieh, J.W., and Yeh, I.H. (2020, January 14\u201319). CSPNet: A new backbone that can enhance learning capability of CNN. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"10763","DOI":"10.1109\/TPAMI.2024.3449959","article-title":"Frequency-aware feature fusion for dense image prediction","volume":"46","author":"Chen","year":"2024","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_49","first-page":"5408915","article-title":"Multiscale Spatial-Frequency Domain Dynamic Pansharpening of Remote Sensing Images Integrated with Wavelet Transform","volume":"62","author":"Li","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Shaker, A., Maaz, M., Rasheed, H., Khan, S., Yang, M.H., and Khan, F.S. (2023, January 2\u20133). Swiftformer: Efficient additive attention for transformer-based real-time mobile vision applications. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.01598"},{"key":"ref_51","unstructured":"Du, D., Zhu, P., Wen, L., Bian, X., Lin, H., Hu, Q., Peng, T., Zheng, J., Wang, X., and Zhang, Y. (2019, January 27\u201328). VisDrone-DET2019: The vision meets drone object detection in image challenge results. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Yu, X., Gong, Y., Jiang, N., Ye, Q., and Han, Z. (2020, January 1\u20135). Scale match for tiny person detection. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093394"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Liu, J., Jing, D., Cao, Y., Wang, Y., Guo, C., Shi, P., and Zhang, H. (2024). Lightweight Progressive Fusion Calibration Network for Rotated Object Detection in Remote Sensing Images. Electronics, 13.","DOI":"10.3390\/electronics13163172"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"1483","DOI":"10.1109\/TPAMI.2019.2956516","article-title":"Cascade R-CNN: High quality object detection and instance segmentation","volume":"43","author":"Cai","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Feng, C., Zhong, Y., Gao, Y., Scott, M.R., and Huang, W. (2021, January 11\u201317). Tood: Task-aligned one-stage object detection. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00349"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Biffi, L.J., Mitishita, E., Liesenberg, V., Santos, A.A.d., Gon\u00e7alves, D.N., Estrabis, N.V., Silva, J.d.A., Osco, L.P., Ramos, A.P.M., and Centeno, J.A.S. (2020). ATSS deep learning-based approach to detect apple fruits. Remote Sens., 13.","DOI":"10.3390\/rs13010054"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Lin, T. (2017). Focal Loss for Dense Object Detection. arXiv.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_59","unstructured":"Lyu, C., Zhang, W., Huang, H., Zhou, Y., Wang, Y., Liu, Y., Zhang, S., and Chen, K. (2022). Rtmdet: An empirical study of designing real-time object detectors. arXiv."},{"key":"ref_60","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). Yolox: Exceeding yolo series in 2021. arXiv."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/2\/242\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:28:09Z","timestamp":1760027289000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/2\/242"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,6]]},"references-count":60,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["sym17020242"],"URL":"https:\/\/doi.org\/10.3390\/sym17020242","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,6]]}}}