{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T17:07:00Z","timestamp":1781197620074,"version":"3.54.1"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T00:00:00Z","timestamp":1771459200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T00:00:00Z","timestamp":1778457600000},"content-version":"vor","delay-in-days":81,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"crossref","award":["RS-2024-00407739"],"award-info":[{"award-number":["RS-2024-00407739"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J. King Saud Univ. Comput. Inf. Sci."],"published-print":{"date-parts":[[2026,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Object detection is a technology that automatically identifies and locates specific objects in images or videos and plays a core role in various fields, such as autonomous driving, security surveillance, and medical imaging. You Only Look Once (YOLO) has gained attention for achieving both accuracy and detection speed in real-time applications; however, during the resolution reduction process, detailed information is lost, and unnecessary signals are mixed in the multi-scale feature fusion stage, resulting in limited detection performance for small objects and complex background scenes. 
To alleviate these limitations, we propose YOLO-RECAP, which integrates Content-Aware ReAssembly of FEatures (CARAFE) and Efficient Channel Attention (ECA) modules based on YOLOv11. CARAFE precisely restores boundaries and shapes during the upsampling stage by utilizing position-specific content information, whereas ECA effectively models interchannel interactions to emphasize important signals. For performance verification, VisDrone2019, Store Keeping Unit-110\u00a0K (SKU-110\u00a0K), Pascal Visual Object Classes (VOC), and Dataset for Object Detection in Aerial Images (DOTA)v1 were used. In addition, latency is reported under an end-to-end setting that includes pre-processing, inference, and post-processing, to reflect practical deployment conditions. YOLO-RECAP achieved mAP50 of 0.316, mAP50@95 of 0.184, latency of 16.5\u00a0ms, and 60.6 FPS on VisDrone2019; mAP50 of 0.895, mAP50@95 of 0.572, latency of 18.2\u00a0ms, and 55.0 FPS on SKU-110\u00a0K; mAP50 of 0.770, mAP50@95 of 0.561, latency of 15.6\u00a0ms, and 63.9 FPS on Pascal VOC; and mAP50 of 0.281, mAP50@95 of 0.157, latency of 18.6\u00a0ms, and 53.9 FPS on DOTAv1. Qualitative bounding-box visualizations further indicate reduced missed detections and more stable predictions in cluttered and densely populated scenes. Overall, YOLO-RECAP provided more stable and balanced detection performance than the existing YOLOv11 and recent detection models, especially for small objects and complex backgrounds. 
This code is available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/Heon-ju\/YOLO-RECAP.git\" ext-link-type=\"uri\">https:\/\/github.com\/Heon-ju\/YOLO-RECAP.git<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1007\/s44443-026-00525-9","type":"journal-article","created":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T16:36:13Z","timestamp":1771518973000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["YOLO-RECAP: reassembly with channel attention for perception"],"prefix":"10.1007","volume":"38","author":[{"given":"Heon-Ju","family":"Kim","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sung-Wook","family":"Park","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chun-Bo","family":"Sim","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Se-Hoon","family":"Jung","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2026,2,19]]},"reference":[{"key":"525_CR1","doi-asserted-by":"publisher","first-page":"77019","DOI":"10.1109\/ACCESS.2024.3405341","volume":"12","author":"FS Alamri","year":"2024","unstructured":"Alamri FS, Abdullahi SB, Khan AR, Saba T (2024) Enhanced weak spatial modeling through CNN-based deep sign language skeletal feature transformation. IEEE Access 12:77019\u201377040. 
https:\/\/doi.org\/10.1109\/ACCESS.2024.3405341","journal-title":"IEEE Access"},{"issue":"2","key":"525_CR2","doi-asserted-by":"publisher","DOI":"10.1007\/s10044-025-01471-4","volume":"28","author":"M-H Bae","year":"2025","unstructured":"Bae M-H, Park S-W, Park J, Jung S-H, Sim C-B (2025) YOLO-RACE: reassembly and convolutional block attention for enhanced dense object detection. Pattern Anal Appl 28(2):90. https:\/\/doi.org\/10.1007\/s10044-025-01471-4","journal-title":"Pattern Anal Appl"},{"key":"525_CR3","doi-asserted-by":"publisher","unstructured":"Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 886\u2013893. https:\/\/doi.org\/10.1109\/CVPR.2005.177","DOI":"10.1109\/CVPR.2005.177"},{"key":"525_CR4","doi-asserted-by":"publisher","unstructured":"Du D et al (2019) VisDrone-DET2019: The vision meets drone object detection in image challenge results. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops (ICCVW), pp 213\u2013226. https:\/\/doi.org\/10.1109\/ICCVW.2019.00030","DOI":"10.1109\/ICCVW.2019.00030"},{"key":"525_CR5","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","volume":"88","author":"M Everingham","year":"2010","unstructured":"Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The pascal visual object classes (VOC) challenge. Int J Comput Vis 88:303\u2013338. https:\/\/doi.org\/10.1007\/s11263-009-0275-4","journal-title":"Int J Comput Vis"},{"key":"525_CR6","doi-asserted-by":"publisher","unstructured":"Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 1440\u20131448. 
https:\/\/doi.org\/10.1109\/ICCV.2015.169","DOI":"10.1109\/ICCV.2015.169"},{"issue":"1","key":"525_CR7","doi-asserted-by":"publisher","first-page":"142","DOI":"10.1109\/TPAMI.2015.2437384","volume":"38","author":"R Girshick","year":"2016","unstructured":"Girshick R, Donahue J, Darrell T, Malik J (2016) Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans Pattern Anal Mach Intell 38(1):142\u2013158. https:\/\/doi.org\/10.1109\/TPAMI.2015.2437384","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"525_CR8","doi-asserted-by":"publisher","unstructured":"Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 580\u2013587. https:\/\/doi.org\/10.1109\/CVPR.2014.81","DOI":"10.1109\/CVPR.2014.81"},{"key":"525_CR9","doi-asserted-by":"publisher","unstructured":"Goldman E, Herzig R, Eisenschtat A, Goldberger J, Hassner T (2019) Precise detection in densely packed scenes. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 5227\u20135236. https:\/\/doi.org\/10.1109\/CVPR.2019.00537","DOI":"10.1109\/CVPR.2019.00537"},{"issue":"1","key":"525_CR10","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1109\/TPAMI.2022.3152247","volume":"45","author":"K Han","year":"2023","unstructured":"Han K et al (2023) A survey on vision transformer. IEEE Trans Pattern Anal Mach Intell 45(1):87\u2013110. https:\/\/doi.org\/10.1109\/TPAMI.2022.3152247","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"525_CR11","doi-asserted-by":"publisher","unstructured":"Han J, Ding J, Xue N, Xia G-S (2021) ReDet: A rotation-equivariant detector for aerial object detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 2786\u20132795. 
https:\/\/doi.org\/10.1109\/CVPR46437.2021.00281","DOI":"10.1109\/CVPR46437.2021.00281"},{"key":"525_CR12","doi-asserted-by":"publisher","unstructured":"Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 13713\u201313722. https:\/\/doi.org\/10.1109\/CVPR46437.2021.01350","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"525_CR13","unstructured":"https:\/\/docs.ultralytics.com\/ko\/models\/yolov8\/. Accessed 15 Jun 2025"},{"key":"525_CR14","unstructured":"https:\/\/docs.ultralytics.com\/ko\/models\/yolo11\/. Accessed 15 Jun 2025"},{"key":"525_CR15","doi-asserted-by":"publisher","unstructured":"Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 7132\u20137141. https:\/\/doi.org\/10.1109\/CVPR.2018.00745","DOI":"10.1109\/CVPR.2018.00745"},{"key":"525_CR16","doi-asserted-by":"publisher","unstructured":"Khanam R, Hussain M (2024) YOLOv11: An overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725. https:\/\/doi.org\/10.48550\/arXiv.2410.17725","DOI":"10.48550\/arXiv.2410.17725"},{"key":"525_CR17","doi-asserted-by":"publisher","first-page":"4851","DOI":"10.3390\/rs13234851","volume":"13","author":"M Kim","year":"2021","unstructured":"Kim M, Jeong J, Kim S (2021) Ecap-yolo: efficient channel attention pyramid YOLO for small object detection in aerial image. Remote Sens 13:4851. https:\/\/doi.org\/10.3390\/rs13234851","journal-title":"Remote Sens"},{"key":"525_CR18","doi-asserted-by":"publisher","unstructured":"Kotthapalli M, Ravipati D, Bhatia R (2025) YOLOv1 to YOLOv11: A comprehensive survey of real-time object detection innovations and challenges. arXiv preprint arXiv:2508.02067. 
https:\/\/doi.org\/10.48550\/arXiv.2508.02067","DOI":"10.48550\/arXiv.2508.02067"},{"key":"525_CR19","doi-asserted-by":"publisher","unstructured":"Li X, Wang W, Zhu X, Wu L, Fang T, Ma S, Lu L, Dai J, Qiao Y (2022) DN-DETR: Accelerate DETR training by introducing query de-noising. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 13619\u201313627. https:\/\/doi.org\/10.1109\/CVPR52688.2022.01320","DOI":"10.1109\/CVPR52688.2022.01320"},{"key":"525_CR20","doi-asserted-by":"publisher","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft COCO: Common objects in context. In: European Conference on Computer Vision (ECCV), pp 740\u2013755. https:\/\/doi.org\/10.1007\/978-3-319-10602-1_48","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"525_CR21","doi-asserted-by":"publisher","unstructured":"Lin T-Y, Goyal P, Girshick R, He K, Doll\u00e1r P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 2999\u20133007. https:\/\/doi.org\/10.1109\/ICCV.2017.324","DOI":"10.1109\/ICCV.2017.324"},{"key":"525_CR22","doi-asserted-by":"publisher","unstructured":"Lin T-Y, Doll\u00e1r P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 2117\u20132125. https:\/\/doi.org\/10.1109\/CVPR.2017.106","DOI":"10.1109\/CVPR.2017.106"},{"key":"525_CR23","doi-asserted-by":"publisher","unstructured":"Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 8759\u20138768. 
https:\/\/doi.org\/10.1109\/CVPR.2018.00913","DOI":"10.1109\/CVPR.2018.00913"},{"issue":"2","key":"525_CR24","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","volume":"60","author":"DG Lowe","year":"2004","unstructured":"Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91\u2013110. https:\/\/doi.org\/10.1023\/B:VISI.0000029664.99615.94","journal-title":"Int J Comput Vis"},{"key":"525_CR25","doi-asserted-by":"publisher","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 779\u2013788. https:\/\/doi.org\/10.1109\/CVPR.2016.91","DOI":"10.1109\/CVPR.2016.91"},{"key":"525_CR26","doi-asserted-by":"publisher","unstructured":"Sun J, Ge H, Zhang Z (2021) AS-YOLO: An improved YOLOv4 based on attention mechanism and SqueezeNet for person detection. In: 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Vol 5, pp 1451\u20131456. https:\/\/doi.org\/10.1109\/IAEAC50856.2021.9390810","DOI":"10.1109\/IAEAC50856.2021.9390810"},{"key":"525_CR27","doi-asserted-by":"publisher","unstructured":"Tan M, Pang R, Le QV (2020) EfficientDet: Scalable and efficient object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 10781\u201310790. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01079","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"525_CR28","doi-asserted-by":"publisher","unstructured":"Vaswani A et al (2017) Attention is all you need. arXiv preprint arXiv:1706.03762. 
https:\/\/doi.org\/10.48550\/arXiv.1706.03762","DOI":"10.48550\/arXiv.1706.03762"},{"key":"525_CR29","doi-asserted-by":"publisher","unstructured":"Wang C-Y, Bochkovskiy A, Liao H-YM (2023) YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 7464\u20137475. https:\/\/doi.org\/10.1109\/CVPR52729.2023.00721","DOI":"10.1109\/CVPR52729.2023.00721"},{"key":"525_CR30","doi-asserted-by":"publisher","unstructured":"Wang C-Y, Yeh I-H, Liao H-YM (2024) YOLOv9: Learning what you want to learn using programmable gradient information. arXiv preprint arXiv:2402.13616. https:\/\/doi.org\/10.48550\/arXiv.2402.13616","DOI":"10.48550\/arXiv.2402.13616"},{"key":"525_CR31","doi-asserted-by":"publisher","unstructured":"Wang J et al (2019) CARAFE: Content-aware reassembly of features. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), pp 3007\u20133016. https:\/\/doi.org\/10.1109\/ICCV.2019.00312","DOI":"10.1109\/ICCV.2019.00312"},{"key":"525_CR32","doi-asserted-by":"publisher","unstructured":"Wang C-Y et al (2020) CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp 390\u2013391. https:\/\/doi.org\/10.1109\/CVPRW50498.2020.00203","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"525_CR33","doi-asserted-by":"publisher","unstructured":"Wang Q et al (2020) ECA-Net: Efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 11534\u201311542. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01155","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"525_CR34","doi-asserted-by":"publisher","unstructured":"Wang A et al (2024) YOLOv10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458. 
https:\/\/doi.org\/10.48550\/arXiv.2405.14458","DOI":"10.48550\/arXiv.2405.14458"},{"key":"525_CR35","doi-asserted-by":"publisher","unstructured":"Woo S, Park J, Lee JY, Kweon IS (2018) CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3\u201319. https:\/\/doi.org\/10.1007\/978-3-030-01234-2_1","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"525_CR36","doi-asserted-by":"publisher","unstructured":"Zhang QL, Yang YB (2021) SA-Net: Shuffle attention for deep convolutional neural networks. arXiv preprint arXiv:2102.00240. https:\/\/doi.org\/10.48550\/arXiv.2102.00240","DOI":"10.48550\/arXiv.2102.00240"},{"key":"525_CR37","doi-asserted-by":"publisher","unstructured":"Zhao Y, Lv W, Xu S, Wei J, Wang G, Dang Q, Liu Y, Chen J (2023) DETRs beat YOLOs on real-time object detection (RT-DETR). arXiv preprint arXiv:2304.08069. https:\/\/doi.org\/10.48550\/arXiv.2304.08069","DOI":"10.48550\/arXiv.2304.08069"},{"key":"525_CR38","doi-asserted-by":"publisher","unstructured":"Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable DETR: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159. https:\/\/doi.org\/10.48550\/arXiv.2010.04159","DOI":"10.48550\/arXiv.2010.04159"},{"key":"525_CR39","doi-asserted-by":"publisher","unstructured":"Zhu L, Wang X, Ke Z, Zhang W, Lau RWH (2023) BiFormer: Vision transformer with bi-level routing attention. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 10323\u201310333. https:\/\/doi.org\/10.1109\/CVPR52729.2023.00995","DOI":"10.1109\/CVPR52729.2023.00995"},{"key":"525_CR40","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1109\/JPROC.2023.3238524","volume":"111","author":"Z Zou","year":"2023","unstructured":"Zou Z, Chen K, Shi Z, Guo Y, Ye J (2023) Object detection in 20 years: a survey. Proc IEEE 111:257\u2013276. 
https:\/\/doi.org\/10.1109\/JPROC.2023.3238524","journal-title":"Proc IEEE"}],"container-title":["Journal of King Saud University Computer and Information Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44443-026-00525-9","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44443-026-00525-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44443-026-00525-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T16:43:06Z","timestamp":1781196186000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44443-026-00525-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,19]]},"references-count":40,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2026,5]]}},"alternative-id":["525"],"URL":"https:\/\/doi.org\/10.1007\/s44443-026-00525-9","relation":{},"ISSN":["1319-1578","2213-1248"],"issn-type":[{"value":"1319-1578","type":"print"},{"value":"2213-1248","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,19]]},"assertion":[{"value":"2 December 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 January 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 February 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing 
interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"162"}}