{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T01:14:05Z","timestamp":1772500445319,"version":"3.50.1"},"reference-count":65,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,4,15]],"date-time":"2024-04-15T00:00:00Z","timestamp":1713139200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,4,15]],"date-time":"2024-04-15T00:00:00Z","timestamp":1713139200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100013143","name":"National Natural Science Foundation of China-Shandong Joint Fund for Marine Science Research Centers","doi-asserted-by":"publisher","award":["61472220"],"award-info":[{"award-number":["61472220"]}],"id":[{"id":"10.13039\/501100013143","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100013143","name":"National Natural Science Foundation of China-Shandong Joint Fund for Marine Science Research Centers","doi-asserted-by":"publisher","award":["61572286"],"award-info":[{"award-number":["61572286"]}],"id":[{"id":"10.13039\/501100013143","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Convolutional neural network (CNN)-based object detectors perform excellently but lack global feature extraction and cannot establish global dependencies between object pixels. 
Although the Transformer is able to compensate for this, it does not incorporate the advantages of convolution, which results in insufficient information being obtained about the details of local features, as well as slow speed and large computational parameters. In addition, Feature Pyramid Network (FPN) lacks information interaction across layers, which can reduce the acquisition of feature context information. To solve the above problems, this paper proposes a CNN-based anchor-free object detector that combines transformer global and local feature extraction (GLFT) to enhance the extraction of semantic information from images. First, the segmented channel extraction feature attention (SCEFA) module was designed to improve the extraction of local multiscale channel features from the model and enhance the discrimination of pixels in the object region. Second, the aggregated feature hybrid transformer (AFHTrans) module combined with convolution is designed to enhance the extraction of global and local feature information from the model and to establish the dependency of the pixels of distant objects. This approach compensates for the shortcomings of the FPN by means of multilayer information aggregation transmission. Compared with a transformer, these methods have obvious advantages. Finally, the feature extraction head (FE-Head) was designed to extract full-text information based on the features of different tasks. 
An accuracy of 47.0% and 82.76% was achieved on the COCO2017 and PASCAL VOC2007\u2009+\u20092012 datasets, respectively, and the experimental results validate the effectiveness of our method.<\/jats:p>","DOI":"10.1007\/s40747-024-01409-z","type":"journal-article","created":{"date-parts":[[2024,4,15]],"date-time":"2024-04-15T09:01:59Z","timestamp":1713171719000},"page":"4897-4920","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Combining transformer global and local feature extraction for object detection"],"prefix":"10.1007","volume":"10","author":[{"given":"Tianping","family":"Li","sequence":"first","affiliation":[]},{"given":"Zhenyi","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Mengdi","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Zhaotong","family":"Cui","sequence":"additional","affiliation":[]},{"given":"Dongmei","family":"Wei","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,4,15]]},"reference":[{"key":"1409_CR1","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1109\/JPROC.2023.3238524","volume":"111","author":"Z Zou","year":"2023","unstructured":"Zou Z, Chen K, Shi Z et al (2023) Object Detection in 20 Years: A Survey. Proc IEEE 111:257\u2013276. https:\/\/doi.org\/10.1109\/JPROC.2023.3238524","journal-title":"Proc IEEE"},{"key":"1409_CR2","doi-asserted-by":"publisher","first-page":"1706","DOI":"10.1016\/j.procs.2018.05.144","volume":"132","author":"AR Pathak","year":"2018","unstructured":"Pathak AR, Pandey M, Rautaray S (2018) Application of Deep Learning for Object Detection. Procedia Comput Sci 132:1706\u20131717. 
https:\/\/doi.org\/10.1016\/j.procs.2018.05.144","journal-title":"Procedia Comput Sci"},{"key":"1409_CR3","doi-asserted-by":"publisher","first-page":"7347","DOI":"10.1016\/j.jksuci.2021.08.001","volume":"34","author":"E Arulprakash","year":"2022","unstructured":"Arulprakash E, Aruldoss M (2022) A study on generic object detection with emphasis on future research directions. J King Saud Univ - Comput Inf Sci 34:7347\u20137365. https:\/\/doi.org\/10.1016\/j.jksuci.2021.08.001","journal-title":"J King Saud Univ - Comput Inf Sci"},{"key":"1409_CR4","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1007\/s13748-019-00203-0","volume":"9","author":"A Dhillon","year":"2020","unstructured":"Dhillon A, Verma GK (2020) Convolutional neural network: a review of models, methodologies and applications to object detection. Prog Artif Intell 9:85\u2013112. https:\/\/doi.org\/10.1007\/s13748-019-00203-0","journal-title":"Prog Artif Intell"},{"key":"1409_CR5","doi-asserted-by":"crossref","unstructured":"Vaidwan H, Seth N, Parihar AS, Singh K (2021) A study on transformer-based Object Detection. In: 2021 International Conference on Intelligent Technologies (CONIT). IEEE, Hubli, India, pp 1\u20136","DOI":"10.1109\/CONIT51480.2021.9498550"},{"key":"1409_CR6","unstructured":"Girshick R, Donahue J, Darrell T, Malik J Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In: arXiv preprint arXiv:1311.2524"},{"key":"1409_CR7","doi-asserted-by":"crossref","unstructured":"Cai Z, Vasconcelos N (2018) Cascade R-CNN: Delving Into High Quality Object Detection. In: 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT, pp 6154\u20136162","DOI":"10.1109\/CVPR.2018.00644"},{"key":"1409_CR8","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You Only Look Once: Unified, Real-Time Object Detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 
IEEE, Las Vegas, NV, USA, pp 779\u2013788","DOI":"10.1109\/CVPR.2016.91"},{"key":"1409_CR9","unstructured":"Lin T-Y, Goyal P, Girshick R, et al Focal Loss for Dense Object Detection. In: arXiv preprint arXiv:1708.02002"},{"key":"1409_CR10","unstructured":"Ren S, He K, Girshick R (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems. In: arXiv preprint arXiv:1506.01497"},{"key":"1409_CR11","unstructured":"Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: Optimal Speed and Accuracy of Object Detection. In: arXiv preprint arXiv:2004.10934"},{"key":"1409_CR12","doi-asserted-by":"crossref","unstructured":"Tian Z, Shen C, Chen H, He T (2019) FCOS: Fully Convolutional One-Stage Object Detection. In: 2019 IEEE\/CVF International Conference on Computer Vision (ICCV). IEEE, Seoul, Korea (South), pp 9626\u20139635","DOI":"10.1109\/ICCV.2019.00972"},{"key":"1409_CR13","doi-asserted-by":"crossref","unstructured":"Zhang S, Chi C, Yao Y, et al (2020) Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection. In: 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Seattle, WA, USA, pp 9756\u20139765","DOI":"10.1109\/CVPR42600.2020.00978"},{"key":"1409_CR14","doi-asserted-by":"publisher","unstructured":"Liu Y, Zhang Y, Wang Y, et al (2023) A Survey of Visual Transformers. IEEE Trans Neural Netw Learn Syst 1\u201321. https:\/\/doi.org\/10.1109\/TNNLS.2022.3227717","DOI":"10.1109\/TNNLS.2022.3227717"},{"key":"1409_CR15","doi-asserted-by":"crossref","unstructured":"Vedaldi A, Bischof H, Brox T, Frahm J-M (2020) End-to-End Object Detection with Transformers. In: Computer Vision \u2013 ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part I. 
Springer International Publishing, Cham.","DOI":"10.1007\/978-3-030-58583-9"},{"key":"1409_CR16","unstructured":"Zhu X, Su W, Lu L, et al (2021) Deformable detr: Deformable transformers for end-to-end object detection. In: arXiv preprint arXiv:2010.04159"},{"key":"1409_CR17","unstructured":"Vaswani A, Shazeer N, Parmar N, et al Attention is All you Need. In: arXiv preprint arXiv:1706.03762"},{"key":"1409_CR18","unstructured":"Ivanov A, Dryden N, Ben-Nun T, et al Data Movement Is All You Need: A Case Study on Optimizing Transformers. In: arXiv preprint arXiv:2007.00072"},{"key":"1409_CR19","doi-asserted-by":"crossref","unstructured":"Chen Y, Dai X, Chen D, et al (2022) Mobile-Former: Bridging MobileNet and Transformer. In: 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, New Orleans, LA, USA, pp 5260\u20135269","DOI":"10.1109\/CVPR52688.2022.00520"},{"key":"1409_CR20","doi-asserted-by":"publisher","unstructured":"Harjoseputro Y, Yuda IgnP, Danukusumo KP (2020) MobileNets: Efficient Convolutional Neural Network for Identification of Protected Birds. Int J Adv Sci Eng Inf Technol 10:2290. https:\/\/doi.org\/10.18517\/ijaseit.10.6.10948","DOI":"10.18517\/ijaseit.10.6.10948"},{"key":"1409_CR21","unstructured":"Li K, Wang Y, Gao P, et al (2022) Uniformer: Unified transformer for efficient spatiotemporal representation learning. In: arXiv preprint arXiv:2201.04676"},{"key":"1409_CR22","unstructured":"Lou M, Zhou H-Y, Yang S, Yu Y (2023) TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition. In: arXiv preprint arXiv:2310.19380"},{"key":"1409_CR23","doi-asserted-by":"crossref","unstructured":"Sun Z, Cao S, Yang Y, Kitani K (2021) Rethinking Transformer-based Set Prediction for Object Detection. In: 2021 IEEE\/CVF International Conference on Computer Vision (ICCV). 
IEEE, Montreal, QC, Canada, pp 3591\u20133600","DOI":"10.1109\/ICCV48922.2021.00359"},{"key":"1409_CR24","doi-asserted-by":"crossref","unstructured":"Zhang H, Zu K, Lu J, et al (2023) EPSANet: An Efficient Pyramid Squeeze Attention Block on Convolutional Neural Network. In: Wang L, Gall J, Chin T-J, et al (eds) Computer Vision \u2013 ACCV 2022. Springer Nature Switzerland, Cham, pp 541\u2013557","DOI":"10.1007\/978-3-031-26313-2_33"},{"key":"1409_CR25","doi-asserted-by":"crossref","unstructured":"Zhang Q-L, Yang Y-B (2021) SA-Net: Shuffle Attention for Deep Convolutional Neural Networks. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Toronto, ON, Canada, pp 2235\u20132239","DOI":"10.1109\/ICASSP39728.2021.9414568"},{"key":"1409_CR26","doi-asserted-by":"publisher","first-page":"8906","DOI":"10.1109\/TMM.2023.3243616","volume":"25","author":"J Jiao","year":"2023","unstructured":"Jiao J, Tang Y-M, Lin K-Y et al (2023) DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition. IEEE Trans Multimed 25:8906\u20138919. https:\/\/doi.org\/10.1109\/TMM.2023.3243616","journal-title":"IEEE Trans Multimed"},{"key":"1409_CR27","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Dollar P, Girshick R, et al (2017) Feature Pyramid Networks for Object Detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Honolulu, HI, pp 936\u2013944","DOI":"10.1109\/CVPR.2017.106"},{"key":"1409_CR28","doi-asserted-by":"crossref","unstructured":"Zhang W, Huang Z, Luo G, et al (2022) TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation. In: 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, New Orleans, LA, USA, pp 12073\u201312083","DOI":"10.1109\/CVPR52688.2022.01177"},{"key":"1409_CR29","doi-asserted-by":"crossref","unstructured":"Feng C, Zhong Y, Gao Y, et al (2021) TOOD: Task-aligned One-stage Object Detection. 
In: 2021 IEEE\/CVF International Conference on Computer Vision (ICCV). IEEE, Montreal, QC, Canada, pp 3490\u20133499","DOI":"10.1109\/ICCV48922.2021.00349"},{"key":"1409_CR30","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Dollar P, Girshick R (2017) Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, Venice, pp 2980\u20132988","DOI":"10.1109\/ICCV.2017.322"},{"key":"1409_CR31","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1109\/TGRS.2019.2930246","volume":"58","author":"Y Gong","year":"2020","unstructured":"Gong Y, Xiao Z, Tan X et al (2020) Context-Aware Convolutional Neural Network for Object Detection in VHR Remote Sensing Imagery. IEEE Trans Geosci Remote Sens 58:34\u201344. https:\/\/doi.org\/10.1109\/TGRS.2019.2930246","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"1409_CR32","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1007\/978-3-030-01228-1_15","volume-title":"Computer Vision \u2013 ECCV 2018","author":"S-W Kim","year":"2018","unstructured":"Kim S-W, Kook H-K, Sun J-Y et al (2018) Parallel Feature Pyramid Network for Object Detection. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision \u2013 ECCV 2018. Springer International Publishing, Cham, pp 239\u2013256"},{"key":"1409_CR33","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, et al (2016) SSD: Single Shot MultiBox Detector. pp 21\u201337","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"1409_CR34","unstructured":"Deng L, Yang M, Li T, et al (2019) RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation. In: arXiv preprint arXiv:1907.00135"},{"key":"1409_CR35","doi-asserted-by":"publisher","first-page":"6893","DOI":"10.1109\/TIP.2022.3216771","volume":"31","author":"T Liang","year":"2022","unstructured":"Liang T, Chu X, Liu Y et al (2022) CBNet: A Composite Backbone Network Architecture for Object Detection. 
IEEE Trans Image Process 31:6893\u20136906. https:\/\/doi.org\/10.1109\/TIP.2022.3216771","journal-title":"IEEE Trans Image Process"},{"key":"1409_CR36","unstructured":"Law H, Deng J CornerNet: Detecting Objects as Paired Keypoints. In: arXiv preprint arXiv:1808.01244"},{"key":"1409_CR37","unstructured":"Liu S, Qi L, Qin H, et al Path Aggregation Network for Instance Segmentation. In: arXiv preprint arXiv:1803.01534"},{"key":"1409_CR38","doi-asserted-by":"crossref","unstructured":"Peng Z, Huang W, Gu S, et al (2021) Conformer: Local Features Coupling Global Representations for Visual Recognition. In: 2021 IEEE\/CVF International Conference on Computer Vision (ICCV). IEEE, Montreal, QC, Canada, pp 357\u2013366","DOI":"10.1109\/ICCV48922.2021.00042"},{"key":"1409_CR39","doi-asserted-by":"crossref","unstructured":"Guo J, Han K, Wu H, et al (2022) CMT: Convolutional Neural Networks Meet Vision Transformers. In: 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, New Orleans, LA, USA, pp 12165\u201312175","DOI":"10.1109\/CVPR52688.2022.01186"},{"key":"1409_CR40","doi-asserted-by":"publisher","first-page":"1489","DOI":"10.1109\/TPAMI.2022.3164083","volume":"45","author":"Y Li","year":"2023","unstructured":"Li Y, Yao T, Pan Y, Mei T (2023) Contextual Transformer Networks for Visual Recognition. IEEE Trans Pattern Anal Mach Intell 45:1489\u20131500. https:\/\/doi.org\/10.1109\/TPAMI.2022.3164083","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1409_CR41","doi-asserted-by":"crossref","unstructured":"Li Y, Mao H, Girshick R, He K (2022) Exploring Plain Vision Transformer Backbones for Object Detection. In: Avidan S, Brostow G, Ciss\u00e9 M, et al (eds) Computer Vision \u2013 ECCV 2022. Springer Nature Switzerland, Cham, pp 280\u2013296","DOI":"10.1007\/978-3-031-20077-9_17"},{"key":"1409_CR42","unstructured":"Lin W, Wu Z, Chen J, et al Scale-Aware Modulation Meet Transformer. 
In: arXiv preprint arXiv:2307.08579"},{"key":"1409_CR43","unstructured":"Fan Q, Huang H, Guan J, He R (2023) Rethinking Local Perception in Lightweight Vision Transformer. In: arXiv preprint arXiv:2303.17803"},{"key":"1409_CR44","doi-asserted-by":"publisher","first-page":"816","DOI":"10.1007\/978-3-030-01264-9_48","volume-title":"Computer Vision \u2013 ECCV 2018","author":"B Jiang","year":"2018","unstructured":"Jiang B, Luo R, Mao J et al (2018) Acquisition of Localization Confidence for Accurate Object Detection. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision \u2013 ECCV 2018. Springer International Publishing, Cham, pp 816\u2013832"},{"key":"1409_CR45","doi-asserted-by":"crossref","unstructured":"Wu Y, Chen Y, Yuan L, et al (2020) Rethinking Classification and Localization for Object Detection. In: 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Seattle, WA, USA, pp 10183\u201310192","DOI":"10.1109\/CVPR42600.2020.01020"},{"key":"1409_CR46","doi-asserted-by":"crossref","unstructured":"Song G, Liu Y, Wang X (2020) Revisiting the Sibling Head in Object Detector. In: 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Seattle, WA, USA, pp 11560\u201311569","DOI":"10.1109\/CVPR42600.2020.01158"},{"key":"1409_CR47","unstructured":"Ge Z, Liu S, Wang F, et al (2021) YOLOX: Exceeding YOLO Series in 2021. In: arXiv preprint arXiv:2107.08430"},{"key":"1409_CR48","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2023.109579","volume":"140","author":"Z Zhao","year":"2023","unstructured":"Zhao Z, He C, Zhao G et al (2023) RA-YOLOX: Re-parameterization align decoupled head and novel label assignment scheme based on YOLOX. Pattern Recognit 140:109579. 
https:\/\/doi.org\/10.1016\/j.patcog.2023.109579","journal-title":"Pattern Recognit"},{"key":"1409_CR49","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1016\/j.neucom.2019.10.076","volume":"379","author":"J Qin","year":"2020","unstructured":"Qin J, Huang Y, Wen W (2020) Multi-scale feature fusion residual network for Single Image Super-Resolution. Neurocomputing 379:334\u2013342. https:\/\/doi.org\/10.1016\/j.neucom.2019.10.076","journal-title":"Neurocomputing"},{"key":"1409_CR50","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2019.107149","volume":"100","author":"W Ma","year":"2020","unstructured":"Ma W, Wu Y, Cen F, Wang G (2020) MDFN: Multi-scale deep feature learning network for object detection. Pattern Recognit 100:107149. https:\/\/doi.org\/10.1016\/j.patcog.2019.107149","journal-title":"Pattern Recognit"},{"key":"1409_CR51","doi-asserted-by":"crossref","unstructured":"Li Y, Chen Y, Wang N, Zhang Z-X (2019) Scale-Aware Trident Networks for Object Detection. In: 2019 IEEE\/CVF International Conference on Computer Vision (ICCV). IEEE, Seoul, Korea (South), pp 6053\u20136062","DOI":"10.1109\/ICCV.2019.00615"},{"key":"1409_CR52","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2023\/6358162","volume":"2023","author":"T Li","year":"2023","unstructured":"Li T, Wei Y, Liu M et al (2023) Refined Division Features Based on Transformer for Semantic Image Segmentation. Int J Intell Syst 2023:1\u201315. https:\/\/doi.org\/10.1155\/2023\/6358162","journal-title":"Int J Intell Syst"},{"key":"1409_CR53","unstructured":"Jang E, Gu S, Poole B. Categorical reparameterization with gumbel-softmax. In: arXiv preprint arXiv:1611.01144"},{"key":"1409_CR54","unstructured":"Xu B, Wang N, Chen T, et al. Empirical evaluation of rectified activations in convolutional network. 
In: arXiv preprint arXiv:1505.00853"},{"key":"1409_CR55","doi-asserted-by":"crossref","unstructured":"Hou Q, Zhou D, Feng J (2021) Coordinate Attention for Efficient Mobile Network Design. In: 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Nashville, TN, USA, pp 13708\u201313717","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"1409_CR56","doi-asserted-by":"crossref","unstructured":"Cao Y, Xu J, Lin S, et al (2019) GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In: 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, Seoul, Korea (South), pp 1971\u20131980","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"1409_CR57","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1016\/j.patrec.2020.05.017","volume":"135","author":"M Tanaka","year":"2020","unstructured":"Tanaka M (2020) Weighted sigmoid gate unit for an activation function of deep neural network. Pattern Recognit Lett 135:354\u2013359. https:\/\/doi.org\/10.1016\/j.patrec.2020.05.017","journal-title":"Pattern Recognit Lett"},{"key":"1409_CR58","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.114528","volume":"170","author":"ZY Khan","year":"2021","unstructured":"Khan ZY, Niu Z (2021) CNN with depthwise separable convolutions and combined kernels for rating prediction. Expert Syst Appl 170:114528. https:\/\/doi.org\/10.1016\/j.eswa.2020.114528","journal-title":"Expert Syst Appl"},{"key":"1409_CR59","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2023.111305","volume":"284","author":"X Wei","year":"2024","unstructured":"Wei X, Zhang L, Zhang J et al (2024) Decoupled Sequential Detection Head for accurate acne detection. Knowl-Based Syst 284:111305. 
https:\/\/doi.org\/10.1016\/j.knosys.2023.111305","journal-title":"Knowl-Based Syst"},{"key":"1409_CR60","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","volume":"88","author":"M Everingham","year":"2010","unstructured":"Everingham M, Van Gool L, Williams CKI et al (2010) The Pascal Visual Object Classes (VOC) Challenge. Int J Comput Vis 88:303\u2013338. https:\/\/doi.org\/10.1007\/s11263-009-0275-4","journal-title":"Int J Comput Vis"},{"key":"1409_CR61","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1007\/978-3-319-10602-1_48","volume-title":"Computer Vision \u2013 ECCV 2014","author":"T-Y Lin","year":"2014","unstructured":"Lin T-Y, Maire M, Belongie S et al (2014) Microsoft COCO: Common Objects in Context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision \u2013 ECCV 2014. Springer International Publishing, Cham, pp 740\u2013755"},{"key":"1409_CR62","doi-asserted-by":"crossref","unstructured":"Rezatofighi H, Tsoi N, Gwak J, et al (2019) Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In: 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Long Beach, CA, USA, pp 658\u2013666","DOI":"10.1109\/CVPR.2019.00075"},{"key":"1409_CR63","doi-asserted-by":"publisher","first-page":"15650","DOI":"10.1109\/TPAMI.2023.3292030","volume":"45","author":"P Sun","year":"2023","unstructured":"Sun P, Zhang R, Jiang Y et al (2023) Sparse R-CNN: An End-to-End Framework for Object Detection. IEEE Trans Pattern Anal Mach Intell 45:15650\u201315664. https:\/\/doi.org\/10.1109\/TPAMI.2023.3292030","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1409_CR64","doi-asserted-by":"publisher","first-page":"2567","DOI":"10.1609\/aaai.v36i3.20158","volume":"36","author":"Y Wang","year":"2022","unstructured":"Wang Y, Zhang X, Yang T, Sun J (2022) Anchor DETR: Query Design for Transformer-Based Detector. Proc AAAI Conf Artif Intell 36:2567\u20132575. 
https:\/\/doi.org\/10.1609\/aaai.v36i3.20158","journal-title":"Proc AAAI Conf Artif Intell"},{"key":"1409_CR65","unstructured":"Liu S, Li F, Zhang H, et al (2022) DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR. In: arXiv preprint arXiv:2201.12329"}],"container-title":["Complex & Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01409-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01409-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01409-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T17:17:51Z","timestamp":1721236671000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01409-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,15]]},"references-count":65,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["1409"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01409-z","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,15]]},"assertion":[{"value":"13 December 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 March 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 April 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflicts of interest in the publication of this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}