{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T19:12:21Z","timestamp":1773256341970,"version":"3.50.1"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,6,29]],"date-time":"2022-06-29T00:00:00Z","timestamp":1656460800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,6,29]],"date-time":"2022-06-29T00:00:00Z","timestamp":1656460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In crowded scenes, one of the most important issues is that heavily overlapped objects can hardly be distinguished from each other, since most of their pixels are shared and the visible pixels of the occluded objects, which are used to represent their features, are limited. In this paper, a spatial pyramid convolutional shuffle (SPCS) module is proposed to extract refined information from the limited visible pixels of the occluded objects and generate distinguishable representations for the heavily overlapped objects. We adopt four convolutional kernels with different sizes and dilation rates at each location in the pyramid features and adjacently recombine their fused outputs spatially using a pixel shuffle module. In this way, four distinguishable instance predictions corresponding to different convolutional kernels can be produced for each location in the pyramid feature. 
In addition, multiple convolutional operations with different kernel sizes and dilation rates at the same location can generate refined information for the corresponding regions, which helps extract features for the occluded objects from their limited visible pixels. Extensive experimental results demonstrate that the SPCS module can effectively boost performance in crowded human detection. A YOLO detector with the SPCS module achieves 94.11% AP, 41.75% MR, and 97.75% Recall on CrowdHuman, and 93.04% AP and 98.45% Recall on WiderPerson, surpassing previous state-of-the-art models.<\/jats:p>","DOI":"10.1007\/s40747-022-00786-7","type":"journal-article","created":{"date-parts":[[2022,6,29]],"date-time":"2022-06-29T09:07:57Z","timestamp":1656493677000},"page":"301-315","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["SPCS: a spatial pyramid convolutional shuffle module for YOLO to detect occluded object"],"prefix":"10.1007","volume":"9","author":[{"given":"Xiang","family":"Li","sequence":"first","affiliation":[]},{"given":"Miao","family":"He","sequence":"additional","affiliation":[]},{"given":"Yan","family":"Liu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6425-6433","authenticated-orcid":false,"given":"Haibo","family":"Luo","sequence":"additional","affiliation":[]},{"given":"Moran","family":"Ju","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,6,29]]},"reference":[{"key":"786_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TGRS.2021.3092433","volume":"60","author":"Y Yang","year":"2022","unstructured":"Yang Y, Tang X, Cheung Y-M, Zhang X, Liu F, Ma J, Jiao L (2022) Ar<sup>2<\/sup>det: An accurate and real-time rotational one-stage ship detector in remote sensing images. IEEE Trans Geosci Remote Sens 60:1\u201314. 
https:\/\/doi.org\/10.1109\/TGRS.2021.3092433","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"786_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TGRS.2022.3140856","volume":"60","author":"W Ma","year":"2022","unstructured":"Ma W, Li N, Zhu H, Jiao L, Tang X, Guo Y, Hou B (2022) Feature split\u2013merge\u2013enhancement network for remote sensing object detection. IEEE Trans Geosci Remote Sens 60:1\u201317. https:\/\/doi.org\/10.1109\/TGRS.2022.3140856","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"786_CR3","doi-asserted-by":"publisher","first-page":"577","DOI":"10.1007\/s40747-020-00206-8","volume":"7","author":"N Chen","year":"2021","unstructured":"Chen N, Li M, Yuan H, Su X, Li Y (2021) Survey of pedestrian detection with occlusion. Complex Intell Syst 7:577\u2013587. https:\/\/doi.org\/10.1007\/s40747-020-00206-8","journal-title":"Complex Intell Syst"},{"key":"786_CR4","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","DOI":"10.1109\/CVPR.2016.91"},{"key":"786_CR5","doi-asserted-by":"crossref","unstructured":"Redmon J, Farhadi A (2017) Yolo9000: Better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","DOI":"10.1109\/CVPR.2017.690"},{"key":"786_CR6","unstructured":"Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv:1804.02767"},{"key":"786_CR7","unstructured":"Bochkovskiy A, Wang C, Liao HM (2020) Yolov4: optimal speed and accuracy of object detection. arXiv:2004.10934"},{"key":"786_CR8","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C.-Y, Berg AC (2016) Ssd: Single shot multibox detector. In: Computer Vision \u2013 ECCV 2016, pp 21\u201337. 
Springer, Cham","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"786_CR9","unstructured":"Fu C, Liu W, Ranga A, Tyagi A, Berg AC (2017) DSSD: Deconvolutional single shot detector. arXiv:1701.06659"},{"issue":"6","key":"786_CR10","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"39","author":"S Ren","year":"2017","unstructured":"Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137\u20131149. https:\/\/doi.org\/10.1109\/TPAMI.2016.2577031","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"786_CR11","unstructured":"Zhou X, Wang D, Kr\u00e4henb\u00fchl P (2019) Objects as points. arXiv:1904.07850"},{"key":"786_CR12","doi-asserted-by":"publisher","unstructured":"Tian Z, Shen C, Chen H, He T (2019) Fcos: Fully convolutional one-stage object detection. In: 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), pp 9626\u20139635. https:\/\/doi.org\/10.1109\/ICCV.2019.00972","DOI":"10.1109\/ICCV.2019.00972"},{"key":"786_CR13","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1007\/978-3-319-10602-1_48","volume-title":"Computer Vision - ECCV 2014","author":"T-Y Lin","year":"2014","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision - ECCV 2014. Springer, Cham, pp 740\u2013755"},{"key":"786_CR14","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","volume":"88","author":"M Everingham","year":"2010","unstructured":"Everingham M, Gool LV, Williams CKI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88:303\u2013338. 
https:\/\/doi.org\/10.1007\/s11263-009-0275-4","journal-title":"Int J Comput Vis"},{"key":"786_CR15","doi-asserted-by":"publisher","unstructured":"Chu X, Zheng A, Zhang X, Sun J (2020) Detection in crowded scenes: one proposal, multiple predictions. In: 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 12211\u201312220. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01223","DOI":"10.1109\/CVPR42600.2020.01223"},{"key":"786_CR16","doi-asserted-by":"crossref","unstructured":"Rukhovich D, Sofiiuk K, Galeev D, Barinova O, Konushin A (2021) Iterdet: iterative scheme for object detection in crowded environments. Structural, Syntactic, and Statistical Pattern Recognition. Springer, Cham, pp 344\u2013354","DOI":"10.1007\/978-3-030-73973-7_33"},{"key":"786_CR17","doi-asserted-by":"publisher","unstructured":"Jun M, Honglin W, Junxia W, Hao X, Chengjie B (2021) An improved one-stage pedestrian detection method based on multi-scale attention feature extraction. J Real-Time Image Process. https:\/\/doi.org\/10.1007\/s11554-021-01074-2","DOI":"10.1007\/s11554-021-01074-2"},{"key":"786_CR18","doi-asserted-by":"crossref","unstructured":"Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Occlusion-aware r-cnn: detecting pedestrians in a crowd. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision - ECCV 2018. Springer, Cham, pp 657\u2013674","DOI":"10.1007\/978-3-030-01219-9_39"},{"key":"786_CR19","doi-asserted-by":"publisher","first-page":"112977","DOI":"10.1016\/j.eswa.2019.112977","volume":"141","author":"X Zeng","year":"2020","unstructured":"Zeng X, Wu Y, Hu S, Wang R, Ye Y (2020) Dspnet: Deep scale purifier network for dense crowd counting. Expert Syst Appl 141:112977. https:\/\/doi.org\/10.1016\/j.eswa.2019.112977","journal-title":"Expert Syst Appl"},{"key":"786_CR20","unstructured":"Zhang K, Xiong F, Sun P, Hu L, Li B, Yu G (2019) Double anchor R-CNN for human detection in a crowd. 
arXiv:1909.09998"},{"key":"786_CR21","unstructured":"G\u00e4hlert N, Hanselmann N, Franke U, Denzler J (2020) Visibility guided NMS: efficient boosting of amodal object detection in crowded traffic scenes. arXiv:2006.08547"},{"key":"786_CR22","doi-asserted-by":"publisher","unstructured":"Liu S, Huang D, Wang Y (2019) Adaptive nms: Refining pedestrian detection in a crowd. In: 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 6452\u20136461. https:\/\/doi.org\/10.1109\/CVPR.2019.00662","DOI":"10.1109\/CVPR.2019.00662"},{"key":"786_CR23","doi-asserted-by":"publisher","unstructured":"Huang X, Ge Z, Jie Z, Yoshie O (2020) Nms by representative region: Towards crowded pedestrian detection by proposal pairing. In: 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10747\u201310756. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01076","DOI":"10.1109\/CVPR42600.2020.01076"},{"key":"786_CR24","doi-asserted-by":"crossref","unstructured":"Wang CY, Bochkovskiy A, Liao HYM (2020) Scaled-yolov4: Scaling cross stage partial network. arXiv:2011.08036","DOI":"10.1109\/CVPR46437.2021.01283"},{"key":"786_CR25","unstructured":"Ge Z, Liu S, Wang F, Li Z, Sun J (2021) YOLOX: exceeding YOLO series in 2021. arXiv:2107.08430"},{"key":"786_CR26","doi-asserted-by":"publisher","unstructured":"Shi W, Caballero J, Husz\u00e1r F, Totz J, Aitken A.P, Bishop R, Rueckert D, Wang Z (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1874\u20131883. https:\/\/doi.org\/10.1109\/CVPR.2016.207","DOI":"10.1109\/CVPR.2016.207"},{"key":"786_CR27","doi-asserted-by":"publisher","unstructured":"Bodla N, Singh B, Chellappa R, Davis L.S (2017) Soft-nms: improving object detection with one line of code. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp 5562\u20135570. 
https:\/\/doi.org\/10.1109\/ICCV.2017.593","DOI":"10.1109\/ICCV.2017.593"},{"key":"786_CR28","unstructured":"Shao S, Zhao Z, Li B, Xiao T, Yu G, Zhang X, Sun J (2018) Crowdhuman: A benchmark for detecting human in a crowd. arXiv:1805.00123"},{"issue":"2","key":"786_CR29","doi-asserted-by":"publisher","first-page":"380","DOI":"10.1109\/TMM.2019.2929005","volume":"22","author":"S Zhang","year":"2020","unstructured":"Zhang S, Xie Y, Wan J, Xia H, Li SZ, Guo G (2020) Widerperson: A diverse dataset for dense pedestrian detection in the wild. IEEE Trans Multimed 22(2):380\u2013393. https:\/\/doi.org\/10.1109\/TMM.2019.2929005","journal-title":"IEEE Trans Multimed"},{"key":"786_CR30","doi-asserted-by":"publisher","unstructured":"Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 580\u2013587. https:\/\/doi.org\/10.1109\/CVPR.2014.81","DOI":"10.1109\/CVPR.2014.81"},{"key":"786_CR31","doi-asserted-by":"publisher","unstructured":"Girshick R (2015) Fast r-cnn. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1440\u20131448. https:\/\/doi.org\/10.1109\/ICCV.2015.169","DOI":"10.1109\/ICCV.2015.169"},{"key":"786_CR32","doi-asserted-by":"publisher","unstructured":"He K, Gkioxari G, Doll\u00e1r P, Girshick R (2017) Mask r-cnn. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980\u20132988. https:\/\/doi.org\/10.1109\/ICCV.2017.322","DOI":"10.1109\/ICCV.2017.322"},{"key":"786_CR33","doi-asserted-by":"publisher","unstructured":"Lin T.-Y, Doll\u00e1r P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936\u2013944. 
https:\/\/doi.org\/10.1109\/CVPR.2017.106","DOI":"10.1109\/CVPR.2017.106"},{"key":"786_CR34","unstructured":"Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 29. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper\/2016\/file\/577ef1154f3240ad5b9b413aa7346a1e-Paper.pdf"},{"issue":"2","key":"786_CR35","doi-asserted-by":"publisher","first-page":"318","DOI":"10.1109\/TPAMI.2018.2858826","volume":"42","author":"T-Y Lin","year":"2020","unstructured":"Lin T-Y, Goyal P, Girshick R, He K, Doll\u00e1r P (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(2):318\u2013327. https:\/\/doi.org\/10.1109\/TPAMI.2018.2858826","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"786_CR36","doi-asserted-by":"publisher","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","volume":"104","author":"JRR Uijlings","year":"2013","unstructured":"Uijlings JRR, van de Sande KEA, Gevers T, Smeulders AWM (2013) Selective search for object recognition. Int J Comput Vision 104:154\u2013171. https:\/\/doi.org\/10.1007\/s11263-013-0620-5","journal-title":"Int J Comput Vision"},{"key":"786_CR37","doi-asserted-by":"publisher","unstructured":"Dai J, He K, Sun J (2016) Instance-aware semantic segmentation via multi-task network cascades. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3150\u20133158. https:\/\/doi.org\/10.1109\/CVPR.2016.343","DOI":"10.1109\/CVPR.2016.343"},{"key":"786_CR38","doi-asserted-by":"publisher","first-page":"816","DOI":"10.1007\/978-3-030-01264-9_48","volume-title":"Computer Vision - ECCV 2018","author":"B Jiang","year":"2018","unstructured":"Jiang B, Luo R, Mao J, Xiao T, Jiang Y (2018) Acquisition of localization confidence for accurate object detection. 
In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision - ECCV 2018. Springer, Cham, pp 816\u2013832"},{"key":"786_CR39","doi-asserted-by":"publisher","first-page":"642","DOI":"10.1007\/s11263-019-01204-1","volume":"128","author":"H Law","year":"2020","unstructured":"Law H, Deng J (2020) Cornernet: Detecting objects as paired keypoints. Int J Comput Vision 128:642\u2013656. https:\/\/doi.org\/10.1007\/s11263-019-01204-1","journal-title":"Int J Comput Vision"},{"key":"786_CR40","doi-asserted-by":"publisher","unstructured":"Zhang S, Yang J, Schiele B (2018) Occluded pedestrian detection through guided attention in cnns. In: 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 6995\u20137003. https:\/\/doi.org\/10.1109\/CVPR.2018.00731","DOI":"10.1109\/CVPR.2018.00731"},{"key":"786_CR41","unstructured":"Misra D (2019) Mish: A self regularized non-monotonic neural activation function. CoRR arXiv:1908.08681"},{"issue":"9","key":"786_CR42","doi-asserted-by":"publisher","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","volume":"37","author":"K He","year":"2015","unstructured":"He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904\u20131916. https:\/\/doi.org\/10.1109\/TPAMI.2015.2389824","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"786_CR43","doi-asserted-by":"publisher","unstructured":"Doll\u00e1r P, Wojek C, Schiele B, Perona P (2012) Pedestrian detection: An evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743\u2013761. https:\/\/doi.org\/10.1109\/TPAMI.2011.155","DOI":"10.1109\/TPAMI.2011.155"},{"key":"786_CR44","unstructured":"Loshchilov I, Hutter F (2016) SGDR: stochastic gradient descent with restarts. 
CoRR arXiv:1608.03983"},{"key":"786_CR45","doi-asserted-by":"publisher","unstructured":"Ge Z, Jie Z, Huang X, Xu R, Yoshie O (2020) Ps-rcnn: Detecting secondary human instances in a crowd via primary object suppression. In: 2020 IEEE International Conference on Multimedia and Expo (ICME), pp. 1\u20136. https:\/\/doi.org\/10.1109\/ICME46284.2020.9102793","DOI":"10.1109\/ICME46284.2020.9102793"},{"key":"786_CR46","unstructured":"Shang M, Xiang D, Wang Z, Zhou E (2021) V2f-net: Explicit decomposition of occluded pedestrian detection. CoRR arXiv:2104.03106"},{"key":"786_CR47","doi-asserted-by":"publisher","unstructured":"Zhou P, Zhou C, Peng P, Du J, Sun X, Guo X, Huang F (2020) Noh-nms: Improving pedestrian detection by nearby objects hallucination. In: Proceedings of the 28th ACM International Conference on Multimedia. MM \u201920, pp. 1967\u20131975. Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/3394171.3413617","DOI":"10.1145\/3394171.3413617"}],"container-title":["Complex &amp; Intelligent 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00786-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00786-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00786-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T18:52:22Z","timestamp":1677091942000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00786-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,29]]},"references-count":47,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["786"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00786-7","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,29]]},"assertion":[{"value":"1 December 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 May 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 June 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all the authors, the corresponding author states that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}