{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T22:09:12Z","timestamp":1740175752316,"version":"3.37.3"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,8,1]],"date-time":"2023-08-01T00:00:00Z","timestamp":1690848000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,8,1]],"date-time":"2023-08-01T00:00:00Z","timestamp":1690848000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61961014"],"award-info":[{"award-number":["61961014"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Hainan Provincial Natural Science Foundation of China","award":["620RC556"],"award-info":[{"award-number":["620RC556"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Overhead fisheye images can be used for person detection in intelligent monitoring systems. Unlike in horizontal images, people in overhead fisheye images can appear in any orientation. When an object is rotated, the feature maps from convolutional neural networks vary nonlinearly and lose many orientation features. Transformers can learn the orientation relationships between features. However, a transformer cannot directly extract orientation features, and its effectiveness at detecting small objects needs to be improved. 
In this paper, we propose a novel rotation-equivariant transformer backbone network, which combines group-equivariant convolution with the swin transformer to solve these problems. In our proposed model, the rotation-equivariant feature map extracted by group-equivariant convolution contains a large number of orientation features in multiple directions. Features from different directions are aggregated before window self-attention is computed, which enhances the communication of orientation features. We propose the equivariant-group relation module for evaluating the similarity of the equivariant-group and calculating the aggregation weights. Our multi-level receptive field architecture expands the local receptive field to enhance the detection of small objects. The experiments validate that our model achieves state-of-the-art performance on the fisheye image datasets MW-R, HABBOF, and CEPDOF. Compared with the swin transformer, the accuracy of our model is improved by 0.3<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>%<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>, 0.5<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>%<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>, and 1.3<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>%<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>, and the accuracy of small object detection in the CEPDOF dataset is improved by 0.73<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\%$$<\/jats:tex-math><mml:math 
xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>%<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>.<\/jats:p>","DOI":"10.1007\/s40747-023-01176-3","type":"journal-article","created":{"date-parts":[[2023,8,1]],"date-time":"2023-08-01T01:01:52Z","timestamp":1690851712000},"page":"691-703","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Rotation-equivariant transformer for oriented person detection of overhead fisheye images"],"prefix":"10.1007","volume":"10","author":[{"given":"You","family":"Zhou","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2506-5981","authenticated-orcid":false,"given":"Yong","family":"Bai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongqing","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,8,1]]},"reference":[{"key":"1176_CR1","doi-asserted-by":"crossref","unstructured":"Ekwevugbe T, Brown N, Pakka V, Fan D (2013) Real-time building occupancy sensing using neural-network based sensor network. In: 2013 7th IEEE international conference on digital ecosystems and technologies (DEST). IEEE, pp 114\u2013119","DOI":"10.1109\/DEST.2013.6611339"},{"issue":"12","key":"1176_CR2","doi-asserted-by":"publisher","first-page":"2179","DOI":"10.1109\/TPAMI.2008.260","volume":"31","author":"M Enzweiler","year":"2008","unstructured":"Enzweiler M, Gavrila DM (2008) Monocular pedestrian detection: Survey and experiments. 
IEEE Trans Pattern Anal Mach Intell 31(12):2179\u20132195","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1176_CR3","doi-asserted-by":"crossref","unstructured":"Stewart R, Andriluka M, Ng AY (2016) End-to-end people detection in crowded scenes. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR). Las Vegas, NV, USA, pp 2325\u20132333","DOI":"10.1109\/CVPR.2016.255"},{"key":"1176_CR4","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1016\/j.neucom.2018.01.092","volume":"300","author":"A Brunetti","year":"2018","unstructured":"Brunetti A, Buongiorno D, Trotta GF, Bevilacqua V (2018) Computer vision and deep learning techniques for pedestrian detection and tracking: a survey. Neurocomputing 300:17\u201333","journal-title":"Neurocomputing"},{"key":"1176_CR5","doi-asserted-by":"publisher","DOI":"10.1016\/j.ymssp.2022.110001","volume":"188","author":"Y Shi","year":"2023","unstructured":"Shi Y, Li L, Yang J, Wang Y, Hao S (2023) Center-based transfer feature learning with classifier adaptation for surface defect recognition. Mech Syst Signal Process 188:110001","journal-title":"Mech Syst Signal Process"},{"key":"1176_CR6","doi-asserted-by":"crossref","unstructured":"Li N, Zhou CC (2020) Ampa-net: optimization-inspired attention neural network for deep compressed sensing. In: 2020 IEEE 20th international conference on communication technology (ICCT). IEEE, pp 1338\u20131344","DOI":"10.1109\/ICCT50939.2020.9295956"},{"key":"1176_CR7","doi-asserted-by":"crossref","unstructured":"Girshick R (2015) Fast r-cnn. In: 2015 IEEE international conference on computer vision (ICCV). Santiago, Chile, pp 1440\u20131448","DOI":"10.1109\/ICCV.2015.169"},{"issue":"3","key":"1176_CR8","doi-asserted-by":"publisher","first-page":"1847","DOI":"10.1007\/s40747-021-00322-z","volume":"8","author":"Q Han","year":"2022","unstructured":"Han Q, Yin Q, Zheng X, Chen Z (2022) Remote sensing image building detection method based on mask r-cnn. 
Complex Intell Syst 8(3):1847\u20131855","journal-title":"Complex Intell Syst"},{"key":"1176_CR9","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, pp 21\u201337","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"1176_CR10","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: IEEE conference on computer vision and pattern recognition (CVPR). Las Vegas, NV, USA. pp 779\u2013788","DOI":"10.1109\/CVPR.2016.91"},{"key":"1176_CR11","unstructured":"Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767"},{"key":"1176_CR12","doi-asserted-by":"crossref","unstructured":"Ding J, Xue N, Long Y, Xia G-S, Lu Q (2019) Learning roi transformer for oriented object detection in aerial images. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 2849\u20132858","DOI":"10.1109\/CVPR.2019.00296"},{"key":"1176_CR13","doi-asserted-by":"crossref","unstructured":"Yang X, Yan J (2020) Arbitrary-oriented object detection with circular smooth label. In: European conference on computer vision. Springer, pp 677\u2013694","DOI":"10.1007\/978-3-030-58598-3_40"},{"key":"1176_CR14","doi-asserted-by":"crossref","unstructured":"Xia G-S, Bai X, Ding J, Zhu Z, Belongie S, Luo J, Datcu M, Pelillo M, Zhang L (2018) Dota: a large-scale dataset for object detection in aerial images. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 
pp 3974\u20133983","DOI":"10.1109\/CVPR.2018.00418"},{"issue":"11","key":"1176_CR15","doi-asserted-by":"publisher","first-page":"3111","DOI":"10.1109\/TMM.2018.2818020","volume":"20","author":"J Ma","year":"2018","unstructured":"Ma J, Shao W, Ye H, Wang L, Wang H, Zheng Y, Xue X (2018) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed 20(11):3111\u20133122","journal-title":"IEEE Trans Multimed"},{"key":"1176_CR16","doi-asserted-by":"crossref","unstructured":"Han J, Ding J, Xue N, Xia G-S (2021) Redet: a rotation-equivariant detector for aerial object detection. In: 2021 IEEE\/CVF conference on computer vision and pattern recognition (CVPR). Nashville, TN, USA, pp 2786\u20132795","DOI":"10.1109\/CVPR46437.2021.00281"},{"key":"1176_CR17","doi-asserted-by":"crossref","unstructured":"Li S, Tezcan MO, Ishwar P, Konrad J (2019) Supervised people counting using an overhead fisheye camera. In: 2019 16th IEEE international conference on advanced video and signal based surveillance (AVSS). IEEE, pp 1\u20138","DOI":"10.1109\/AVSS.2019.8909877"},{"key":"1176_CR18","doi-asserted-by":"crossref","unstructured":"Tamura M, Horiguchi S, Murakami T (2019) Omnidirectional pedestrian detection by rotation invariant training. In: 2019 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1989\u20131998","DOI":"10.1109\/WACV.2019.00216"},{"key":"1176_CR19","doi-asserted-by":"crossref","unstructured":"Duan Z, Tezcan O, Nakamura H, Ishwar P, Konrad J (2020) Rapid: rotation-aware people detection in overhead fisheye images. In: 2020 IEEE\/CVF conference on computer vision and pattern recognition (CVPR). Seattle, WA, USA, pp 636\u2013637","DOI":"10.1109\/CVPRW50498.2020.00326"},{"key":"1176_CR20","unstructured":"Cohen T, Welling M (2016) Group equivariant convolutional networks. In: International conference on machine learning. 
PMLR, pp 2990\u20132999"},{"key":"1176_CR21","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16 x 16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929"},{"key":"1176_CR22","doi-asserted-by":"crossref","unstructured":"Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE\/CVF international conference on computer vision. pp 10012\u201310022","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"1176_CR23","doi-asserted-by":"crossref","unstructured":"Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European conference on computer vision. Springer, pp 213\u2013229","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"1176_CR24","doi-asserted-by":"publisher","DOI":"10.1016\/j.compstruc.2022.106918","volume":"275","author":"MM Rosso","year":"2023","unstructured":"Rosso MM, Marasco G, Aiello S, Aloisio A, Chiaia B, Marano GC (2023) Convolutional networks and transformers for intelligent road tunnel investigations. Comput Struct 275:106918","journal-title":"Comput Struct"},{"issue":"4","key":"1176_CR25","doi-asserted-by":"publisher","DOI":"10.1088\/1361-6501\/acb075","volume":"34","author":"L Shen","year":"2023","unstructured":"Shen L, Tao H, Ni Y, Wang Y, Vladimir S (2023) Improved yolov3 model with feature map cropping for multi-scale road object detection. Meas Sci Technol 34(4):045406","journal-title":"Meas Sci Technol"},{"key":"1176_CR26","doi-asserted-by":"crossref","unstructured":"Li W, Chen Y, Hu K, Zhu J (2022) Oriented reppoints for aerial object detection. In: 2022 IEEE\/CVF conference on computer vision and pattern recognition (CVPR). 
New Orleans, LA, USA, pp 1829\u20131838","DOI":"10.1109\/CVPR52688.2022.00187"},{"key":"1176_CR27","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Doll\u00e1r P, Girshick R (2017) Mask r-cnn. In: 2017 IEEE international conference on computer vision (ICCV). Venice, Italy, pp 2961\u20132969","DOI":"10.1109\/ICCV.2017.322"},{"key":"1176_CR28","doi-asserted-by":"crossref","unstructured":"Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: keypoint triplets for object detection. In: 2019 IEEE\/CVF international conference on computer vision (ICCV). Seoul, Korea (South), pp 6569\u20136578","DOI":"10.1109\/ICCV.2019.00667"},{"key":"1176_CR29","unstructured":"Beal J, Kim E, Tzeng E, Park DH, Zhai A, Kislyuk D (2020) Toward transformer-based object detection. arXiv preprint arXiv:2012.09958"},{"key":"1176_CR30","doi-asserted-by":"crossref","unstructured":"Zheng S, Lu J, Zhao H, Zhu X, Luo Z, Wang Y, Fu Y, Feng J, Xiang T, Torr PH et al (2021) Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: 2021 IEEE\/CVF conference on computer vision and pattern recognition (CVPR). Nashville, TN, USA, pp 6881\u20136890","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"1176_CR31","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28"},{"key":"1176_CR32","unstructured":"Hoogeboom E, Peters JW, Cohen TS, Welling M (2018) Hexaconv. arXiv preprint arXiv:1803.02108"},{"key":"1176_CR33","doi-asserted-by":"crossref","unstructured":"Marcos D, Volpi M, Komodakis N, Tuia D (2017) Rotation equivariant vector field networks. In: 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy, pp 5048\u20135057","DOI":"10.1109\/ICCV.2017.540"},{"key":"1176_CR34","doi-asserted-by":"crossref","unstructured":"Zhou Y, Ye Q, Qiu Q, Jiao J (2017) Oriented response networks. 
In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). Honolulu, HI, USA, pp 519\u2013528","DOI":"10.1109\/CVPR.2017.527"},{"key":"1176_CR35","doi-asserted-by":"crossref","unstructured":"Azimi SM, Vig E, Bahmanyar R, K\u00f6rner M, Reinartz P (2018) Towards multi-class object detection in unconstrained remote sensing imagery. In: Asian conference on computer vision. Springer, pp 150\u2013165","DOI":"10.1007\/978-3-030-20893-6_10"},{"issue":"11","key":"1176_CR36","doi-asserted-by":"publisher","first-page":"1745","DOI":"10.1109\/LGRS.2018.2856921","volume":"15","author":"Z Zhang","year":"2018","unstructured":"Zhang Z, Guo W, Zhu S, Yu W (2018) Toward arbitrary-oriented ship detection with rotated region proposal and discrimination networks. IEEE Geosci Remote Sens Lett 15(11):1745\u20131749","journal-title":"IEEE Geosci Remote Sens Lett"},{"key":"1176_CR37","doi-asserted-by":"crossref","unstructured":"Jiang Y, Zhu X, Wang X, Yang S, Li W, Wang H, Fu P, Luo Z (2017) R2cnn: rotational region cnn for orientation robust scene text detection. arXiv preprint arXiv:1706.09579","DOI":"10.1109\/ICPR.2018.8545598"},{"key":"1176_CR38","doi-asserted-by":"publisher","unstructured":"Minh QN, Van BL, Nguyen C, Le A, Nguyen VD (2021) Arpd: anchor-free rotation-aware people detection using topview fisheye camera. In: 2021 17th IEEE international conference on advanced video and signal based surveillance (AVSS). Washington, DC, USA, 1\u20138. https:\/\/doi.org\/10.1109\/AVSS52988.2021.9663768","DOI":"10.1109\/AVSS52988.2021.9663768"},{"key":"1176_CR39","doi-asserted-by":"publisher","unstructured":"Lin T-Y, Doll\u00e1r P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). Honolulu, HI, USA, p 936\u2013944. https:\/\/doi.org\/10.1109\/CVPR.2017.106","DOI":"10.1109\/CVPR.2017.106"},{"key":"1176_CR40","unstructured":"Mirror worlds challenge. 
http:\/\/www2.icat.vt.edu\/mirrorworlds\/challenge\/index.html. Accessed 11 Sept 2022"},{"key":"1176_CR41","unstructured":"Human-aligned bounding boxes from overhead fisheye cameras dataset. https:\/\/vip.bu.edu\/projects\/vsns\/cossy\/datasets\/habbof\/. Accessed 11 Sept 2022"},{"key":"1176_CR42","unstructured":"Challenging events for person detection from overhead fisheye images. https:\/\/vip.bu.edu\/projects\/vsns\/cossy\/datasets\/cepdof\/. Accessed 11 Sept 2022"},{"key":"1176_CR43","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, pp 740\u2013755","DOI":"10.1007\/978-3-319-10602-1_48"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01176-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01176-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01176-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,10]],"date-time":"2024-02-10T22:23:01Z","timestamp":1707603781000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01176-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,1]]},"references-count":43,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,2]]}},"alternative-id":["1176"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01176-3","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","v
alue":"2198-6053"}],"subject":[],"published":{"date-parts":[[2023,8,1]]},"assertion":[{"value":"22 December 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 June 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 August 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Corresponding authors declare on behalf of all authors that there is no conflict of interest. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}