{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T00:51:48Z","timestamp":1775609508819,"version":"3.50.1"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2022,4,9]],"date-time":"2022-04-09T00:00:00Z","timestamp":1649462400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,4,9]],"date-time":"2022-04-09T00:00:00Z","timestamp":1649462400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004593","name":"Universidad Aut\u00f3noma de Madrid","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004593","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2022,5]]},"abstract":"<jats:sec>\n                <jats:title>Abstract<\/jats:title>\n                <jats:p>In the current worldwide situation, pedestrian detection has reemerged as a pivotal tool for intelligent video-based systems aiming to solve tasks such as pedestrian tracking, social distancing monitoring or pedestrian mass counting. Pedestrian detection methods, even the top performing ones, are highly sensitive to occlusions among pedestrians, which dramatically degrades their performance in crowded scenarios. The generalization of multi-camera setups permits to better confront occlusions by combining information from different viewpoints. In this paper, we present a multi-camera approach to globally combine pedestrian detections leveraging automatically extracted scene context. Contrarily to the majority of the methods of the state-of-the-art, the proposed approach is scene-agnostic, not requiring a tailored adaptation to the target scenario\u2013e.g., via fine-tuning. This noteworthy attribute does not require <jats:italic>ad hoc<\/jats:italic> training with labeled data, expediting the deployment of the proposed method in real-world situations. Context information, obtained via semantic segmentation, is used (1) to automatically generate a common area of interest for the scene and all the cameras, avoiding the usual need of manually defining it, and (2) to obtain detections for each camera by solving a global optimization problem that maximizes coherence of detections both in each 2D image and in the 3D scene. This process yields tightly fitted bounding boxes that circumvent occlusions or miss detections. The experimental results on five publicly available datasets show that the proposed approach outperforms state-of-the-art multi-camera pedestrian detectors, even some specifically trained on the target scenario, signifying the versatility and robustness of the proposed method without requiring ad hoc annotations nor human-guided configuration.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1007\/s10115-022-01673-w","type":"journal-article","created":{"date-parts":[[2022,4,9]],"date-time":"2022-04-09T03:28:56Z","timestamp":1649474936000},"page":"1211-1237","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Semantic-driven multi-camera pedestrian detection"],"prefix":"10.1007","volume":"64","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9113-085X","authenticated-orcid":false,"given":"Alejandro","family":"L\u00f3pez-Cifuentes","sequence":"first","affiliation":[]},{"given":"Marcos","family":"Escudero-Vi\u00f1olo","sequence":"additional","affiliation":[]},{"given":"Jes\u00fas","family":"Besc\u00f3s","sequence":"additional","affiliation":[]},{"given":"Pablo","family":"Carballeira","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,4,9]]},"reference":[{"issue":"1","key":"1673_CR1","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1007\/s10851-010-0258-7","volume":"41","author":"A Alahi","year":"2011","unstructured":"Alahi A, Jacques L, Boursier Y, Vandergheynst P (2011) Sparsity driven people localization with a heterogeneous network of cameras. J Math Imaging Vision 41(1):39\u201358","journal-title":"J Math Imaging Vision"},{"key":"1673_CR2","first-page":"8264","volume":"4","author":"H Aliakbarpour","year":"2016","unstructured":"Aliakbarpour H, Prasath VBS, Palaniappan K, Seetharaman G, Dias J (2016) Heterogeneous multi-view information fusion: review of 3D reconstruction methods and a new registration with uncertainty modeling. Science 4:8264\u20138285","journal-title":"Science"},{"key":"1673_CR3","doi-asserted-by":"crossref","unstructured":"Baqu\u00e9 P, Fleuret F, Fua P (2017) Deep occlusion reasoning for multi-camera multi-target detection. In: IEEE international conference on computer vision (ICCV), pp 271\u2013279","DOI":"10.1109\/ICCV.2017.38"},{"key":"1673_CR4","unstructured":"Chavdarova T, Baqu\u00e9 P, Bouquet S, Maksai A, Jose C, Bagautdinov T, Lettry L, Fua P, Van\u00a0Gool L, Fleuret F (2018) Wildtrack dataset. https:\/\/cvlab.epfl.ch\/data\/wildtrack"},{"key":"1673_CR5","doi-asserted-by":"crossref","unstructured":"Chavdarova T, Baqu\u00e9 P, Bouquet S, Maksai A, Jose C, Bagautdinov T, Lettry L, Fua P, Van\u00a0Gool L, Fleuret F (2018) WILDTRACK: a multi-camera HD dataset for dense unscripted pedestrian detection","DOI":"10.1109\/CVPR.2018.00528"},{"key":"1673_CR6","unstructured":"Chavdarova T, Fleuret F (2018) Epfl rlc dataset. https:\/\/cvlab.epfl.ch\/data\/rlc"},{"key":"1673_CR7","doi-asserted-by":"crossref","unstructured":"Chavdarova T, et\u00a0al (2017) Deep multi-camera people detection. In: 2017 16th IEEE international conference on machine learning and applications (ICMLA), pp 848\u2013853. IEEE","DOI":"10.1109\/ICMLA.2017.00-50"},{"key":"1673_CR8","doi-asserted-by":"crossref","unstructured":"Delannay D, Danhier N, De\u00a0Vleeschouwer C (2009) Detection and recognition of sports (wo)men from multiple views. In: ACM\/IEEE international conference on distributed smart cameras (ICDSC), pp 1\u20137. IEEE","DOI":"10.1109\/ICDSC.2009.5289407"},{"key":"1673_CR9","doi-asserted-by":"crossref","unstructured":"Doll\u00e1r P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: a benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 304\u2013311. IEEE","DOI":"10.1109\/CVPR.2009.5206631"},{"issue":"4","key":"1673_CR10","doi-asserted-by":"publisher","first-page":"743","DOI":"10.1109\/TPAMI.2011.155","volume":"34","author":"P Dollar","year":"2012","unstructured":"Dollar P, Wojek C, Schiele B, Perona P (2012) Pedestrian detection: an evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743\u2013761","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"2","key":"1673_CR11","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","volume":"88","author":"M Everingham","year":"2010","unstructured":"Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88(2):303\u2013338","journal-title":"Int J Comput Vision"},{"key":"1673_CR12","unstructured":"Ferryman J, L.\u00a0Crowley J, Shahrokni A (2018) Pets 2009 dataset. http:\/\/www.cvg.reading.ac.uk\/PETS2009\/a.html"},{"key":"1673_CR13","unstructured":"Fleuret F, Berclaz J, Lengagne R (2018) Epfl terrace dataset. https:\/\/cvlab.epfl.ch\/data\/pom"},{"issue":"2","key":"1673_CR14","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1109\/TPAMI.2007.1174","volume":"30","author":"F Fleuret","year":"2008","unstructured":"Fleuret F, Berclaz J, Lengagne R, Fua P (2008) Multicamera people tracking with a probabilistic occupancy map. IEEE Trans Pattern Anal Mach Intell 30(2):267\u2013282","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1673_CR15","doi-asserted-by":"crossref","unstructured":"Franco J.S, Boyer E (2005) Fusion of multiview silhouette cues using a space occupancy grid. In: IEEE international conference on computer vision (ICCV), vol\u00a02, pp 1747\u20131753. IEEE","DOI":"10.1109\/ICCV.2005.105"},{"issue":"5","key":"1673_CR16","doi-asserted-by":"publisher","first-page":"779","DOI":"10.1049\/iet-cvi.2014.0148","volume":"9","author":"\u00c1 Garc\u00eda-Mart\u00edn","year":"2015","unstructured":"Garc\u00eda-Mart\u00edn \u00c1, Mart\u00ednez JM (2015) People detection in surveillance: classification and evaluation. IET Comput Vision 9(5):779\u2013788","journal-title":"IET Comput Vision"},{"key":"1673_CR17","volume-title":"Multiple view geometry in computer vision","author":"R Hartley","year":"2003","unstructured":"Hartley R, Zisserman A (2003) Multiple view geometry in computer vision. Cambridge University Press, Cambridge"},{"key":"1673_CR18","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun, J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"1673_CR19","doi-asserted-by":"crossref","unstructured":"Huang G, Liu Z, Van Der\u00a0Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700\u20134708","DOI":"10.1109\/CVPR.2017.243"},{"issue":"3","key":"1673_CR20","doi-asserted-by":"publisher","first-page":"505","DOI":"10.1109\/TPAMI.2008.102","volume":"31","author":"SM Khan","year":"2009","unstructured":"Khan SM, Shah M (2009) Tracking multiple occluding people by localizing on multiple scene planes. IEEE Trans Pattern Anal Mach Intell 31(3):505\u2013519","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1673_CR21","doi-asserted-by":"crossref","unstructured":"Kim K, Davis L.S (2006) Multi-camera tracking and segmentation of occluded people on ground plane using search-guided particle filtering. In: European conference on computer vision (ECCV). Springer, pp 98\u2013109","DOI":"10.1007\/11744078_8"},{"key":"1673_CR22","doi-asserted-by":"crossref","unstructured":"Lima J.P, Roberto R, Figueiredo L, Simoes F, Teichrieb V (2021) Generalizable multi-camera 3d pedestrian detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1232\u20131240","DOI":"10.1109\/CVPRW53098.2021.00135"},{"key":"1673_CR23","doi-asserted-by":"crossref","unstructured":"Lin T.Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick C.L (2014) Microsoft coco: Common objects in context. In: European conference on computer vision. Springer, pp 740\u2013755","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"1673_CR24","doi-asserted-by":"crossref","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431\u20133440","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"1673_CR25","unstructured":"Long X, Deng K, Wang G, Zhang Y, Dang Q, Gao Y, Shen H, Ren J, Han S, Ding E, et\u00a0al (2020) Pp-yolo: An effective and efficient implementation of object detector. arXiv preprint arXiv:2007.12099"},{"issue":"10","key":"1673_CR26","doi-asserted-by":"publisher","first-page":"1495","DOI":"10.1109\/LSP.2018.2865833","volume":"25","author":"A Lopez-Cifuentes","year":"2018","unstructured":"Lopez-Cifuentes A, Escudero M, Bescos J (2018) Automatic semantic parsing of the ground-plane in scenarios recorded with multiple moving cameras. IEEE Signal Process Lett 25(10):1495\u20131499","journal-title":"IEEE Signal Process Lett"},{"key":"1673_CR27","unstructured":"L\u00f3pez-Cifuentes A, Escudero-Vi\u00f1olo M, Besc\u00f3s J, Carballeira P (2018) Semantic driven multi-camera pedestrian detection. arXiv preprint arXiv:181210779v1"},{"key":"1673_CR28","first-page":"1","volume":"5","author":"C Ning","year":"2020","unstructured":"Ning C, Menglu L, Hao Y, Xueping S, Yunhong L (2020) Survey of pedestrian detection with occlusion. Complex Intell Syst 5:1\u201311","journal-title":"Complex Intell Syst"},{"issue":"5","key":"1673_CR29","doi-asserted-by":"publisher","first-page":"1760","DOI":"10.1016\/j.patcog.2014.12.004","volume":"48","author":"P Peng","year":"2015","unstructured":"Peng P, Tian Y, Wang Y, Li J, Huang T (2015) Robust multiple cameras pedestrian detection with multi-view bayesian network. Pattern Recogn 48(5):1760\u20131772","journal-title":"Pattern Recogn"},{"key":"1673_CR30","doi-asserted-by":"crossref","unstructured":"Priscilla C.V, Sheila S.A (2019) Pedestrian detection-a survey. In: International conference on information, communication and computing technology. Springer, pp 349\u2013358","DOI":"10.1007\/978-3-030-38501-9_35"},{"key":"1673_CR31","unstructured":"Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767"},{"key":"1673_CR32","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (NIPS)"},{"key":"1673_CR33","doi-asserted-by":"crossref","unstructured":"Stiefelhagen R, Bernardin K, Bowers R, Garofolo J, Mostefa D, Soundararajan P (2006) The clear 2006 evaluation. In: International evaluation workshop on classification of events, activities and relationships. Springer, pp 1\u201344","DOI":"10.1007\/978-3-540-69568-4_1"},{"key":"1673_CR34","doi-asserted-by":"crossref","unstructured":"Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1\u20139","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"1673_CR35","doi-asserted-by":"crossref","unstructured":"Tan M, Pang R, Le Q.V (2020) Efficientdet: Scalable and efficient object detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 10781\u201310790","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"1673_CR36","unstructured":"Tao A, Sapra K, Catanzaro B(2020) Hierarchical multi-scale attention for semantic segmentation. arXiv preprint arXiv:2005.10821"},{"issue":"1","key":"1673_CR37","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1109\/TCSVT.2012.2203201","volume":"23","author":"\u00c1 Utasi","year":"2013","unstructured":"Utasi \u00c1, Benedek C (2013) A bayesian approach on people localization in multicamera systems. IEEE Trans Circuits Syst Video Technol 23(1):105\u2013115","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"key":"1673_CR38","doi-asserted-by":"crossref","unstructured":"Wang C.Y, Bochkovskiy A, Liao H.Y.M (2020) Scaled-yolov4: Scaling cross stage partial network. arXiv preprint arXiv:2011.08036","DOI":"10.1109\/CVPR46437.2021.01283"},{"key":"1673_CR39","doi-asserted-by":"crossref","unstructured":"Xu Y, Liu X, Liu Y, Zhu S.C (2016) Multi-view people tracking via hierarchical trajectory composition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4256\u20134265","DOI":"10.1109\/CVPR.2016.461"},{"key":"1673_CR40","unstructured":"Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. In: ICLR"},{"key":"1673_CR41","unstructured":"Zhang H, Wu C, Zhang Z, Zhu Y, Zhang Z, Lin H, Sun Y, He T, Mueller J, Manmatha R, et\u00a0al (2020) Resnest: Split-attention networks. arXiv preprint arXiv:2004.08955"},{"key":"1673_CR42","doi-asserted-by":"crossref","unstructured":"Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2881\u20132890","DOI":"10.1109\/CVPR.2017.660"},{"key":"1673_CR43","doi-asserted-by":"crossref","unstructured":"Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A (2017) Scene parsing through ade20k dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 633\u2013641","DOI":"10.1109\/CVPR.2017.544"},{"key":"1673_CR44","unstructured":"Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2020) Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159"}],"container-title":["Knowledge and Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-022-01673-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10115-022-01673-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-022-01673-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,13]],"date-time":"2022-05-13T04:11:35Z","timestamp":1652415095000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10115-022-01673-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,9]]},"references-count":44,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,5]]}},"alternative-id":["1673"],"URL":"https:\/\/doi.org\/10.1007\/s10115-022-01673-w","relation":{},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"value":"0219-1377","type":"print"},{"value":"0219-3116","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,9]]},"assertion":[{"value":"12 July 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 February 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 March 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 April 2022","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}