{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T20:33:15Z","timestamp":1754598795511,"version":"3.37.3"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T00:00:00Z","timestamp":1686614400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T00:00:00Z","timestamp":1686614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Pattern Anal Applic"],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Although many state-of-the-art methods of object detection in a single image have achieved great success in the last few years, they still suffer from the false positives in crowd scenes of the real-world applications like automatic checkout. In order to address the limitations of single-view object detection in complex scenes, we propose MVDet, an end-to-end learnable approach that can detect and re-identify multi-class objects in multiple images captured by multiple cameras (multi-view). Our approach is based on the premise that incorrect detection results in a specific view can be eliminated using precise cues from other views, given the availability of multi-view images. Unlike most existing multi-view detection algorithms, which assume that objects belong to a single class on the ground plane, our approach can classify multi-class objects without such assumptions and is thus more practical. To classify multi-class objects, we propose an integrated architecture for region proposal, re-identification, and classification. 
Additionally, we utilize the epipolar geometry constraint to devise a novel re-identification algorithm that does not require a ground plane assumption. Our model demonstrates competitive performance compared to several baselines on the challenging MessyTable dataset.<\/jats:p>","DOI":"10.1007\/s10044-023-01168-6","type":"journal-article","created":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T02:01:21Z","timestamp":1686621681000},"page":"1059-1070","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["MVDet: multi-view multi-class object detection without ground plane assumption"],"prefix":"10.1007","volume":"26","author":[{"given":"Sola","family":"Park","sequence":"first","affiliation":[]},{"given":"Seungjin","family":"Yang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8895-9117","authenticated-orcid":false,"given":"Hyuk-Jae","family":"Lee","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,6,13]]},"reference":[{"key":"1168_CR1","doi-asserted-by":"publisher","first-page":"292","DOI":"10.1016\/j.neucom.2021.07.040","volume":"461","author":"K Hameed","year":"2021","unstructured":"Hameed K, Chai D, Rassau A (2021) Class distribution-aware adaptive margins and cluster embedding for classification of fruit and vegetables at supermarket self-checkouts. Neurocomputing 461:292\u2013309","journal-title":"Neurocomputing"},{"key":"1168_CR2","unstructured":"Rigner A (2019) Ai-based machine vision for retail self-checkout system. 
Master\u2019s Theses in Mathematical Sciences"},{"issue":"1","key":"1168_CR3","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1109\/TITS.2020.3012034","volume":"23","author":"S Mozaffari","year":"2020","unstructured":"Mozaffari S, Al-Jarrah OY, Dianati M, Jennings P, Mouzakitis A (2020) Deep learning-based vehicle behavior prediction for autonomous driving applications: a review. IEEE Trans Intel Transp Syst 23(1):33\u201347","journal-title":"IEEE Trans Intel Transp Syst"},{"issue":"16","key":"1168_CR4","doi-asserted-by":"publisher","first-page":"821","DOI":"10.1080\/01691864.2017.1365009","volume":"31","author":"HA Pierson","year":"2017","unstructured":"Pierson HA, Gashler MS (2017) Deep learning in robotics: a review of recent research. Adv Robot 31(16):821\u2013835","journal-title":"Adv Robot"},{"key":"1168_CR5","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Doll\u00e1r P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp. 2961\u20132969","DOI":"10.1109\/ICCV.2017.322"},{"key":"1168_CR6","unstructured":"Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767"},{"key":"1168_CR7","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Goyal P, Girshick R, He K, Doll\u00e1r P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 2980\u20132988","DOI":"10.1109\/ICCV.2017.324"},{"key":"1168_CR8","doi-asserted-by":"crossref","unstructured":"Noh J, Lee S, Kim B, Kim G (2018) Improving occlusion and hard negative handling for single-stage pedestrian detectors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 966\u2013974","DOI":"10.1109\/CVPR.2018.00107"},{"key":"1168_CR9","doi-asserted-by":"crossref","unstructured":"Wang A, Sun Y, Kortylewski A, Yuille AL (2020) Robust object detection under occlusion with context-aware compositionalnets. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 12645\u201312654","DOI":"10.1109\/CVPR42600.2020.01266"},{"issue":"3","key":"1168_CR10","doi-asserted-by":"publisher","first-page":"736","DOI":"10.1007\/s11263-020-01401-3","volume":"129","author":"A Kortylewski","year":"2021","unstructured":"Kortylewski A, Liu Q, Wang A, Sun Y, Yuille A (2021) Compositional convolutional neural networks: a robust and interpretable model for object recognition under occlusion. Int J Comput Vis 129(3):736\u2013760","journal-title":"Int J Comput Vis"},{"key":"1168_CR11","doi-asserted-by":"crossref","unstructured":"Song S, Xiao J (2014) Sliding shapes for 3d object detection in depth images. In: European conference on computer vision, Springer. pp. 634\u2013651","DOI":"10.1007\/978-3-319-10599-4_41"},{"key":"1168_CR12","doi-asserted-by":"crossref","unstructured":"Wang T, He X, Barnes N (2013) Learning structured Hough voting for joint object detection and occlusion reasoning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1790\u20131797","DOI":"10.1109\/CVPR.2013.234"},{"key":"1168_CR13","doi-asserted-by":"crossref","unstructured":"Qi CR, Liu W, Wu C, Su H, Guibas LJ (2018) Frustum pointnets for 3d object detection from rgb-d data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 918\u2013927","DOI":"10.1109\/CVPR.2018.00102"},{"key":"1168_CR14","doi-asserted-by":"crossref","unstructured":"Ye M, Xu S, Cao T (2020) Hvnet: hybrid voxel network for lidar based 3d object detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 1631\u20131640","DOI":"10.1109\/CVPR42600.2020.00170"},{"key":"1168_CR15","unstructured":"Zhou Y, Sun P, Zhang Y, Anguelov D, Gao J, Ouyang T, Guo J, Ngiam J, Vasudevan V (2020) End-to-end multi-view fusion for 3d object detection in lidar point clouds. In: Conference on Robot Learning, PMLR pp. 
923\u2013932"},{"key":"1168_CR16","doi-asserted-by":"crossref","unstructured":"Roig G, Boix X, Shitrit HB, Fua P (2011) Conditional random fields for multi-camera object detection. In: 2011 International Conference on Computer Vision, IEEE. pp. 563\u2013570","DOI":"10.1109\/ICCV.2011.6126289"},{"key":"1168_CR17","doi-asserted-by":"crossref","unstructured":"Baqu\u00e9 P, Fleuret F, Fua P (2017) Deep occlusion reasoning for multi-camera multi-target detection. In: Proceedings of the IEEE international conference on computer vision, pp. 271\u2013279","DOI":"10.1109\/ICCV.2017.38"},{"key":"1168_CR18","doi-asserted-by":"crossref","unstructured":"Chavdarova T, Fleuret F (2017) Deep multi-camera people detection. In: 2017 16th IEEE international conference on machine learning and applications (ICMLA), IEEE. pp. 848\u2013853","DOI":"10.1109\/ICMLA.2017.00-50"},{"key":"1168_CR19","doi-asserted-by":"crossref","unstructured":"Nassar AS, D\u2019aronco S, Lef\u00e8vre S, Wegner JD (2020) Geograph: Graph-based multi-view object detection with geometric cues end-to-end. In: European conference on computer vision, Springer. pp. 488\u2013504","DOI":"10.1007\/978-3-030-58571-6_29"},{"key":"1168_CR20","doi-asserted-by":"crossref","unstructured":"Cai Z, Zhang J, Ren D, Yu C, Zhao H, Yi S, Yeo CK, Change\u00a0Loy C (2020) Messytable: instance association in multiple camera views. In: European conference on computer vision, Springer. pp. 1\u201316","DOI":"10.1007\/978-3-030-58621-8_1"},{"key":"1168_CR21","doi-asserted-by":"crossref","unstructured":"Hou Y, Zheng L, Gould S (2020) Multiview detection with feature perspective transformation. In: European conference on computer vision, Springer. pp. 1\u201318.","DOI":"10.1007\/978-3-030-58571-6_1"},{"key":"1168_CR22","doi-asserted-by":"crossref","unstructured":"Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 815\u2013823","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"1168_CR23","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497"},{"key":"1168_CR24","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779\u2013788","DOI":"10.1109\/CVPR.2016.91"},{"key":"1168_CR25","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: European Conference on Computer Vision, Springer. pp. 21\u201337","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"1168_CR26","doi-asserted-by":"crossref","unstructured":"Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 10781\u201310790","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"1168_CR27","doi-asserted-by":"crossref","unstructured":"Zhao L, Li X, Zhuang Y, Wang J (2017) Deeply-learned part-aligned representations for person re-identification. In: Proceedings of the IEEE international conference on computer vision, pp. 3219\u20133228","DOI":"10.1109\/ICCV.2017.349"},{"key":"1168_CR28","doi-asserted-by":"crossref","unstructured":"Wang G, Yuan Y, Chen X, Li J, Zhou X (2018) Learning discriminative features with multiple granularities for person re-identification. In: Proceedings of the 26th ACM international conference on multimedia, pp. 
274\u2013282","DOI":"10.1145\/3240508.3240552"},{"key":"1168_CR29","doi-asserted-by":"crossref","unstructured":"Sun Y, Zheng L, Yang Y, Tian Q, Wang S (2018) Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Proceedings of the European conference on computer vision (ECCV), pp. 480\u2013496","DOI":"10.1007\/978-3-030-01225-0_30"},{"key":"1168_CR30","doi-asserted-by":"crossref","unstructured":"Zhao H, Tian M, Sun S, Shao J, Yan J, Yi S, Wang X, Tang X (2017) Spindle net: person re-identification with human body region guided feature decomposition and fusion. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1077\u20131085","DOI":"10.1109\/CVPR.2017.103"},{"key":"1168_CR31","doi-asserted-by":"crossref","unstructured":"Xiang Y, Choi W, Lin Y, Savarese S (2015) Data-driven 3d voxel patterns for object category recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1903\u20131911","DOI":"10.1109\/CVPR.2015.7298800"},{"key":"1168_CR32","unstructured":"Chen X, Kundu K, Zhu Y, Berneshawi AG, Ma H, Fidler S, Urtasun R (2015) 3d object proposals for accurate object class detection. Adv Neural Inf Process Syst. 28"},{"key":"1168_CR33","doi-asserted-by":"crossref","unstructured":"Chen X, Kundu K, Zhang Z, Ma H, Fidler S, Urtasun R (2016) Monocular 3d object detection for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2147\u20132156","DOI":"10.1109\/CVPR.2016.236"},{"issue":"11","key":"1168_CR34","doi-asserted-by":"publisher","first-page":"2608","DOI":"10.1109\/TPAMI.2013.87","volume":"35","author":"MZ Zia","year":"2013","unstructured":"Zia MZ, Stark M, Schiele B, Schindler K (2013) Detailed 3d representations for object recognition and modeling. 
IEEE Trans Pattern Anal Mach Intell 35(11):2608\u20132623","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1168_CR35","unstructured":"Zeeshan Zia M, Stark M, Schindler K (2014) Are cars just 3d boxes?-jointly estimating the 3d shape of multiple objects. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3678\u20133685"},{"key":"1168_CR36","doi-asserted-by":"crossref","unstructured":"Su H, Maji S, Kalogerakis E, Learned-Miller E (2015) Multi-view convolutional neural networks for 3d shape recognition. In: Proceedings of the IEEE international conference on computer vision, pp. 945\u2013953","DOI":"10.1109\/ICCV.2015.114"},{"key":"1168_CR37","doi-asserted-by":"crossref","unstructured":"Nassar AS, Lef\u00e8vre S, Wegner JD (2019) Simultaneous multi-view instance detection with learned geometric soft-constraints. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp. 6559\u20136568","DOI":"10.1109\/ICCV.2019.00666"},{"key":"1168_CR38","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556"},{"key":"1168_CR39","doi-asserted-by":"crossref","unstructured":"Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE pp. 248\u2013255","DOI":"10.1109\/CVPR.2009.5206848"},{"issue":"1","key":"1168_CR40","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929\u20131958","journal-title":"J Mach Learn Res"},{"key":"1168_CR41","unstructured":"Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. 
arXiv preprint arXiv:1412.6980"},{"issue":"2","key":"1168_CR42","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1109\/TPAMI.2008.57","volume":"31","author":"R Kasturi","year":"2008","unstructured":"Kasturi R, Goldgof D, Soundararajan P, Manohar V, Garofolo J, Bowers R, Boonstra M, Korzhova V, Zhang J (2008) Framework for performance evaluation of face, text, and vehicle detection and tracking in video: data, metrics, and protocol. IEEE Trans Pattern Anal Mach Intell 31(2):319\u2013336","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1168_CR43","unstructured":"Han X, Leung T, Jia Y, Sukthankar R, Berg AC (2015) Matchnet: unifying feature and metric learning for patch-based matching. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3279\u20133286"},{"key":"1168_CR44","doi-asserted-by":"crossref","unstructured":"Xu Y, Liu X, Liu Y, Zhu S-C (2016) Multi-view people tracking via hierarchical trajectory composition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4256\u20134265","DOI":"10.1109\/CVPR.2016.461"},{"key":"1168_CR45","doi-asserted-by":"crossref","unstructured":"Xu Y, Liu X, Qin L, Zhu S-C (2017) Cross-view people tracking by scene-centered spatio-temporal parsing. In: Proceedings of the AAAI conference on artificial intelligence, vol. 
31","DOI":"10.1609\/aaai.v31i1.11190"}],"container-title":["Pattern Analysis and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10044-023-01168-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10044-023-01168-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10044-023-01168-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,22]],"date-time":"2023-07-22T14:09:42Z","timestamp":1690034982000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10044-023-01168-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,13]]},"references-count":45,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["1168"],"URL":"https:\/\/doi.org\/10.1007\/s10044-023-01168-6","relation":{},"ISSN":["1433-7541","1433-755X"],"issn-type":[{"type":"print","value":"1433-7541"},{"type":"electronic","value":"1433-755X"}],"subject":[],"published":{"date-parts":[[2023,6,13]]},"assertion":[{"value":"26 September 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 April 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 June 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or nonfinancial interests to 
disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}