{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:29:18Z","timestamp":1740122958619,"version":"3.37.3"},"reference-count":70,"publisher":"Springer Science and Business Media LLC","issue":"39","license":[{"start":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T00:00:00Z","timestamp":1718841600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T00:00:00Z","timestamp":1718841600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001691","name":"KAKENHI","doi-asserted-by":"crossref","award":["21K11816"],"award-info":[{"award-number":["21K11816"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Multimed Tools Appl"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Mirrors and glass are ubiquitous materials in 3D indoor living environments. However, existing vision systems tend to overlook or misidentify them because of their reflectivity or transparency, which can have severe consequences, <jats:italic>e.g.<\/jats:italic>, a robot or drone may crash into a glass wall or be wrongly positioned by reflections in mirrors, or high-frequency wireless signals may be affected by these highly reflective materials. Segmenting mirrors and glass in static images has attracted notable research interest in recent years. However, accurately segmenting mirrors and glass in dynamic scenes remains a formidable challenge, primarily due to the lack of a high-quality dataset and effective methods. 
To accurately segment the mirror and glass regions in videos, this paper proposes key points trajectory and multi-level depth distinction to refine the mirror and glass segmentation produced by any existing segmentation model. First, key points trajectory is used to extract the distinctive motion feature of reflections in mirror and glass regions, and the distinction in trajectory is used to remove wrong segmentation. Second, a multi-level depth map is generated for region and edge segmentation, which contributes to the accuracy improvement. Further, an original dataset for video mirror and glass segmentation (MAGD) is constructed, which contains 9,960 images from 36 videos with corresponding manually annotated masks. Extensive experiments demonstrate that the proposed method consistently reduces the segmentation errors generated by various state-of-the-art models and reaches the highest success rate of 0.969, mIoU (mean Intersection over Union) of 0.852, and mPA (mean Pixel Accuracy) of 0.950, which is around 40% - 50% higher on average on an original video mirror and glass dataset.<\/jats:p>","DOI":"10.1007\/s11042-024-19627-5","type":"journal-article","created":{"date-parts":[[2024,6,20]],"date-time":"2024-06-20T08:02:11Z","timestamp":1718870531000},"page":"86513-86535","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Key points trajectory and multi-level depth distinction based refinement for video mirror and glass 
segmentation"],"prefix":"10.1007","volume":"83","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-1783-4816","authenticated-orcid":false,"given":"Ziyue","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanchao","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xina","family":"Cheng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Takeshi","family":"Ikenaga","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,6,20]]},"reference":[{"key":"19627_CR1","doi-asserted-by":"crossref","unstructured":"Gandhi D, Pinto L, Gupta A (2017) Learning to fly by crashing. In: 2017 IEEE\/RSJ International conference on intelligent robots and systems (IROS). IEEE, pp 3948\u20133955","DOI":"10.1109\/IROS.2017.8206247"},{"key":"19627_CR2","doi-asserted-by":"crossref","unstructured":"Dao T-K, Tran T-H, Le T-L, Vu H, Nguyen V-T, Mac D-K, Do N-D, Pham T-T (2016) Indoor navigation assistance system for visually impaired people using multimodal technologies. 2016 14th International Conference on Control, Automation, Robotics and Vision (ICARCV) 1\u20136","DOI":"10.1109\/ICARCV.2016.7838771"},{"key":"19627_CR3","doi-asserted-by":"crossref","unstructured":"Dong E, Xu J, Wu C, Liu Y, Yang Z (2019) Pair-navi: peer-to-peer indoor navigation with mobile visual slam. In: IEEE INFOCOM 2019-IEEE conference on computer communications. IEEE, pp 1189\u20131197","DOI":"10.1109\/INFOCOM.2019.8737640"},{"issue":"15","key":"19627_CR4","doi-asserted-by":"publisher","first-page":"3824","DOI":"10.3390\/rs14153824","volume":"14","author":"S Badrloo","year":"2022","unstructured":"Badrloo S, Varshosaz M, Pirasteh S, Li J (2022) Image-based obstacle detection methods for the safe navigation of unmanned vehicles: a review. 
Remote Sens 14(15):3824","journal-title":"Remote Sens"},{"issue":"8","key":"19627_CR5","doi-asserted-by":"publisher","first-page":"1785","DOI":"10.1109\/TPAMI.2017.2723883","volume":"40","author":"Y Zhang","year":"2017","unstructured":"Zhang Y, Ye M, Manocha D, Yang R (2017) 3d reconstruction in the presence of glass and mirrors by acoustic and visual fusion. IEEE Trans Pattern Anal Mach Intell 40(8):1785\u20131798","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"4","key":"19627_CR6","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1145\/3197517.3201319","volume":"37","author":"T Whelan","year":"2018","unstructured":"Whelan T, Goesele M, Lovegrove SJ, Straub J, Green S, Szeliski R, Butterfield S, Verma S, Newcombe RA, Goesele M et al (2018) Reconstructing scenes with mirror and glass surfaces. ACM Trans Graph 37(4):102\u20131","journal-title":"ACM Trans Graph"},{"issue":"8","key":"19627_CR7","doi-asserted-by":"publisher","first-page":"22995","DOI":"10.1007\/s11042-023-16369-8","volume":"83","author":"Y Liu","year":"2024","unstructured":"Liu Y, Cheng X, Ikenaga T (2024) Motion-aware and data-independent model based multi-view 3d pose refinement for volleyball spike analysis. Multimedia Tools Appl 83(8):22995\u201323018","journal-title":"Multimedia Tools Appl"},{"key":"19627_CR8","doi-asserted-by":"crossref","unstructured":"Liu H, Iwamoto N, Zhu Z, Li Z, Zhou Y, Bozkurt E, Zheng B (2022) Disco: Disentangled implicit content and rhythm learning for diverse co-speech gestures synthesis. In: Proceedings of the 30th ACM international conference on multimedia. pp 3764\u20133773","DOI":"10.1145\/3503161.3548400"},{"key":"19627_CR9","doi-asserted-by":"crossref","unstructured":"Fang Q, Shuai Q, Dong J, Bao H, Zhou X (2021) Reconstructing 3d human pose by watching humans in the mirror. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 
pp 12814\u201312823","DOI":"10.1109\/CVPR46437.2021.01262"},{"issue":"9","key":"19627_CR10","doi-asserted-by":"publisher","first-page":"9116","DOI":"10.1109\/JIOT.2020.3004008","volume":"7","author":"Y Zhang","year":"2020","unstructured":"Zhang Y, Chen C, Yang S, Zhang J, Chu X, Zhang J (2020) How friendly are building materials as reflectors to indoor los mimo communications? IEEE Internet Things J 7(9):9116\u20139127","journal-title":"IEEE Internet Things J"},{"key":"19627_CR11","doi-asserted-by":"crossref","unstructured":"Yang X, Mei H, Xu K, Wei X, Yin B, Lau RWH (2019) Where is my mirror? In: Proc IEEE Int Conf Comput Vis (ICCV). pp 8808\u20138817","DOI":"10.1109\/ICCV.2019.00890"},{"key":"19627_CR12","doi-asserted-by":"crossref","unstructured":"Mei H, Yang X, Wang Y, Liu Y-A, He S, Zhang Q, Wei X, Lau RWH (2020) Don\u2019t hit me! glass detection in real-world scenes. In: Proc IEEE Conf Comput Vis Pattern Recognit. pp 3684\u20133693","DOI":"10.1109\/CVPR42600.2020.00374"},{"issue":"2","key":"19627_CR13","doi-asserted-by":"publisher","first-page":"3067","DOI":"10.32604\/cmc.2024.049512","volume":"79","author":"Arnab Dey D-NL Samit Biswas","year":"2024","unstructured":"Arnab Dey D-NL Samit Biswas (2024) Workout action recognition in video streams using an attention driven residual dc-gru network. Comput, Mater Continua 79(2):3067\u20133087","journal-title":"Comput, Mater Continua"},{"issue":"9","key":"19627_CR14","doi-asserted-by":"publisher","first-page":"25643","DOI":"10.1007\/s11042-023-16041-1","volume":"83","author":"J Wang","year":"2024","unstructured":"Wang J, Wang Z, Zhuang S, Hao Y, Wang H (2024) Cross-enhancement transformer for action segmentation. Multimedia Tools Appl 83(9):25643\u201325656","journal-title":"Multimedia Tools Appl"},{"key":"19627_CR15","doi-asserted-by":"crossref","unstructured":"Li Z, Huang M, Yang Y, Li Z, Wang L (2022) A mirror detection method in the indoor environment using a laser sensor. 
Math Probl Eng 2022","DOI":"10.1155\/2022\/9621694"},{"key":"19627_CR16","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1016\/j.robot.2016.11.003","volume":"88","author":"X Wang","year":"2017","unstructured":"Wang X, Wang J (2017) Detecting glass in simultaneous localisation and mapping. Rob Auton Syst 88:97\u2013103","journal-title":"Rob Auton Syst"},{"key":"19627_CR17","doi-asserted-by":"crossref","unstructured":"Wu S, Wang S (2021) Method for detecting glass wall with lidar and ultrasonic sensor. In: Proc. IEEE 3rd Eurasia Conf. IOT, Commun. Eng. (ECICE). pp 163\u2013168","DOI":"10.1109\/ECICE52819.2021.9645614"},{"key":"19627_CR18","doi-asserted-by":"crossref","unstructured":"Mei H, Dong B, Dong W, Yang J, Baek S-H, Heide F, Peers P, Wei X, Yang X (2022) Glass segmentation using intensity and spectral polarization cues. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 12622\u201312631","DOI":"10.1109\/CVPR52688.2022.01229"},{"key":"19627_CR19","doi-asserted-by":"publisher","first-page":"1911","DOI":"10.1109\/TIP.2023.3256762","volume":"32","author":"D Huo","year":"2023","unstructured":"Huo D, Wang J, Qian Y, Yang Y-H (2023) Glass segmentation with rgb-thermal image pairs. IEEE Trans Image Process 32:1911\u20131926","journal-title":"IEEE Trans Image Process"},{"key":"19627_CR20","doi-asserted-by":"crossref","unstructured":"Xu Y, Nagahara H, Shimada A, Taniguchi R-i (2015) Transcut: transparent object segmentation from a light-field image. In: Proceedings of the IEEE international conference on computer vision. pp 3442\u20133450","DOI":"10.1109\/ICCV.2015.393"},{"key":"19627_CR21","doi-asserted-by":"crossref","unstructured":"Zhu Y, Qiu J, Ren B (2021) Transfusion: a novel slam method focused on transparent objects. In: Proceedings of the IEEE\/CVF international conference on computer vision. 
pp 6019\u20136028","DOI":"10.1109\/ICCV48922.2021.00596"},{"key":"19627_CR22","doi-asserted-by":"crossref","unstructured":"Mei H, Dong B, Dong W, Peers P, Yang X, Zhang Q, Wei X (2021) Depth-aware mirror segmentation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 3044\u20133053","DOI":"10.1109\/CVPR46437.2021.00306"},{"key":"19627_CR23","doi-asserted-by":"crossref","unstructured":"Tondin Ferreira\u00a0Dias E, Vieira\u00a0Neto H, Schneider FK (2020) A compressed sensing approach for multiple obstacle localisation using sonar sensors in air. Sens 20(19):5511","DOI":"10.3390\/s20195511"},{"issue":"3","key":"19627_CR24","first-page":"3492","volume":"45","author":"X Tan","year":"2022","unstructured":"Tan X, Lin J, Xu K, Chen P, Ma L, Lau RW (2022) Mirror detection with the visual chirality cue. IEEE Trans Pattern Anal Mach Intell 45(3):3492\u20133504","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"19627_CR25","doi-asserted-by":"crossref","unstructured":"He H, Li X, Cheng G, Shi J, Tong Y, Meng G, Prinet V, Weng L (2021) Enhanced boundary learning for glass-like object segmentation. In: Proceedings of the IEEE\/CVF international conference on computer vision. pp 15859\u201315868","DOI":"10.1109\/ICCV48922.2021.01556"},{"key":"19627_CR26","doi-asserted-by":"crossref","unstructured":"Pei G, Shen F, Yao Y, Xie G-S, Tang Z, Tang J (2022) Hierarchical feature alignment network for unsupervised video object segmentation. In: European conference on computer vision. Springer, pp 596\u2013613","DOI":"10.1007\/978-3-031-19830-4_34"},{"key":"19627_CR27","doi-asserted-by":"crossref","unstructured":"Schmidt C, Athar A, Mahadevan S, Leibe B (2022) D2conv3d: dynamic dilated convolutions for object segmentation in videos. In: Proceedings of the IEEE\/CVF winter conference on applications of computer vision. 
pp 1200\u20131209","DOI":"10.1109\/WACV51458.2022.00199"},{"key":"19627_CR28","doi-asserted-by":"crossref","unstructured":"Cho S, Lee M, Lee S, Park C, Kim D, Lee S (2023) Treating motion as option to reduce motion dependency in unsupervised video object segmentation. In: Proceedings of the IEEE\/CVF winter conference on applications of computer vision. pp 5140\u20135149","DOI":"10.1109\/WACV56688.2023.00511"},{"key":"19627_CR29","doi-asserted-by":"crossref","unstructured":"Yuan Y, Wang Y, Wang L, Zhao X, Lu H, Wang Y, Su W, Zhang L (2023) Isomer: isomerous transformer for zero-shot video object segmentation. In: Proceedings of the IEEE\/CVF international conference on computer vision. pp 966\u2013976","DOI":"10.1109\/ICCV51070.2023.00095"},{"key":"19627_CR30","doi-asserted-by":"crossref","unstructured":"Tan Y, Chen L, Zheng C, Ling H, Lai X (2024) Saeformer: stepwise attention emphasis transformer for polyp segmentation. Multimedia Tools Appl 1\u201321","DOI":"10.1007\/s11042-024-18515-2"},{"key":"19627_CR31","doi-asserted-by":"crossref","unstructured":"Miao B, Bennamoun M, Gao Y, Mian A (2024) Region aware video object segmentation with deep motion modeling. IEEE Trans Image Process","DOI":"10.1109\/TIP.2024.3381445"},{"key":"19627_CR32","doi-asserted-by":"crossref","unstructured":"Tokmakov P, Alahari K, Schmid C (2017) Learning video object segmentation with visual memory. In: Proceedings of the IEEE international conference on computer vision. pp 4481\u20134490","DOI":"10.1109\/ICCV.2017.480"},{"key":"19627_CR33","doi-asserted-by":"crossref","unstructured":"Zhang K, Zhao Z, Liu D, Liu Q, Liu B (2021) Deep transport network for unsupervised video object segmentation. In: Proceedings of the IEEE\/CVF international conference on computer vision. 
pp 8781\u20138790","DOI":"10.1109\/ICCV48922.2021.00866"},{"key":"19627_CR34","doi-asserted-by":"crossref","unstructured":"Wang W, Song H, Zhao S, Shen J, Zhao S, Hoi SC, Ling H (2019) Learning unsupervised video object segmentation through visual attention. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 3064\u20133074","DOI":"10.1109\/CVPR.2019.00318"},{"key":"19627_CR35","doi-asserted-by":"crossref","unstructured":"Wang W, Lu X, Shen J, Crandall DJ, Shao L (2019) Zero-shot video object segmentation via attentive graph neural networks. In: Proceedings of the IEEE\/CVF international conference on computer vision. pp 9236\u20139245","DOI":"10.1109\/ICCV.2019.00933"},{"key":"19627_CR36","doi-asserted-by":"crossref","unstructured":"Yang Z, Wang Q, Bertinetto L, Hu W, Bai S, Torr PH (2019) Anchor diffusion for unsupervised video object segmentation. In: Proceedings of the IEEE\/CVF international conference on computer vision. pp 931\u2013940","DOI":"10.1109\/ICCV.2019.00102"},{"key":"19627_CR37","doi-asserted-by":"crossref","unstructured":"Wang Z, Liu Y, Cheng X, Ikenaga T (2023) Key points trajectory and predicted-real frames distinction based mirror and glass detection for indoor 5g signal analysis. In: Journal of physics: conference series, vol 2522. IOP Publishing, p 012033","DOI":"10.1088\/1742-6596\/2522\/1\/012033"},{"key":"19627_CR38","doi-asserted-by":"publisher","first-page":"1874","DOI":"10.1109\/TRO.2021.3075644","volume":"37","author":"C Campos","year":"2020","unstructured":"Campos C, Elvira R, Rodr\u00edguez JJG, Montiel JMM, Tard\u00f3s JD (2020) Orb-slam3: an accurate open-source library for visual, visual-inertial, and multimap slam. 
IEEE Trans Robot 37:1874\u20131890","journal-title":"IEEE Trans Robot"},{"issue":"2s","key":"19627_CR39","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3566127","volume":"19","author":"H Mei","year":"2023","unstructured":"Mei H, Yu L, Xu K, Wang Y, Yang X, Wei X, Lau RW (2023) Mirror segmentation via semantic-aware contextual contrasted feature learning. ACM Trans Multimedia Comput Commun Appl 19(2s):1\u201322","journal-title":"ACM Trans Multimedia Comput Commun Appl"},{"key":"19627_CR40","doi-asserted-by":"crossref","unstructured":"Lin J, He Z, Lau RW (2021) Rich context aggregation with reflection prior for glass surface detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 13415\u201313424","DOI":"10.1109\/CVPR46437.2021.01321"},{"key":"19627_CR41","doi-asserted-by":"crossref","unstructured":"Lin J, Wang G, Lau RW (2020) Progressive mirror detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 3697\u20133705","DOI":"10.1109\/CVPR42600.2020.00375"},{"key":"19627_CR42","doi-asserted-by":"publisher","first-page":"2920","DOI":"10.1109\/TIP.2022.3162709","volume":"31","author":"L Yu","year":"2022","unstructured":"Yu L, Mei H, Dong W, Wei Z, Zhu L, Wang Y, Yang X (2022) Progressive glass segmentation. IEEE Trans Image Process 31:2920\u20132933","journal-title":"IEEE Trans Image Process"},{"key":"19627_CR43","first-page":"22490","volume":"35","author":"J Lin","year":"2022","unstructured":"Lin J, Yeung Y-H, Lau R (2022) Exploiting semantic relations for glass surface detection. Advances in Neural Information Processing Systems 35:22490\u201322504","journal-title":"Advances in Neural Information Processing Systems"},{"key":"19627_CR44","doi-asserted-by":"crossref","unstructured":"Song H, Wang W, Zhao S, Shen J, Lam K-M (2018) Pyramid dilated deeper convlstm for video salient object detection. In: Proceedings of the European conference on computer vision (ECCV). 
pp 715\u2013731","DOI":"10.1007\/978-3-030-01252-6_44"},{"key":"19627_CR45","doi-asserted-by":"crossref","unstructured":"Siam M, Jiang C, Lu S, Petrich L, Gamal M, Elhoseiny M, Jagersand M (2019) Video object segmentation using teacher-student adaptation in a human robot interaction (hri) setting. In: 2019 International conference on robotics and automation (ICRA). IEEE, pp 50\u201356","DOI":"10.1109\/ICRA.2019.8794254"},{"key":"19627_CR46","doi-asserted-by":"crossref","unstructured":"Song H, Su T, Zheng Y, Zhang K, Liu B, Liu D (2024) Generalizable fourier augmentation for unsupervised video object segmentation. In: Proceedings of the AAAI conference on artificial intelligence, vol 38. pp 4918\u20134924","DOI":"10.1609\/aaai.v38i5.28295"},{"key":"19627_CR47","doi-asserted-by":"crossref","unstructured":"Fedynyak V, Romanus Y, Hlovatskyi B, Sydor B, Dobosevych O, Babin I, Riazantsev R (2024) Devos: flow-guided deformable transformer for video object segmentation. In: Proceedings of the IEEE\/CVF winter conference on applications of computer vision. pp 240\u2013249","DOI":"10.1109\/WACV57701.2024.00031"},{"key":"19627_CR48","doi-asserted-by":"crossref","unstructured":"Lu X, Wang W, Ma C, Shen J, Shao L, Porikli F (2019) See more, know more: unsupervised video object segmentation with co-attention siamese networks. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 3623\u20133632","DOI":"10.1109\/CVPR.2019.00374"},{"key":"19627_CR49","doi-asserted-by":"crossref","unstructured":"Zhou T, Wang S, Zhou Y, Yao Y, Li J, Shao L (2020) Motion-attentive transition for zero-shot video object segmentation. In: Proceedings of the AAAI conference on artificial intelligence, vol 34. pp 13066\u201313073","DOI":"10.1609\/aaai.v34i07.7008"},{"key":"19627_CR50","doi-asserted-by":"crossref","unstructured":"Zhang L, Zhang J, Lin Z, M\u011bch R, Lu H, He Y (2020) Unsupervised video object segmentation with joint hotspot tracking. 
In: Computer vision\u2013ECCV 2020: 16th European conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XIV 16. Springer, pp 490\u2013506","DOI":"10.1007\/978-3-030-58568-6_29"},{"key":"19627_CR51","doi-asserted-by":"crossref","unstructured":"Zhen M, Li S, Zhou L, Shang J, Feng H, Fang T, Quan L (2020) Learning discriminative feature with crf for unsupervised video object segmentation. In: Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXVII 16. Springer, pp 445\u2013462","DOI":"10.1007\/978-3-030-58583-9_27"},{"key":"19627_CR52","unstructured":"Mahadevan S, Athar A, O\u0161ep A, Hennen S, Leal-Taix\u00e9 L, Leibe B (2020) Making a case for 3d convolutions for object segmentation in videos. arXiv:2008.11516"},{"key":"19627_CR53","doi-asserted-by":"crossref","unstructured":"Liu D, Yu D, Wang C, Zhou P (2021) F2net: learning to focus on the foreground for unsupervised video object segmentation. In: Proceedings of the AAAI conference on artificial intelligence, vol. 35. pp 2109\u20132117","DOI":"10.1609\/aaai.v35i3.16308"},{"key":"19627_CR54","doi-asserted-by":"crossref","unstructured":"Ren S, Liu W, Liu Y, Chen H, Han G, He S (2021) Reciprocal transformations for unsupervised video object segmentation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 15455\u201315464","DOI":"10.1109\/CVPR46437.2021.01520"},{"key":"19627_CR55","doi-asserted-by":"crossref","unstructured":"Perazzi F, Pont-Tuset J, McWilliams B, Van\u00a0Gool L, Gross M, Sorkine-Hornung A (2016) A benchmark dataset and evaluation methodology for video object segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 724\u2013732","DOI":"10.1109\/CVPR.2016.85"},{"key":"19627_CR56","doi-asserted-by":"crossref","unstructured":"Strasdat H, Montiel J, Davison AJ (2010) Scale drift-aware large scale monocular slam. 
Robot: Sci Syst VI 2(3):7","DOI":"10.15607\/RSS.2010.VI.010"},{"issue":"2","key":"19627_CR57","doi-asserted-by":"publisher","first-page":"358","DOI":"10.1109\/4.996","volume":"23","author":"N Kanopoulos","year":"1988","unstructured":"Kanopoulos N, Vasanthavada N, Baker RL (1988) Design of an image edge detection filter using the sobel operator. IEEE J Solid-State Circ 23(2):358\u2013367","journal-title":"IEEE J Solid-State Circ"},{"key":"19627_CR58","unstructured":"Suzuki T, IKENAGA T (2014) Spatio-temporal feature and mrf based keypoint of interest for cloud video recognition. IIEEJ Trans Image Electron Visual Comput 2(2):150\u2013158"},{"key":"19627_CR59","doi-asserted-by":"crossref","unstructured":"Barath D, Matas J (2018) Graph-cut ransac. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. pp 6733\u20136741","DOI":"10.1109\/CVPR.2018.00704"},{"key":"19627_CR60","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2020\/8825205","volume":"2020","author":"A Mahdaoui","year":"2020","unstructured":"Mahdaoui A, Sbai EH (2020) 3d point cloud simplification based on k-nearest neighbor and clustering. Adv Multimedia 2020:1\u201310","journal-title":"Adv Multimedia"},{"issue":"2","key":"19627_CR61","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","volume":"28","author":"S Lloyd","year":"1982","unstructured":"Lloyd S (1982) Least squares quantization in pcm. IEEE Trans Inf Theor 28(2):129\u2013137","journal-title":"IEEE Trans Inf Theor"},{"issue":"9","key":"19627_CR62","doi-asserted-by":"publisher","first-page":"2548","DOI":"10.1007\/s11263-021-01484-6","volume":"129","author":"J-W Bian","year":"2021","unstructured":"Bian J-W, Zhan H, Wang N, Li Z, Zhang L, Shen C, Cheng M-M, Reid I (2021) Unsupervised scale-consistent depth learning from video. 
Int J Comput Vision 129(9):2548\u20132564","journal-title":"Int J Comput Vision"},{"issue":"6","key":"19627_CR63","doi-asserted-by":"publisher","first-page":"1187","DOI":"10.1109\/TPAMI.2013.242","volume":"36","author":"P Ochs","year":"2013","unstructured":"Ochs P, Malik J, Brox T (2013) Segmentation of moving objects by long term video analysis. IEEE Trans Pattern Anal Mach Intell 36(6):1187\u20131200","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"19627_CR64","doi-asserted-by":"crossref","unstructured":"Li F, Kim T, Humayun A, Tsai D, Rehg JM (2013) Video segmentation by tracking many figure-ground segments. In: Proceedings of the IEEE international conference on computer vision. pp 2192\u20132199","DOI":"10.1109\/ICCV.2013.273"},{"key":"19627_CR65","doi-asserted-by":"crossref","unstructured":"Zheng Z, Huang G, Yuan X, Pun C-M, Liu H, Ling W-K (2022) Quaternion-valued correlation learning for few-shot semantic segmentation. IEEE Trans Circ Syst Video Technol","DOI":"10.1109\/TCSVT.2022.3223150"},{"key":"19627_CR66","first-page":"17864","volume":"34","author":"B Cheng","year":"2021","unstructured":"Cheng B, Schwing A, Kirillov A (2021) Per-pixel classification is not all you need for semantic segmentation. Adv Neural Inf Process Syst 34:17864\u201317875","journal-title":"Adv Neural Inf Process Syst"},{"key":"19627_CR67","first-page":"12077","volume":"34","author":"E Xie","year":"2021","unstructured":"Xie E, Wang W, Yu Z, Anandkumar A, Alvarez JM, Luo P (2021) Segformer: simple and efficient design for semantic segmentation with transformers. Adv Neural Inf Process Syst 34:12077\u201312090","journal-title":"Adv Neural Inf Process Syst"},{"issue":"3","key":"19627_CR68","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1007\/s41095-022-0274-8","volume":"8","author":"W Wang","year":"2022","unstructured":"Wang W, Xie E, Li X, Fan D-P, Song K, Liang D, Lu T, Luo P, Shao L (2022) Pvt v2: improved baselines with pyramid vision transformer. 
Comput Visual Media 8(3):415\u2013424","journal-title":"Comput Visual Media"},{"key":"19627_CR69","doi-asserted-by":"crossref","unstructured":"Yu W, Luo M, Zhou P, Si C, Zhou Y, Wang X, Feng J, Yan S (2022) Metaformer is actually what you need for vision. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 10819\u201310829","DOI":"10.1109\/CVPR52688.2022.01055"},{"key":"19627_CR70","doi-asserted-by":"crossref","unstructured":"Liu Z, Mao H, Wu C-Y, Feichtenhofer C, Darrell T, Xie S (2022) A convnet for the 2020s. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. pp 11976\u201311986","DOI":"10.1109\/CVPR52688.2022.01167"}],"container-title":["Multimedia Tools and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-024-19627-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11042-024-19627-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-024-19627-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T13:17:26Z","timestamp":1732022246000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11042-024-19627-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,20]]},"references-count":70,"journal-issue":{"issue":"39","published-online":{"date-parts":[[2024,11]]}},"alternative-id":["19627"],"URL":"https:\/\/doi.org\/10.1007\/s11042-024-19627-5","relation":{},"ISSN":["1573-7721"],"issn-type":[{"type":"electronic","value":"1573-7721"}],"subject":[],"published":{"date-parts":[[2024,6,20]]},"assertion":[{"value":"12 October 
2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 June 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 June 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 June 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}}]}}