{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T16:14:15Z","timestamp":1776442455164,"version":"3.51.2"},"reference-count":60,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2023,5,6]],"date-time":"2023-05-06T00:00:00Z","timestamp":1683331200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,6]],"date-time":"2023-05-06T00:00:00Z","timestamp":1683331200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62075031"],"award-info":[{"award-number":["62075031"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Recently proposed state-of-the-art saliency detection models rely heavily on labeled datasets and rarely focus on perfect RGBD feature fusion, which lowers their generalization ability. In this paper, we propose a depth-based interaction and refinement network (DIR-Net) to fully leverage the depth information provided with RGB images to generate and refine the corresponding saliency segmentation maps. In total, three modules are included in our framework. A depth-based refinement module (DRM) and an RGB module work in parallel while coordinating via interactive spatial guidance modules (ISGMs), which utilize spatial and channel attention computed from both depth features and RGB features. 
In each layer, the features in both modules are refined and guided by the spatial information obtained from the other module through ISGMs. In the RGB module, before sending the depth-guided feature map to the decoder, a convolutional gated recurrent unit (ConvGRU)-based block is introduced to handle temporal information. Given the clear motion information in RGB features, this block also guides the temporal information in the DRM. By merging the results from both the DRM and RGB modules, a segmentation map with distinct boundaries is generated. Considering the lack of depth images in popular public datasets, we utilize a depth estimation network that incorporates manual postprocessing-based correction to generate depth images on the DAVIS and UVSD datasets. The state-of-the-art performance achieved on both the original and new datasets illustrates the advantage of our RGBD feature fusion strategy, with a real-time speed of 19 fps on a single GPU.<\/jats:p>","DOI":"10.1007\/s40747-023-01072-w","type":"journal-article","created":{"date-parts":[[2023,5,6]],"date-time":"2023-05-06T09:02:17Z","timestamp":1683363737000},"page":"6343-6358","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Salient object detection for RGBD video via spatial interaction and depth-based boundary refinement"],"prefix":"10.1007","volume":"9","author":[{"given":"Yujian","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Ziyan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Ping","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Mengnan","family":"Xu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,6]]},"reference":[{"issue":"5","key":"1072_CR1","doi-asserted-by":"publisher","first-page":"769","DOI":"10.1109\/TCSVT.2013.2280096","volume":"24","author":"Z Ren","year":"2013","unstructured":"Ren Z, Gao S, Chia L, 
Tsang IW (2013) Region-based saliency detection and its application in object recognition. IEEE Trans Circ Syst Video Technol 24(5):769\u2013779. https:\/\/doi.org\/10.1109\/TCSVT.2013.2280096","journal-title":"IEEE Trans Circ Syst Video Technol"},{"key":"1072_CR2","doi-asserted-by":"publisher","unstructured":"Fan D, Wang W, Cheng M, Shen J (2019) Shifting more attention to video salient object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 8554\u20138564 . https:\/\/doi.org\/10.1109\/CVPR.2019.00875","DOI":"10.1109\/CVPR.2019.00875"},{"key":"1072_CR3","doi-asserted-by":"publisher","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431\u20133440 . https:\/\/doi.org\/10.1109\/cvpr.2015.7298965","DOI":"10.1109\/cvpr.2015.7298965"},{"key":"1072_CR4","doi-asserted-by":"publisher","unstructured":"Borji A, Frintrop S, Sihite DN, Itti L (2012) Adaptive object tracking by learning background context. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 23\u201330 . IEEE. https:\/\/doi.org\/10.1109\/CVPRW.2012.6239191","DOI":"10.1109\/CVPRW.2012.6239191"},{"key":"1072_CR5","doi-asserted-by":"publisher","unstructured":"Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440\u20131448. https:\/\/doi.org\/10.1109\/ICCV.2015.169","DOI":"10.1109\/ICCV.2015.169"},{"key":"1072_CR6","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1109\/TPAMI.2016.2577031","volume":"28","author":"S Ren","year":"2015","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Advances Neural Inform Process Syst 28:91\u201399. 
https:\/\/doi.org\/10.1109\/TPAMI.2016.2577031","journal-title":"Advances Neural Inform Process Syst"},{"key":"1072_CR7","doi-asserted-by":"publisher","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431\u20133440 . https:\/\/doi.org\/10.1109\/CVPR.2015.7298965","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"1072_CR8","doi-asserted-by":"publisher","unstructured":"Zhao R, Ouyang W, Li H, Wang X (2015) Saliency detection by multi-context deep learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1265\u20131274 . https:\/\/doi.org\/10.1109\/CVPR.2015.7298731","DOI":"10.1109\/CVPR.2015.7298731"},{"key":"1072_CR9","doi-asserted-by":"publisher","unstructured":"Li G, Yu Y (2016) Deep contrast learning for salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 478\u2013487 . https:\/\/doi.org\/10.1109\/CVPR.2016.58","DOI":"10.1109\/CVPR.2016.58"},{"issue":"9","key":"1072_CR10","doi-asserted-by":"publisher","first-page":"3366","DOI":"10.1109\/TIP.2013.2264820","volume":"22","author":"K M\u00fcller","year":"2013","unstructured":"M\u00fcller K, Schwarz H, Marpe D, Bartnik C, Bosse S, Brust H, Hinz T, Lakshman H, Merkle P, Rhee FH (2013) 3d high-efficiency video coding for multi-view video and depth data. IEEE Trans Image Process 22(9):3366\u20133378. https:\/\/doi.org\/10.1109\/TIP.2013.2264820","journal-title":"IEEE Trans Image Process"},{"key":"1072_CR11","doi-asserted-by":"publisher","unstructured":"Urvoy M, Barkowsky M, Cousseau R, Koudota Y, Ricorde V, Le\u00a0Callet P, Gutierrez J, Garcia N (2012)Nama3ds1-cospad1: Subjective video quality assessment database on coding conditions introducing freely available high quality 3d stereoscopic sequences. In: 2012 Fourth International Workshop on Quality of Multimedia Experience, pp. 
109\u2013114 . IEEE. https:\/\/doi.org\/10.1109\/QoMEX.2012.6263847","DOI":"10.1109\/QoMEX.2012.6263847"},{"key":"1072_CR12","doi-asserted-by":"publisher","unstructured":"Godard C, Mac\u00a0Aodha O, Firman M, Brostow GJ (2019) Digging into self-supervised monocular depth estimation. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 3828\u20133838 . https:\/\/doi.org\/10.1109\/ICCV.2019.00393","DOI":"10.1109\/ICCV.2019.00393"},{"key":"1072_CR13","doi-asserted-by":"publisher","unstructured":"Ren S, Han C, Yang X, Han G, He S (2020) Tenet: Triple excitation network for video salient object detection. In: European Conference on Computer Vision, pp. 212\u2013228 . Springer. https:\/\/doi.org\/10.1109\/ICCV.2019.00248","DOI":"10.1109\/ICCV.2019.00248"},{"key":"1072_CR14","doi-asserted-by":"publisher","unstructured":"Yan P, Li G, Xie Y, Li Z, Wang C, Chen T, Lin L (2019) Semi-supervised video salient object detection using pseudo-labels. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 7284\u20137293 . https:\/\/doi.org\/10.1109\/ICCV.2019.00738","DOI":"10.1109\/ICCV.2019.00738"},{"key":"1072_CR15","doi-asserted-by":"publisher","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234\u2013241. Springer. https:\/\/doi.org\/10.1007\/978-3-319-24574-4_28","DOI":"10.1007\/978-3-319-24574-4_28"},{"issue":"11","key":"1072_CR16","doi-asserted-by":"publisher","first-page":"1254","DOI":"10.1109\/34.730558","volume":"20","author":"L Itti","year":"1998","unstructured":"Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20(11):1254\u20131259. 
https:\/\/doi.org\/10.1109\/34.730558","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1072_CR17","doi-asserted-by":"publisher","unstructured":"Hou X, Zhang L (2007) Saliency detection: a spectral residual approach. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1\u20138. IEEE. https:\/\/doi.org\/10.1109\/CVPR.2007.383267","DOI":"10.1109\/CVPR.2007.383267"},{"key":"1072_CR18","doi-asserted-by":"publisher","unstructured":"Jiang B, Zhang L, Lu H, Yang C, Yang MH (2013) Saliency detection via absorbing Markov chain. In: IEEE International Conference on Computer Vision. https:\/\/doi.org\/10.1109\/ICCV.2013.209","DOI":"10.1109\/ICCV.2013.209"},{"issue":"12","key":"1072_CR19","doi-asserted-by":"publisher","first-page":"2067","DOI":"10.1109\/TCSVT.2013.2270367","volume":"23","author":"Y Li","year":"2013","unstructured":"Li Y, Sheng B, Ma L, Wu W, Xie Z (2013) Temporally coherent video saliency using regional dynamic contrast. IEEE Trans Circ Syst Video Technol 23(12):2067\u20132076. https:\/\/doi.org\/10.1109\/TCSVT.2013.2270367","journal-title":"IEEE Trans Circ Syst Video Technol"},{"key":"1072_CR20","doi-asserted-by":"publisher","unstructured":"Fang G, Wang W, Shen J, Ling S, Yuan YT (2017) Video saliency detection using object proposals. IEEE Trans Cybern PP(99), 1\u201312 . https:\/\/doi.org\/10.1109\/TCYB.2017.2761361","DOI":"10.1109\/TCYB.2017.2761361"},{"issue":"11","key":"1072_CR21","doi-asserted-by":"publisher","first-page":"4185","DOI":"10.1109\/TIP.2015.2460013","volume":"24","author":"W Wang","year":"2015","unstructured":"Wang W, Shen J, Shao L (2015) Consistent video saliency using local gradient flow optimization and global refinement. IEEE Trans Image Process 24(11):4185\u20134196. 
https:\/\/doi.org\/10.1109\/TIP.2015.2460013","journal-title":"IEEE Trans Image Process"},{"issue":"1","key":"1072_CR22","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1109\/TPAMI.2017.2662005","volume":"40","author":"W Wang","year":"2017","unstructured":"Wang W, Shen J, Yang R, Porikli F (2017) Saliency-aware video object segmentation. IEEE Trans Pattern Anal Mach Intell 40(1):20\u201333. https:\/\/doi.org\/10.1109\/TPAMI.2017.2662005","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1072_CR23","doi-asserted-by":"publisher","unstructured":"Li G, Yu Y (2015) Visual saliency based on multiscale deep features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5455\u20135463. https:\/\/doi.org\/10.1109\/CVPR.2015.7299184","DOI":"10.1109\/CVPR.2015.7299184"},{"key":"1072_CR24","doi-asserted-by":"publisher","unstructured":"Lee G, Tai Y-W, Kim J (2016) Deep saliency with encoded low level distance map and high level features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 660\u2013668 . https:\/\/doi.org\/10.1109\/CVPR.2016.78","DOI":"10.1109\/CVPR.2016.78"},{"issue":"1","key":"1072_CR25","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1109\/TIP.2017.2754941","volume":"27","author":"W Wang","year":"2017","unstructured":"Wang W, Shen J, Shao L (2017) Video salient object detection via fully convolutional networks. IEEE Trans Image Process 27(1):38\u201349. https:\/\/doi.org\/10.1109\/TIP.2017.2754941","journal-title":"IEEE Trans Image Process"},{"key":"1072_CR26","doi-asserted-by":"publisher","unstructured":"Wang W, Shen J, Guo F, Cheng M, Borji A (2018) Revisiting video saliency: a large-scale benchmark and a new model. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4894\u20134903 . 
https:\/\/doi.org\/10.1109\/CVPR.2018.00514","DOI":"10.1109\/CVPR.2018.00514"},{"key":"1072_CR27","doi-asserted-by":"publisher","unstructured":"Song H, Wang W, Zhao S, Shen J, Lam K (2018) Pyramid dilated deeper convlstm for video salient object detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 715\u2013731 . https:\/\/doi.org\/10.1007\/978-3-030-01252-6_44","DOI":"10.1007\/978-3-030-01252-6_44"},{"key":"1072_CR28","doi-asserted-by":"publisher","first-page":"1196","DOI":"10.1002\/acs.3396","volume":"36","author":"Z Zhuang","year":"2022","unstructured":"Zhuang Z, Tao H, Chen Y, Stojanovic V, Paszke W (2022) Iterative learning control for repetitive tasks with randomly varying trial lengths using successive projection. Int J Adapt Control Signal Proces 36:1196\u20131215. https:\/\/doi.org\/10.1002\/acs.3396","journal-title":"Int J Adapt Control Signal Proces"},{"key":"1072_CR29","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s00034-013-9633-0","volume":"33","author":"V Stojanovic","year":"2014","unstructured":"Stojanovic V, Filipovic V (2014) Adaptive input design for identification of output error model with constrained output. Circ Syst Signal Process 33:97\u2013113. https:\/\/doi.org\/10.1007\/s00034-013-9633-0","journal-title":"Circ Syst Signal Process"},{"key":"1072_CR30","doi-asserted-by":"publisher","unstructured":"Wei T, Li X, Stojanovic V (2021) Input-to-state stability of impulsive reaction-diffusion neural networks with infinite distributed delays. Nonlinear Dyn:1733\u20131755. https:\/\/doi.org\/10.1007\/s11071-021-06208-6","DOI":"10.1007\/s11071-021-06208-6"},{"issue":"8","key":"1072_CR31","doi-asserted-by":"publisher","first-page":"1309","DOI":"10.1109\/TCSVT.2014.2381471","volume":"25","author":"J Han","year":"2015","unstructured":"Han J, Zhang D, Hu X, Guo L, Ren J, Wu F (2015) Background prior-based salient object detection via deep reconstruction residual. 
IEEE Trans Circ Syst Video Technol 25(8):1309\u20131321. https:\/\/doi.org\/10.1109\/TCSVT.2014.2381471","journal-title":"IEEE Trans Circ Syst Video Technol"},{"issue":"4","key":"1072_CR32","doi-asserted-by":"publisher","first-page":"1173","DOI":"10.1109\/TCYB.2018.2793278","volume":"49","author":"Y Zhou","year":"2019","unstructured":"Zhou Y, Huo S, Xiang W, Hou C, Kung S-Y (2019) Semi-supervised salient object detection using a linear feedback control system model. IEEE Trans Cybern 49(4):1173\u20131185. https:\/\/doi.org\/10.1109\/TCYB.2018.2793278","journal-title":"IEEE Trans Cybern"},{"key":"1072_CR33","doi-asserted-by":"publisher","unstructured":"Fan X, Zhi L, Sun G (2014) Salient region detection for stereoscopic images. In: International Conference on Digital Signal Processing . https:\/\/doi.org\/10.1109\/ICDSP.2014.6900706","DOI":"10.1109\/ICDSP.2014.6900706"},{"key":"1072_CR34","doi-asserted-by":"publisher","unstructured":"Li N, Sun B, Yu J (2015) A weighted sparse coding framework for saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5216\u20135223 . https:\/\/doi.org\/10.1109\/CVPR.2015.7299158","DOI":"10.1109\/CVPR.2015.7299158"},{"key":"1072_CR35","doi-asserted-by":"publisher","unstructured":"Qu L, He S, Zhang J, Tian J, Tang Y, Yang Q (2016) Rgbd salient object detection via deep fusion. In: IEEE Transactions on Image Processing PP(99) . https:\/\/doi.org\/10.1109\/TIP.2017.2682981","DOI":"10.1109\/TIP.2017.2682981"},{"key":"1072_CR36","doi-asserted-by":"publisher","unstructured":"Zhang M, Ren W, Piao Y, Rong Z, Lu H (2020) Select, supplement and focus for rgb-d saliency detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 3472\u20133481 . 
https:\/\/doi.org\/10.1109\/CVPR42600.2020.00353","DOI":"10.1109\/CVPR42600.2020.00353"},{"key":"1072_CR37","doi-asserted-by":"publisher","unstructured":"Zhao J, Cao Y, Fan D, Cheng M, Li X, Zhang L (2019) Contrast prior and fluid pyramid integration for rgbd salient object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 3927\u20133936. https:\/\/doi.org\/10.1109\/CVPR.2019.00405","DOI":"10.1109\/CVPR.2019.00405"},{"key":"1072_CR38","doi-asserted-by":"publisher","first-page":"256","DOI":"10.1016\/j.neucom.2019.10.024","volume":"377","author":"P Zhang","year":"2020","unstructured":"Zhang P, Liu J, Wang X, Pu T, Fei C, Guo Z (2020) Stereoscopic video saliency detection based on spatiotemporal correlation and depth confidence optimization. Neurocomputing 377:256\u2013268. https:\/\/doi.org\/10.1016\/j.neucom.2019.10.024","journal-title":"Neurocomputing"},{"issue":"4","key":"1072_CR39","doi-asserted-by":"publisher","first-page":"1476","DOI":"10.1109\/TIP.2014.2303640","volume":"23","author":"H Kim","year":"2014","unstructured":"Kim H, Lee S, Bovik AC (2014) Saliency prediction on stereoscopic videos. IEEE Trans Image Process 23(4):1476\u20131490. https:\/\/doi.org\/10.1109\/TIP.2014.2303640","journal-title":"IEEE Trans Image Process"},{"issue":"9","key":"1072_CR40","doi-asserted-by":"publisher","first-page":"3910","DOI":"10.1109\/TIP.2014.2336549","volume":"23","author":"Y Fang","year":"2014","unstructured":"Fang Y, Wang Z, Lin W, Fang Z (2014) Video saliency incorporating spatiotemporal cues and uncertainty weighting. IEEE Trans Image Process 23(9):3910\u20133921. https:\/\/doi.org\/10.1109\/TIP.2014.2336549","journal-title":"IEEE Trans Image Process"},{"key":"1072_CR41","doi-asserted-by":"publisher","unstructured":"Ferreira L, da Silva\u00a0Cruz, LA, Assuncao P (2015) A method to compute saliency regions in 3d video based on fusion of feature maps. 
In: 2015 IEEE International Conference on Multimedia and Expo (ICME), pp. 1\u20136 . IEEE. https:\/\/doi.org\/10.1109\/ICME.2015.7177474","DOI":"10.1109\/ICME.2015.7177474"},{"key":"1072_CR42","doi-asserted-by":"publisher","unstructured":"Zhang Y, Jiang G, Yu M, Chen K (2010) Stereoscopic visual attention model for 3d video. In: International Conference on Multimedia Modeling, pp. 314\u2013324. Springer. https:\/\/doi.org\/10.1007\/978-3-642-11301-7_33","DOI":"10.1007\/978-3-642-11301-7_33"},{"key":"1072_CR43","doi-asserted-by":"publisher","unstructured":"Sun P, Zhang W, Wang H, Li S, Li X (2021) Deep rgb-d saliency detection with depth-sensitive attention and automatic multi-modal fusion. In: 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1407\u20131417 . https:\/\/doi.org\/10.1109\/CVPR46437.2021.00146","DOI":"10.1109\/CVPR46437.2021.00146"},{"key":"1072_CR44","doi-asserted-by":"publisher","unstructured":"Chen L-C, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. https:\/\/doi.org\/10.48550\/arXiv.1706.05587. arXiv preprint arXiv:1706.05587","DOI":"10.48550\/arXiv.1706.05587"},{"key":"1072_CR45","doi-asserted-by":"publisher","unstructured":"Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794\u20137803 https:\/\/doi.org\/10.48550\/arXiv.1711.07971","DOI":"10.48550\/arXiv.1711.07971"},{"key":"1072_CR46","doi-asserted-by":"publisher","unstructured":"Ballas N, Yao L, Pal C, Courville A (2015) Delving deeper into convolutional networks for learning video representations. arXiv preprint arXiv:1511.06432. https:\/\/doi.org\/10.48550\/arXiv.1511.06432","DOI":"10.48550\/arXiv.1511.06432"},{"key":"1072_CR47","doi-asserted-by":"publisher","unstructured":"Song H, Wang W, Zhao S, Shen J, Lam K-M (2018) Pyramid dilated deeper convlstm for video salient object detection. 
In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 715\u2013731. https:\/\/doi.org\/10.1007\/978-3-030-01252-6_44","DOI":"10.1007\/978-3-030-01252-6_44"},{"key":"1072_CR48","doi-asserted-by":"publisher","unstructured":"Woo S, Park J, Lee JY, Kweon, IS (2018) Cbam: Convolutional block attention module. Springer, Cham. https:\/\/doi.org\/10.1007\/978-3-030-01234-2_1","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"1072_CR49","doi-asserted-by":"publisher","unstructured":"Perazzi F, Pont\u00a0Tuset J, McWilliams B, Van\u00a0Gool L, Gross M, Sorkine\u00a0Hornung A (2016) A benchmark dataset and evaluation methodology for video object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 724\u2013732. https:\/\/doi.org\/10.1109\/CVPR.2016.85","DOI":"10.1109\/CVPR.2016.85"},{"issue":"12","key":"1072_CR50","doi-asserted-by":"publisher","first-page":"2527","DOI":"10.1109\/TCSVT.2016.2595324","volume":"27","author":"Z Liu","year":"2016","unstructured":"Liu Z, Li J, Ye L, Sun G, Shen L (2016) Saliency detection for unconstrained videos using superpixel-level graph and spatiotemporal propagation. IEEE Trans Circ Syst Video Technol 27(12):2527\u20132542. https:\/\/doi.org\/10.1109\/TCSVT.2016.2595324","journal-title":"IEEE Trans Circ Syst Video Technol"},{"issue":"2","key":"1072_CR51","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1109\/TPAMI.2010.70","volume":"33","author":"T Liu","year":"2010","unstructured":"Liu T, Yuan Z, Sun J, Wang J, Zheng N, Tang X, Shum H-Y (2010) Learning to detect a salient object. IEEE Trans Pattern Anal Mach Intell 33(2):353\u2013367. https:\/\/doi.org\/10.1109\/TPAMI.2010.70","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1072_CR52","doi-asserted-by":"publisher","unstructured":"Li G, Yu Y (2015) Visual saliency based on multiscale deep features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5455\u20135463. 
https:\/\/doi.org\/10.1109\/CVPR.2015.7299184","DOI":"10.1109\/CVPR.2015.7299184"},{"issue":"1","key":"1072_CR53","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1109\/TIP.2017.2762594","volume":"27","author":"J Li","year":"2017","unstructured":"Li J, Xia C, Chen X (2017) A benchmark dataset and saliency-guided stacked autoencoders for video-based salient object detection. IEEE Trans Image Process 27(1):349\u2013364. https:\/\/doi.org\/10.1109\/TIP.2017.2762594","journal-title":"IEEE Trans Image Process"},{"issue":"12","key":"1072_CR54","doi-asserted-by":"publisher","first-page":"2527","DOI":"10.1109\/TCSVT.2016.2595324","volume":"27","author":"Z Liu","year":"2016","unstructured":"Liu Z, Li J, Ye L, Sun G, Shen L (2016) Saliency detection for unconstrained videos using superpixel-level graph and spatiotemporal propagation. IEEE Trans Circ Syst Video Technol 27(12):2527\u20132542. https:\/\/doi.org\/10.1109\/TCSVT.2016.2595324","journal-title":"IEEE Trans Circ Syst Video Technol"},{"key":"1072_CR55","unstructured":"Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249\u2013256 . JMLR Workshop and Conference Proceedings"},{"key":"1072_CR56","doi-asserted-by":"publisher","unstructured":"Fan D, Cheng M, Liu Y, Li T, Borji A (2017) Structure-measure: a new way to evaluate foreground maps. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4548\u20134557. https:\/\/doi.org\/10.1109\/ICCV.2017.487","DOI":"10.1109\/ICCV.2017.487"},{"key":"1072_CR57","doi-asserted-by":"publisher","unstructured":"Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. https:\/\/doi.org\/10.48550\/arXiv.1412.6980. 
arXiv preprint arXiv:1412.6980","DOI":"10.48550\/arXiv.1412.6980"},{"key":"1072_CR58","doi-asserted-by":"publisher","unstructured":"Piao Y, Rong Z, Zhang M, Ren W, Lu H (2020) A2dele: adaptive and attentive depth distiller for efficient rgb-d salient object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 9060\u20139069. https:\/\/doi.org\/10.1109\/CVPR42600.2020.00908","DOI":"10.1109\/CVPR42600.2020.00908"},{"key":"1072_CR59","doi-asserted-by":"publisher","unstructured":"Zhai Y, Fan D, Yang J, Borji A, Shao L, Han J, Wang L (2020) Bifurcated backbone strategy for rgb-d salient object detection. https:\/\/doi.org\/10.48550\/arXiv.2007.02713. arXiv preprint arXiv:2007.02713","DOI":"10.48550\/arXiv.2007.02713"},{"key":"1072_CR60","doi-asserted-by":"publisher","unstructured":"Chen S, Fu Y (2020) Progressively guided alternate refinement network for rgb-d salient object detection. In: European Conference on Computer Vision, pp. 520\u2013538 . Springer. 
https:\/\/doi.org\/10.1007\/978-3-030-58598-3_31","DOI":"10.1007\/978-3-030-58598-3_31"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01072-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01072-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01072-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T19:15:39Z","timestamp":1698434139000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01072-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,6]]},"references-count":60,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["1072"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01072-w","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,6]]},"assertion":[{"value":"31 October 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 April 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 May 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}