{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T08:03:01Z","timestamp":1769932981702,"version":"3.49.0"},"reference-count":65,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T00:00:00Z","timestamp":1723593600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Comput. Sci."],"abstract":"<jats:p>Recent research shows that Conditional Generative Adversarial Networks (cGANs) are effective for Salient Object Detection (SOD), a challenging computer vision task that mimics the way human vision focuses on important parts of an image. However, implementing cGANs for this task has presented several complexities, including instability during training with skip connections, weak generators, and difficulty in capturing context information for challenging images. These challenges are particularly evident when dealing with input images containing small salient objects against complex backgrounds, underscoring the need for careful design and tuning of cGANs to ensure accurate segmentation and detection of salient objects. To address these issues, we propose an innovative method for SOD using a cGAN framework. Our method utilizes encoder-decoder framework as the generator component for cGAN, enhancing the feature extraction process and facilitating accurate segmentation of the salient objects. We incorporate Wasserstein-1 distance within the cGAN training process to improve the accuracy of finding the salient objects and stabilize the training process. Additionally, our enhanced model efficiently captures intricate saliency cues by leveraging the spatial attention gate with global average pooling and regularization. 
The introduction of global average pooling layers in the encoder and decoder paths enhances the network's global perception and fine-grained detail capture, while the channel attention mechanism, facilitated by dense layers, dynamically modulates feature maps to amplify saliency cues. The discriminator evaluates the generated saliency maps for authenticity and gives feedback that enhances the generator's ability to produce high-resolution saliency maps. By iteratively training the discriminator and generator networks, the model achieves improved results in finding the salient object. We trained and validated our model using large-scale benchmark datasets commonly used for salient object detection, namely DUTS, ECSSD, and DUT-OMRON. Our approach was evaluated on these datasets using standard performance metrics: precision, recall, MAE, and <jats:italic>F\u03b2<\/jats:italic> score. Compared to other state-of-the-art methods, our method achieved the lowest MAE values: 0.0292 on the ECSSD dataset, 0.033 on the DUTS-TE dataset, and 0.0439 on the challenging and complex DUT-OMRON dataset. 
Our proposed method demonstrates significant improvements in salient object detection, highlighting its potential benefits for real-life applications.<\/jats:p>","DOI":"10.3389\/fcomp.2024.1420965","type":"journal-article","created":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T14:51:52Z","timestamp":1723647112000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Spatial attention guided cGAN for improved salient object detection"],"prefix":"10.3389","volume":"6","author":[{"given":"Gayathri","family":"Dhara","sequence":"first","affiliation":[]},{"given":"Ravi Kant","family":"Kumar","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,8,14]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"6351","DOI":"10.32604\/cmc.2023.038173","article-title":"A progressive approach to generic object detection: a two-stage framework for image recognition","volume":"75","author":"Aamir","year":"2023","journal-title":"Comp. Mater. Continua"},{"key":"B2","first-page":"214","article-title":"\u201cWasserstein generative adversarial networks,\u201d","volume-title":"International Conference on Machine Learning","author":"Arjovsky","year":"2017"},{"key":"B3","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B4","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1109\/TPAMI.2012.89","article-title":"State-of-the-art in visual attention modeling","volume":"35","author":"Borji","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell"},{"key":"B5","first-page":"319","article-title":"\u201cBall detection using yolo and mask R-CNN,\u201d","author":"Buric","year":"2018"},{"key":"B6","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1412.7062","article-title":"Semantic image segmentation with deep convolutional nets and fully connected crfs","author":"Chen","year":"2014","journal-title":"arXiv [Preprint]"},{"key":"B7","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1706.05587","article-title":"Rethinking atrous convolution for semantic image segmentation","author":"Chen","year":"","journal-title":"arXiv [Preprint]"},{"key":"B9","first-page":"801","article-title":"\u201cEncoder-decoder with atrous separable convolution for semantic image segmentation,\u201d","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV)","author":"Chen","year":"2018"},{"key":"B10","doi-asserted-by":"publisher","first-page":"152483","DOI":"10.1109\/ACCESS.2019.2948062","article-title":"Salient object detection: Integrate salient features in the deep learning framework","volume":"7","author":"Chen","year":"2019","journal-title":"IEEE Access"},{"key":"B11","doi-asserted-by":"publisher","first-page":"10599","DOI":"10.1609\/aaai.v34i07.6633","article-title":"Global context-aware progressive aggregation network for salient object detection","volume":"34","author":"Chen","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intellig"},{"key":"B12","doi-asserted-by":"publisher","first-page":"569","DOI":"10.1109\/TPAMI.2014.2345401","article-title":"Global contrast based salient region detection","volume":"37","author":"Cheng","year":"2014","journal-title":"IEEE Trans. 
Pattern Anal. Mach. Intell"},{"key":"B13","first-page":"186","article-title":"\u201cSalient objects in clutter: Bringing salient object detection to the foreground,\u201d","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV)","author":"Fan","year":""},{"key":"B14","first-page":"4548","article-title":"\u201cStructure-measure: a new way to evaluate foreground maps,\u201d","author":"Fan","year":"2017","journal-title":"Proceedings of the IEEE International Conference on Computer Vision"},{"key":"B15","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/97","article-title":"Enhanced-alignment measure for binary foreground map evaluation","author":"Fan","year":"","journal-title":"arXiv [Preprint]"},{"key":"B16","first-page":"580","article-title":"\u201cRich feature hierarchies for accurate object detection and semantic segmentation,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Girshick","year":"2014"},{"key":"B17","article-title":"\u201cGenerative adversarial nets,\u201d","author":"Goodfellow","year":"2014","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B18","doi-asserted-by":"publisher","first-page":"19","DOI":"10.18280\/ts.380319","article-title":"An object detection framework based on deep features and high-quality object locations","volume":"38","author":"Guan","year":"2021","journal-title":"Traitement du Signal"},{"key":"B19","article-title":"\u201cImproved training of Wasserstein GANs,\u201d","author":"Gulrajani","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B20","doi-asserted-by":"publisher","first-page":"48890","DOI":"10.1109\/ACCESS.2019.2910572","article-title":"Convolutional edge constraint-based U-Net for salient object detection","volume":"7","author":"Han","year":"2019","journal-title":"IEEE Access"},{"key":"B21","first-page":"770","article-title":"\u201cDeep residual learning for 
image recognition,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"He","year":"2016"},{"key":"B22","first-page":"1125","article-title":"\u201cImage-to-image translation with conditional adversarial networks,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Isola","year":"2017"},{"key":"B23","first-page":"11","article-title":"\u201cThe one hundred layers tiramisu: fully convolutional densenets for semantic segmentation,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops","author":"J\u00e9gou","year":"2017"},{"key":"B24","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1016\/j.neucom.2018.08.013","article-title":"Saliency detection via conditional adversarial image-to-image network","volume":"316","author":"Ji","year":"","journal-title":"Neurocomputing"},{"key":"B25","doi-asserted-by":"publisher","first-page":"130","DOI":"10.1016\/j.neucom.2018.09.061","article-title":"Salient object detection via multi-scale attention cnn","volume":"322","author":"Ji","year":"","journal-title":"Neurocomputing"},{"key":"B26","first-page":"1665","article-title":"\u201cSaliency detection via absorbing markov chain,\u201d","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Jiang","year":""},{"key":"B27","first-page":"2083","article-title":"\u201cSalient object detection: A discriminative regional feature integration approach,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Jiang","year":""},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6916","article-title":"\u201cF3net: Fusion, feedback and focus for salient object detection,\u201d","author":"Jun","year":"2020","journal-title":"AAAI Conference on Artificial Intelligence 
(AAAI"},{"key":"B29","first-page":"2940","article-title":"\u201cRecursive contour-saliency blending network for accurate salient object detection,\u201d","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Ke","year":"2022"},{"key":"B30","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/978-94-009-3833-5_5","article-title":"\u201cShifts in selective visual attention: towards the underlying neural circuitry,\u201d","volume-title":"Matters of Intelligence: Conceptual Structures in Cognitive Neuroscience","author":"Koch","year":"1987"},{"key":"B31","article-title":"\u201cEfficient inference in fully connected crfs with gaussian edge potentials,\u201d","volume-title":"Advances in Neural Information Processing Systems","author":"Kr\u00e4henb\u00fchl","year":"2011"},{"key":"B32","first-page":"25","article-title":"\u201cImagenet classification with deep convolutional neural networks,\u201d","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky","year":"2012"},{"key":"B33","first-page":"280","article-title":"\u201cThe secrets of salient object segmentation,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Li","year":"2014"},{"key":"B34","doi-asserted-by":"publisher","first-page":"107622","DOI":"10.1016\/j.patcog.2020.107622","article-title":"Cascaded hierarchical atrous spatial pyramid pooling module for semantic segmentation","volume":"110","author":"Lian","year":"2021","journal-title":"Pattern Recognit"},{"key":"B35","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1109\/TPAMI.2010.70","article-title":"Learning to detect a salient object","volume":"33","author":"Liu","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell"},{"key":"B36","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1109\/ICMLA.2018.00157","article-title":"\u201cLogan: Generating logos with a generative adversarial neural network conditioned on color,\u201d","author":"Mino","year":"2018","journal-title":"2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA"},{"key":"B37","first-page":"9413","article-title":"\u201cMulti-scale interactive network for salient object detection,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference On Computer Vision and Pattern Recognition","author":"Pang","year":"2020"},{"key":"B38","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1109\/CVPR.2012.6247743","article-title":"\u201cSaliency filters: contrast based filtering for salient region detection,\u201d","volume-title":"2012 IEEE Conference on Computer Vision and Pattern Recognition","author":"Perazzi","year":"2012"},{"key":"B39","doi-asserted-by":"crossref","first-page":"1047","DOI":"10.1109\/ICIP.2016.7532517","article-title":"\u201cDeep-CSSR: Scene classification using category-specific salient region with deep features,\u201d","volume-title":"2016 IEEE International Conference on Image Processing (ICIP)","author":"Qi","year":"2016"},{"key":"B40","doi-asserted-by":"publisher","first-page":"107404","DOI":"10.1016\/j.patcog.2020.107404","article-title":"U2-net: Going deeper with nested u-structure for salient object detection","volume":"106","author":"Qin","year":"2020","journal-title":"Pattern Recognit"},{"key":"B41","first-page":"7479","article-title":"\u201cBASNet: Boundary-aware salient object detection,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Qin","year":"2019"},{"key":"B42","first-page":"28","article-title":"\u201cFaster R-CNN: towards real-time object detection with region proposal networks,\u201d","volume-title":"Advances in Neural Information Processing 
Systems","author":"Ren","year":"2015"},{"key":"B43","first-page":"234","article-title":"\u201cU-Net: Convolutional networks for biomedical image segmentation,\u201d","volume-title":"Medical Image Computing and Computer-Assisted Intervention-MICCAI 2015: 18th International Conference, Munich, Germany","author":"Ronneberger","year":"2015"},{"key":"B44","doi-asserted-by":"publisher","first-page":"717","DOI":"10.1109\/TPAMI.2015.2465960","article-title":"Hierarchical image saliency detection on extended cssd","volume":"38","author":"Shi","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B45","author":"Simonyan","year":"2015","journal-title":"Very Deep Convolutional Networks for Large-Scale Image Recognition"},{"key":"B46","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1007\/s00530-020-00650-z","article-title":"Saliency threshold: a novel saliency detection model using ising's theory on ferromagnetism (stif)","volume":"26","author":"Singh","year":"2020","journal-title":"Multimedia Syst"},{"key":"B47","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1007\/s00530-021-00796-4","article-title":"Visual saliency prediction using multi-scale attention gated network","volume":"28","author":"Sun","year":"2022","journal-title":"Multimedia Syst"},{"key":"B48","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1016\/0010-0285(80)90005-5","article-title":"A feature-integration theory of attention","volume":"12","author":"Treisman","year":"1980","journal-title":"Cogn. 
Psychol"},{"key":"B49","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2017.404","article-title":"\u201cLearning to detect salient objects with image-level supervision,\u201d","volume-title":"CVPR","author":"Wang","year":"2017"},{"key":"B50","first-page":"3127","article-title":"\u201cDetect globally, refine locally: A novel approach to saliency detection,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Wang","year":"2018"},{"key":"B51","first-page":"10031","article-title":"\u201cPixels, regions, and objects: Multiple enhancement for salient object detection,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang","year":"2023"},{"key":"B52","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1007\/978-3-642-33712-3_3","article-title":"\u201cGeodesic saliency using background priors,\u201d","volume-title":"Computer Vision-ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part III 12","author":"Wei","year":"2012"},{"key":"B53","first-page":"7264","article-title":"\u201cStacked cross refinement network for edge-aware salient object detection,\u201d","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Wu","year":"2019"},{"key":"B54","first-page":"1395","article-title":"\u201cHolistically-nested edge detection,\u201d","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Xie","year":"2015"},{"key":"B55","first-page":"3166","article-title":"\u201cSaliency detection via graph-based manifold ranking,\u201d","volume-title":"Computer Vision and Pattern Recognition (CVPR)","author":"Yang","year":"2013"},{"key":"B56","doi-asserted-by":"publisher","first-page":"3234","DOI":"10.1609\/aaai.v35i4.16434","article-title":"Structure-consistent weakly supervised salient object detection with local saliency 
coherence","volume":"35","author":"Yu","year":"2021","journal-title":"Proc. AAAI Conf. Artif. Intellig"},{"key":"B57","doi-asserted-by":"publisher","first-page":"3048","DOI":"10.1109\/TIP.2019.2893535","article-title":"Salient object detection with lossless feature reflection and weighted structural loss","volume":"28","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Image Proc"},{"key":"B58","first-page":"202","article-title":"\u201cAmulet: Aggregating multi-level convolutional features for salient object detection,\u201d","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Zhang","year":"2017"},{"key":"B59","first-page":"405","article-title":"\u201cICNet for real-time semantic segmentation on high-resolution images,\u201d","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV)","author":"Zhao","year":"2018"},{"key":"B60","first-page":"2881","article-title":"\u201cPyramid scene parsing network,\u201d","volume-title":"Proceedings of the IEEE Conference On Computer Vision and Pattern Recognition","author":"Zhao","year":"2017"},{"key":"B61","first-page":"8779","article-title":"\u201cEgnet: Edge guidance network for salient object detection,\u201d","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Zhao","year":"2019"},{"key":"B62","first-page":"1529","article-title":"\u201cConditional random fields as recurrent neural networks,\u201d","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Zheng","year":"2015"},{"key":"B63","first-page":"2814","article-title":"\u201cSaliency optimization from robust background detection,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Zhu","year":"2014"},{"key":"B64","doi-asserted-by":"publisher","first-page":"3738","DOI":"10.1109\/TPAMI.2022.3179526","article-title":"Salient object detection via integrity 
learning","volume":"45","author":"Zhuge","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B65","doi-asserted-by":"crossref","first-page":"1845","DOI":"10.1109\/ICIP.2013.6738380","article-title":"\u201cContent-aware compression using saliency-driven image retargeting,\u201d","volume-title":"2013 IEEE International Conference on Image Processing","author":"Z\u00fcnd","year":"2013"}],"container-title":["Frontiers in Computer Science"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2024.1420965\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,14]],"date-time":"2024-08-14T14:52:21Z","timestamp":1723647141000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2024.1420965\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,14]]},"references-count":65,"alternative-id":["10.3389\/fcomp.2024.1420965"],"URL":"https:\/\/doi.org\/10.3389\/fcomp.2024.1420965","relation":{},"ISSN":["2624-9898"],"issn-type":[{"value":"2624-9898","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,14]]},"article-number":"1420965"}}