{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:43:26Z","timestamp":1760240606391,"version":"build-2065373602"},"reference-count":39,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2019,8,12]],"date-time":"2019-08-12T00:00:00Z","timestamp":1565568000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Human eye movement is one of the most important functions for understanding our surroundings. When a human eye processes a scene, it quickly focuses on dominant parts of the scene, commonly known as visual saliency detection or visual attention prediction. Recently, neural networks have been used to predict visual saliency. This paper proposes a deep learning encoder-decoder architecture, based on a transfer learning technique, to predict visual saliency. In the proposed model, visual features are extracted through convolutional layers from raw images to predict visual saliency. In addition, the proposed model uses the VGG-16 network for semantic segmentation, which uses a pixel classification layer to predict the categorical label for every pixel in an input image. The proposed model is applied to several datasets, including TORONTO, MIT300, MIT1003, and DUT-OMRON, to illustrate its efficiency. The results of the proposed model are quantitatively and qualitatively compared to classic and state-of-the-art deep learning models. 
Using the proposed deep learning model, a global accuracy of up to 96.22% is achieved for the prediction of visual saliency.<\/jats:p>","DOI":"10.3390\/info10080257","type":"journal-article","created":{"date-parts":[[2019,8,13]],"date-time":"2019-08-13T04:31:21Z","timestamp":1565670681000},"page":"257","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Visual Saliency Prediction Based on Deep Learning"],"prefix":"10.3390","volume":"10","author":[{"given":"Bashir","family":"Ghariba","sequence":"first","affiliation":[{"name":"Faculty of Engineering &amp; Applied Science, Memorial University, St. John\u2019s, Newfoundland, NL A1B 3X5, Canada"},{"name":"Faculty of Engineering, Elmergib University, Khoms 40414, Libya"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8464-8650","authenticated-orcid":false,"given":"Mohamed S.","family":"Shehata","sequence":"additional","affiliation":[{"name":"Faculty of Engineering &amp; Applied Science, Memorial University, St. John\u2019s, Newfoundland, NL A1B 3X5, Canada"},{"name":"Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus, Kelowna, BC V1V 1V7, Canada"}]},{"given":"Peter","family":"McGuire","sequence":"additional","affiliation":[{"name":"C-CORE, Captain Robert A. Bartlett Building, Morrissey Road, St. John\u2019s, Newfoundland, NL A1C 3X5, Canada"}]}],"member":"1968","published-online":{"date-parts":[[2019,8,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/S0004-3702(02)00399-5","article-title":"Object-based visual attention for computer vision","volume":"146","author":"Sun","year":"2003","journal-title":"Artif. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Koch, C., and Ullman, S. (1987). Shifts in selective visual attention: Towards the underlying neural circuitry. 
Matters of Intelligence, Springer.","DOI":"10.1007\/978-94-009-3833-5_5"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Wang, K., Wang, S., and Ji, Q. (2016, January 14\u201317). Deep eye fixation map learning for calibration-free eye gaze tracking. Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, Charleston, SC, USA.","DOI":"10.1145\/2857491.2857515"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Borji, A. (2012, January 16\u201321). Boosting bottom-up and top-down visual features for saliency estimation. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6247706"},{"key":"ref_5","unstructured":"Zhu, W., and Deng, H. (2017, January 22\u201329). Monocular free-head 3D gaze tracking with deep learning and geometry constraints. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"979","DOI":"10.1080\/13506280902771138","article-title":"SUN: Top-down saliency using natural statistics","volume":"17","author":"Kanan","year":"2009","journal-title":"Vis. Cogn."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Hickson, S., Dufour, N., Sud, A., Kwatra, V., and Essa, I. (2019, January 7\u201311). Eyemotion: Classifying facial expressions in VR using eye-tracking cameras. Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA.","DOI":"10.1109\/WACV.2019.00178"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhao, R., Ouyang, W., Li, H., and Wang, X. (2015, January 7\u201312). Saliency detection by multi-context deep learning. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298731"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Recasens, A., Vondrick, C., Khosla, A., and Torralba, A. (2017, January 22\u201329). Following gaze in video. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.160"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1145\/2897824.2925947","article-title":"Realtime 3D eye gaze animation using a single RGB camera","volume":"35","author":"Wang","year":"2016","journal-title":"ACM Trans. Graph."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1145\/3177745","article-title":"Paying more attention to saliency: Image captioning with saliency and context attention","volume":"14","author":"Cornia","year":"2018","journal-title":"ACM Trans. Multimed. Comput. Commun. Appl."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Naqvi, R., Arsalan, M., Batchuluun, G., Yoon, H., and Park, K. (2018). Deep learning-based gaze detection system for automobile drivers using a NIR camera sensor. Sensors, 18.","DOI":"10.3390\/s18020456"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3030","DOI":"10.1109\/JSTARS.2018.2846178","article-title":"Deep Convolutional Neural Network for Complex Wetland Classification Using Optical Remote Sensing Imagery","volume":"11","author":"Rezaee","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_14","unstructured":"Krafka, K., Khosla, A., Kellnhofer, P., Kannan, H., Bhandarkar, S., Matusik, W., and Torralba, A. (July, January 26). Eye tracking for everyone. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_15","unstructured":"Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. 
(2015, January 6\u201311). Show, attend and tell: Neural image caption generation with visual attention. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_16","unstructured":"Kruthiventi, S.S.S., Gudisa, V., Dholakiya, J.H., and Venkatesh Babu, R. (July, January 26). Saliency unified: A deep architecture for simultaneous eye fixation prediction and salient object segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_17","unstructured":"Pan, J., Sayrol, E., Giro-i-Nieto, X., McGuinness, K., and O\u2019Connor, N.E. (July, January 26). Shallow and deep convolutional networks for saliency prediction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1016\/j.isprsjprs.2017.05.010","article-title":"Random forest wetland classification using ALOS-2 L-band, RADARSAT-2 C-band, and TerraSAR-X imagery","volume":"130","author":"Mahdianpari","year":"2017","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"392","DOI":"10.1109\/TNNLS.2016.2628878","article-title":"Learning to predict eye fixations via multiresolution convolutional neural networks","volume":"29","author":"Liu","year":"2016","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_20","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Mahdianpari, M., Salehi, B., Rezaee, M., Mohammadimanesh, F., and Zhang, Y. (2018). Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery. 
Remote Sens., 10.","DOI":"10.3390\/rs10071119"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Jiang, M., Huang, S., Duan, J., and Zhao, Q. (2015, January 7\u201312). Salicon: Saliency in context. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298710"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Judd, T., Ehinger, K., Durand, F., and Torralba, A. (October, January 29). Learning to predict where humans look. Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan.","DOI":"10.1109\/ICCV.2009.5459462"},{"key":"ref_24","unstructured":"Bruce, N., and Tsotsos, J. (2006, January 4\u20137). Saliency based on information maximization. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Li, Y., Hou, X., Koch, C., Rehg, J.M., and Yuille, A.L. (2014, January 23\u201328). The secrets of salient object segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.43"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2368","DOI":"10.1109\/TIP.2017.2787612","article-title":"Deep visual attention prediction","volume":"27","author":"Wang","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1016\/j.isprsjprs.2019.03.015","article-title":"A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem","volume":"151","author":"Mohammadimanesh","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_28","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). Imagenet classification with deep convolutional neural networks. 
Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1016\/S0893-6080(98)00116-6","article-title":"On the momentum term in gradient descent learning algorithms","volume":"12","author":"Qian","year":"1999","journal-title":"Neural Netw."},{"key":"ref_31","unstructured":"Judd, T., Durand, F., and Torralba, A. (2019, August 09). A Benchmark of Computational Models of Saliency to Predict Human Fixations. Available online: http:\/\/hdl.handle.net\/1721.1\/68590."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Yang, C., Zhang, L., Lu, H., Ruan, X., and Yang, M.-H. (2013, January 23\u201328). Saliency detection via graph-based manifold ranking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.407"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/TPAMI.2017.2662005","article-title":"Saliency-aware video object segmentation","volume":"40","author":"Wang","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Harel, J., Koch, C., and Perona, P. (2007, January 3\u20136). Graph-based visual saliency. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada.","DOI":"10.7551\/mitpress\/7503.003.0073"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Borji, A., Tavakoli, H.R., Sihite, D.N., and Itti, L. (2013, January 1\u20138). Analysis of scores, datasets, and models in visual saliency prediction. 
Proceedings of the IEEE international Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.118"},{"key":"ref_36","unstructured":"Tavakoli, H.R., Rahtu, E., and Heikkil\u00e4, J. (2011, January 23\u201325). Fast and efficient saliency detection using sparse sampling and kernel density estimation. Proceedings of the Scandinavian Conference on Image Analysis, Ystad, Sweden."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1167\/13.4.11","article-title":"Visual saliency estimation by nonlinearly integrating features using region covariances","volume":"13","author":"Erdem","year":"2013","journal-title":"J. Vis."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1167\/9.3.5","article-title":"Saliency, attention, and visual search: An information theoretic approach","volume":"9","author":"Bruce","year":"2009","journal-title":"J. Vis."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Csurka, G., Larlus, D., Perronnin, F., and Meylan, F. (2013, January 9\u201313). What is a good evaluation measure for semantic segmentation?. 
Proceedings of the 24th British Machine Vision Conference (BMVC), Bristol, UK.","DOI":"10.5244\/C.27.32"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/8\/257\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:10:32Z","timestamp":1760188232000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/10\/8\/257"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,12]]},"references-count":39,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2019,8]]}},"alternative-id":["info10080257"],"URL":"https:\/\/doi.org\/10.3390\/info10080257","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2019,8,12]]}}}