{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T05:14:35Z","timestamp":1762060475550,"version":"build-2065373602"},"reference-count":71,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2022,8,7]],"date-time":"2022-08-07T00:00:00Z","timestamp":1659830400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"ADAPT"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>In smart mobility, the semantic segmentation of images is an important task for a good understanding of the environment. In recent years, many studies have been made on this subject, in the field of Autonomous Vehicles on roads. Some image datasets are available for learning semantic segmentation models, leading to very good performance. However, for other types of autonomous mobile systems like Electric Wheelchairs (EW) on sidewalks, there is no specific dataset. Our contribution presented in this article is twofold: (1) the proposal of a new dataset of short sequences of exterior images of street scenes taken from viewpoints located on sidewalks, in a 3D virtual environment (CARLA); (2) a convolutional neural network (CNN) adapted for temporal processing and including additional techniques to improve its accuracy. Our dataset includes a smaller subset, made of image pairs taken from the same places in the maps of the virtual environment, but from different viewpoints: one located on the road and the other located on the sidewalk. 
This additional set is aimed at showing the importance of the viewpoint in the result of semantic segmentation.<\/jats:p>","DOI":"10.3390\/jimaging8080216","type":"journal-article","created":{"date-parts":[[2022,8,7]],"date-time":"2022-08-07T22:51:46Z","timestamp":1659912706000},"page":"216","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["A Dataset for Temporal Semantic Segmentation Dedicated to Smart Mobility of Wheelchairs on Sidewalks"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4037-2880","authenticated-orcid":false,"given":"Benoit","family":"Decoux","sequence":"first","affiliation":[{"name":"Normandie University, Unirouen, Esigelec, Irseem, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6230-2966","authenticated-orcid":false,"given":"Redouane","family":"Khemmar","sequence":"additional","affiliation":[{"name":"Normandie University, Unirouen, Esigelec, Irseem, 76000 Rouen, France"}]},{"given":"Nicolas","family":"Ragot","sequence":"additional","affiliation":[{"name":"Normandie University, Unirouen, Esigelec, Irseem, 76000 Rouen, France"}]},{"given":"Arthur","family":"Venon","sequence":"additional","affiliation":[{"name":"Normandie University, Unirouen, Esigelec, Irseem, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5901-535X","authenticated-orcid":false,"given":"Marcos","family":"Grassi-Pampuch","sequence":"additional","affiliation":[{"name":"Normandie University, Unirouen, Esigelec, Irseem, 76000 Rouen, France"}]},{"given":"Antoine","family":"Mauri","sequence":"additional","affiliation":[{"name":"Normandie University, Unirouen, Esigelec, Irseem, 76000 Rouen, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8570-3781","authenticated-orcid":false,"given":"Louis","family":"Lecrosnier","sequence":"additional","affiliation":[{"name":"Normandie University, Unirouen, Esigelec, Irseem, 76000 Rouen, 
France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4688-481X","authenticated-orcid":false,"given":"Vishnu","family":"Pradeep","sequence":"additional","affiliation":[{"name":"Normandie University, Unirouen, Esigelec, Irseem, 76000 Rouen, France"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,7]]},"reference":[{"key":"ref_1","unstructured":"Dosovitskiy, A., Ros, G., Codevilla, F., L\u00f3pez, A., and Koltun, V. (2017). CARLA: An Open Urban Driving Simulator. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"833","DOI":"10.1007\/978-3-030-01234-2_49","article-title":"Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation","volume":"11211","author":"Chen","year":"2018","journal-title":"Lect. Notes Comput. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The Cityscapes Dataset for Semantic Urban Scene Understanding. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Brostow, G., Shotton, J., Fauqueur, J., and Cipolla, R. (2008). Segmentation and Recognition Using Structure from Motion Point Clouds. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-540-88682-2_5"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012). Indoor Segmentation and Support Inference from RGBD Images. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"ref_6","unstructured":"Ding, L., Terwilliger, J., Sherony, R., Reimer, B., and Fridman, L. (2019). Value of Temporal Dynamics Information in Driving Scene Segmentation. 
arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"107791","DOI":"10.1016\/j.dib.2022.107791","article-title":"A pixel-wise annotated dataset of small overlooked indoor objects for semantic segmentation applications","volume":"40","author":"Mohamed","year":"2022","journal-title":"Data Brief"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1007\/978-3-319-46475-6_7","article-title":"Playing for Data: Ground Truth from Computer Games","volume":"9906","author":"Richter","year":"2016","journal-title":"Lect. Notes Comput. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ros, G., Sellart, L., Materzynska, J., Vazquez, D., and Lopez, A. (2016, January 27\u201330). The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.352"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Gaidon, A., Wang, Q., Cabon, Y., and Vig, E. (2016, January 27\u201330). Virtual Worlds as Proxy for Multi-object Tracking Analysis. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.470"},{"key":"ref_11","unstructured":"Sankaranarayanan, S., Balaji, Y., Jain, A., Lim, S.N., and Chellappa, R. (2017). Unsupervised Domain Adaptation for Semantic Segmentation with GANs. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, Q., Dai, D., Hoyer, L., Fink, O., and Gool, L.V. (2021). Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation. arXiv.","DOI":"10.1109\/ICCV48922.2021.00840"},{"key":"ref_13","unstructured":"Mohan, R. (2014). Deep Deconvolutional Networks for Scene Parsing. 
arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1915","DOI":"10.1109\/TPAMI.2012.231","article-title":"Learning Hierarchical Features for Scene Labeling","volume":"35","author":"Farabet","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_15","unstructured":"Pinheiro, P.H.O., and Collobert, R. (2014, January 22\u201324). Recurrent Convolutional Neural Networks for Scene Labeling. Proceedings of the International Conference on Machine Learning, Beijing, China."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Mostajabi, M., Yadollahpour, P., and Shakhnarovich, G. (2014). Feedforward semantic segmentation with zoom-out features. arXiv.","DOI":"10.1109\/CVPR.2015.7298959"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Noh, H., Hong, S., and Han, B. (2015). Learning Deconvolution Network for Semantic Segmentation. arXiv.","DOI":"10.1109\/ICCV.2015.178"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1109\/TPAMI.2016.2572683","article-title":"Fully Convolutional Networks for Semantic Segmentation","volume":"39","author":"Shelhamer","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid Scene Parsing Network. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Kendall, A., Badrinarayanan, V., and Cipolla, R. (2017, January 4\u20137). Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding. Proceedings of the British Machine Vision Conference 2017, London, UK.","DOI":"10.5244\/C.31.57"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Yang, M., Kun, Y., Zhang, C., Li, Z., and Yang, K. (2018, January 18\u201322). DenseASPP for Semantic Segmentation in Street Scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, USA.","DOI":"10.1109\/CVPR.2018.00388"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Li, X., Zhao, H., Han, L., Tong, Y., and Yang, K. (2020). GFF: Gated Fully Fusion for Semantic Segmentation. arXiv.","DOI":"10.1609\/aaai.v34i07.6805"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019). Dual Attention Network for Scene Segmentation. arXiv.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_27","unstructured":"Yuan, Y., Huang, L., Guo, J., Zhang, C., Chen, X., and Wang, J. (2021). OCNet: Object Context Network for Scene Parsing. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zhang, Y., Liu, S., Shi, J., Loy, C.C., Lin, D., and Jia, J. (2018, January 8\u201314). PSANet: Point-wise Spatial Attention Network for Scene Parsing. 
Proceedings of the 15th European Conference, Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_17"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., and Agrawal, A. (2018). Context Encoding for Semantic Segmentation. arXiv.","DOI":"10.1109\/CVPR.2018.00747"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhang, F., Chen, Y., Li, Z., Hong, Z., Liu, J., Ma, F., Han, J., and Ding, E. (2019). ACFNet: Attentional Class Feature Network for Semantic Segmentation. arXiv.","DOI":"10.1109\/ICCV.2019.00690"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1480","DOI":"10.1109\/TPAMI.2017.2712691","article-title":"Scene Segmentation with DAG-Recurrent Neural Networks","volume":"40","author":"Shuai","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Xu, D., Ouyang, W., Wang, X., and Sebe, N. (2018). PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing. arXiv.","DOI":"10.1109\/CVPR.2018.00077"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chennupati, S., Sistu, G., Yogamani, S., and Rawashdeh, S. (2019, January 25\u201327). AuxNet: Auxiliary Tasks Enhanced Semantic Segmentation for Automated Driving. Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Prague, Czech Republic.","DOI":"10.5220\/0007684100002108"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Takikawa, T., Acuna, D., Jampani, V., and Fidler, S. (2019). Gated-SCNN: Gated Shape CNNs for Semantic Segmentation. arXiv.","DOI":"10.1109\/ICCV.2019.00533"},{"key":"ref_35","unstructured":"Ranzato, M., Szlam, A., Bruna, J., Mathieu, M., Collobert, R., and Chopra, S. (2014). Video (language) modeling: A baseline for generative models of natural videos. 
arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short-Term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1109\/TPAMI.2016.2599174","article-title":"Long-Term Recurrent Convolutional Networks for Visual Recognition and Description","volume":"39","author":"Donahue","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Nilsson, D., and Sminchisescu, C. (2017). Semantic Video Segmentation by Gated Recurrent Flow Propagation. arXiv.","DOI":"10.1109\/CVPR.2018.00713"},{"key":"ref_39","unstructured":"Shi, X., Chen, Z., Wang, H., Yeung, D.Y., kin Wong, W., and chun Woo, W. (2015). Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. arXiv."},{"key":"ref_40","unstructured":"Ballas, N., Yao, L., Pal, C., and Courville, A. (2015). Delving Deeper into Convolutional Networks for Learning Video Representations. arXiv."},{"key":"ref_41","unstructured":"Fayyaz, M., Saffar, M.H., Sabokrou, M., Fathy, M., Klette, R., and Huang, F. (2016). STFCN: Spatio-Temporal FCN for Semantic Video Segmentation. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1007\/978-3-030-01219-9_25","article-title":"ICNet for Real-Time Semantic Segmentation on High-Resolution Images","volume":"11207","author":"Zhao","year":"2018","journal-title":"Lect. Notes Comput. Sci."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., and Brox, T. (2017, January 21\u201326). FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.179"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zhu, X., Xiong, Y., Dai, J., Yuan, L., and Wei, Y. (2017, January 21\u201326). Deep Feature Flow for Video Recognition. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.441"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Sapra, K., Reda, F.A., Shih, K.J., Newsam, S., Tao, A., and Catanzaro, B. (2019). Improving Semantic Segmentation via Video Propagation and Label Relaxation. arXiv.","DOI":"10.1109\/CVPR.2019.00906"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Li, Y., Shi, J., and Lin, D. (2018, January 18\u201323). Low-Latency Video Semantic Segmentation. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00628"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Jain, S., Wang, X., and Gonzalez, J.E. (2019, January 15\u201320). Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00907"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Gadde, R., Jampani, V., and Gehler, P.V. (2017). Semantic Video CNNs through Representation Warping. arXiv.","DOI":"10.1109\/ICCV.2017.477"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"6680509","DOI":"10.1155\/2021\/6680509","article-title":"Dynamic Warping Network for Semantic Video Segmentation","volume":"2021","author":"HanchaoHe","year":"2021","journal-title":"Complexity"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Ding, M., Wang, Z., Zhou, B., Shi, J., Lu, Z., and Luo, P. (2019). 
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow. arXiv.","DOI":"10.1609\/aaai.v34i07.6699"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Liu, Y., Shen, C., Yu, C., and Wang, J. (2020). Efficient Semantic Video Segmentation with Per-frame Inference. arXiv.","DOI":"10.1007\/978-3-030-58607-2_21"},{"key":"ref_52","unstructured":"Van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A Generative Model for Raw Audio. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/29.21701","article-title":"Phoneme recognition using time-delay neural networks","volume":"37","author":"Waibel","year":"1989","journal-title":"Acoust. Speech Signal Process. IEEE Trans."},{"key":"ref_54","unstructured":"Bai, S., Kolter, J.Z., and Koltun, V. (2018). An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Sibechi, R., Booij, O., Baka, N., and Bloem, P. (2019, January 27\u201328). Exploiting Temporality for Semi-Supervised Video Segmentation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00122"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Oh, S.W., Lee, J.Y., Xu, N., and Kim, S.J. (2019). Video Object Segmentation using Space-Time Memory Networks. arXiv.","DOI":"10.1109\/ICCV.2019.00932"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Wang, H., Wang, W., and Liu, J. (2021). Temporal Memory Attention for Video Semantic Segmentation. arXiv.","DOI":"10.1109\/ICIP42928.2021.9506731"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Paul, M., Mayer, C., Van Gool, L., and Timofte, R. (2020, January 1\u20135). Efficient Video Semantic Segmentation with Labels Propagation and Refinement. 
Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093520"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Hu, P., Caba, F., Wang, O., Lin, Z., Sclaroff, S., and Perazzi, F. (2020, January 13\u201319). Temporally Distributed Networks for Fast Video Semantic Segmentation. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00884"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Xu, Y.S., Fu, T.J., Yang, H.K., and Lee, C.Y. (2018). Dynamic Video Segmentation Network. arXiv.","DOI":"10.1109\/CVPR.2018.00686"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Lee, S.P., Chen, S.C., and Peng, W.H. (2021). GSVNet: Guided Spatially-Varying Convolution for Fast Semantic Segmentation on Video. arXiv.","DOI":"10.1109\/ICME51207.2021.9428381"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Hu, P., Perazzi, F., Heilbron, F.C., Wang, O., Lin, Z., Saenko, K., and Sclaroff, S. (2020). Real-time Semantic Segmentation with Fast Attention. arXiv.","DOI":"10.1109\/LRA.2020.3039744"},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"104184","DOI":"10.1016\/j.imavis.2021.104184","article-title":"Semantic video segmentation with dynamic keyframe selection and distortion-aware feature rectification","volume":"110","author":"Awan","year":"2021","journal-title":"Image Vis. 
Comput."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"107268","DOI":"10.1016\/j.patcog.2020.107268","article-title":"Video semantic segmentation via feature propagation with holistic attention","volume":"104","author":"Wu","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"147914","DOI":"10.1109\/ACCESS.2021.3123952","article-title":"Indoor\/Outdoor Semantic Segmentation Using Deep Learning for Visually Impaired Wheelchair Users","volume":"9","author":"Mohamed","year":"2021","journal-title":"IEEE Access"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_67","unstructured":"Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Zhang, Z., Lin, H., Sun, Y., He, T., Mueller, J., and Manmatha, R. (2020). ResNeSt: Split-Attention Networks. arXiv."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Jadon, S. (2020). A survey of loss functions for semantic segmentation. arXiv.","DOI":"10.1109\/CIBCB48159.2020.9277638"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Dollar, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_70","unstructured":"Paszke, A., Chaurasia, A., Kim, S., and Culurciello, E. (2016). ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. arXiv."},{"key":"ref_71","unstructured":"M\u00fcller, R., Kornblith, S., and Hinton, G.E. (2019, January 8\u201314). When Does Label Smoothing Help?. 
Proceedings of the Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/8\/8\/216\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:05:22Z","timestamp":1760141122000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/8\/8\/216"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,7]]},"references-count":71,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["jimaging8080216"],"URL":"https:\/\/doi.org\/10.3390\/jimaging8080216","relation":{},"ISSN":["2313-433X"],"issn-type":[{"type":"electronic","value":"2313-433X"}],"subject":[],"published":{"date-parts":[[2022,8,7]]}}}