{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T06:06:23Z","timestamp":1775887583503,"version":"3.50.1"},"reference-count":54,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2023,5,19]],"date-time":"2023-05-19T00:00:00Z","timestamp":1684454400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Chinese Academy of Agricultural Sciences Science and Technology Innovation","award":["ASTIP-TRIC03"],"award-info":[{"award-number":["ASTIP-TRIC03"]}]},{"name":"Chinese Academy of Agricultural Sciences Science and Technology Innovation","award":["62203176"],"award-info":[{"award-number":["62203176"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["ASTIP-TRIC03"],"award-info":[{"award-number":["ASTIP-TRIC03"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62203176"],"award-info":[{"award-number":["62203176"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Vision is an important way for unmanned mobile platforms to understand surrounding environmental information. For an unmanned mobile platform, quickly and accurately obtaining environmental information is a basic requirement for its subsequent visual tasks. Based on this, a unique convolution module called Multi-Scale Depthwise Separable Convolution module is proposed for real-time semantic segmentation. This module mainly consists of concatenation pointwise convolution and multi-scale depthwise convolution. 
Not only does the concatenation pointwise convolution change the number of channels, but it also combines the spatial features from the multi-scale depthwise convolution operations to produce additional features. The Multi-Scale Depthwise Separable Convolution module can strengthen the non-linear relationship between input and output. Specifically, the multi-scale depthwise convolution module extracts multi-scale spatial features while remaining lightweight. This makes full use of multi-scale information to describe objects of different sizes. Here, Mean Intersection over Union (MIoU), parameters, and inference speed were used to describe the performance of the proposed network. On the CamVid, KITTI, and Cityscapes datasets, the proposed algorithm achieved a trade-off between accuracy and memory compared with widely used and cutting-edge algorithms. In particular, the proposed algorithm achieved 61.02 MIoU with 2.68 M parameters on the CamVid test dataset.<\/jats:p>","DOI":"10.3390\/rs15102649","type":"journal-article","created":{"date-parts":[[2023,5,19]],"date-time":"2023-05-19T09:23:10Z","timestamp":1684488190000},"page":"2649","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":36,"title":["Multi-Scale Depthwise Separable Convolution for Semantic Segmentation in Street\u2013Road Scenes"],"prefix":"10.3390","volume":"15","author":[{"given":"Yingpeng","family":"Dai","sequence":"first","affiliation":[{"name":"Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao 266101, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chenglin","family":"Li","sequence":"additional","affiliation":[{"name":"Key Laboratory of Key Technology on Agricultural Machine and Equipment, Ministry of Education, College of Engineering, South China Agricultural University, Guangzhou 510642, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaohang","family":"Su","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongxian","family":"Liu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Key Technology on Agricultural Machine and Equipment, Ministry of Education, College of Engineering, South China Agricultural University, Guangzhou 510642, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4946-4434","authenticated-orcid":false,"given":"Jiehao","family":"Li","sequence":"additional","affiliation":[{"name":"Key Laboratory of Key Technology on Agricultural Machine and Equipment, Ministry of Education, College of Engineering, South China Agricultural University, Guangzhou 510642, China"},{"name":"School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,5,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, J., Dai, Y., Su, X., and Wu, W. (2022). Efficient Dual-Branch Bottleneck Networks of Semantic Segmentation Based on CCD Camera. Remote Sens., 14.","DOI":"10.3390\/rs14163925"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Li, J., Dai, Y., Wang, J., Su, X., and Ma, R. (2022, January 23\u201327). Towards broad learning networks on unmanned mobile robot for semantic segmentation. Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA.","DOI":"10.1109\/ICRA46639.2022.9812204"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Siam, M., Gamal, M., Abdel-Razek, M., Yogamani, S., and Jagersand, M. (2018, January 7\u201310). 
Rtseg: Real-time semantic segmentation comparative study. Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece.","DOI":"10.1109\/ICIP.2018.8451495"},{"key":"ref_4","unstructured":"Paszke, A., Chaurasia, A., Kim, S., and Culurciello, E. (2016). Enet: A deep neural network architecture for real-time semantic segmentation. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/j.eswa.2019.01.010","article-title":"FRED-Net: Fully residual encoder\u2013decoder network for accurate iris segmentation","volume":"122","author":"Arsalan","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Noh, H., Hong, S., and Han, B. (2015, January 7\u201313). Learning deconvolution network for semantic segmentation. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.178"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1109\/TPAMI.2016.2578328","article-title":"Object instance segmentation and fine-grained localization using hypercolumns","volume":"39","author":"Hariharan","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"5435","DOI":"10.1109\/TIP.2019.2917224","article-title":"Weakly supervised salient object detection by learning a classifier-driven map generator","volume":"28","author":"Hsu","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1016\/j.neucom.2018.09.003","article-title":"Object detection and recognition via clustered features","volume":"320","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"394","DOI":"10.1016\/j.neucom.2020.06.004","article-title":"Building and optimization of 3D semantic map based on Lidar and camera fusion","volume":"409","author":"Li","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chandra, S., and Kokkinos, I. (2016, January 11\u201314). Fast, exact and multi-scale inference for semantic image segmentation with deep gaussian crfs. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Proceedings, Part VII 14.","DOI":"10.1007\/978-3-319-46478-7_25"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hariharan, B., Arbel\u00e1ez, P., Girshick, R., and Malik, J. (2015, January 7\u201312). Hypercolumns for object segmentation and fine-grained localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298642"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2491","DOI":"10.1109\/TSMC.2021.3050616","article-title":"Fuzzy-Torque Approximation-Enhanced Sliding Mode Control for Lateral Stability of Mobile Robot","volume":"52","author":"Li","year":"2022","journal-title":"IEEE Trans. Syst. Man, Cybern. 
Syst."},{"key":"ref_16","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1016\/j.neucom.2020.05.091","article-title":"Neural fuzzy approximation enhanced autonomous tracking control of the wheel-legged robot under uncertain physical interaction","volume":"410","author":"Li","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017, January 21\u201326). Xception: Deep learning with depthwise separable convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.195"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201322). Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1117\/1.1631315","article-title":"Survey over image thresholding techniques and quantitative performance evaluation","volume":"13","author":"Sezgin","year":"2004","journal-title":"J. Electron. Imaging"},{"key":"ref_21","first-page":"713","article-title":"A fast algorithm for multilevel thresholding","volume":"17","author":"Liao","year":"2001","journal-title":"J. Inf. Sci. Eng."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1186\/s13640-017-0182-5","article-title":"Research of segmentation method on color image of Lingwu long jujubes based on the maximum entropy","volume":"2017","author":"Wang","year":"2017","journal-title":"EURASIP J. 
Image Video Process."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1007\/s007780050040","article-title":"Heuristic and randomized optimization for the join ordering problem","volume":"6","author":"Steinbrunn","year":"1997","journal-title":"VLDB J."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_25","first-page":"2211","article-title":"Multiple kernel learning algorithms","volume":"12","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1153","DOI":"10.1109\/TASSP.1981.1163711","article-title":"Cubic convolution interpolation for digital image processing","volume":"29","author":"Keys","year":"1981","journal-title":"IEEE Trans. Acoust. Speech Signal Process."},{"key":"ref_28","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention\u2013MICCAI 2015: 18th International Conference, Munich, Germany. Proceedings, Part III 18."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N. (2018, January 8\u201314). Bisenet: Bilateral segmentation network for real-time semantic segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhao, H., Qi, X., Shen, X., Shi, J., and Jia, J. (2018, January 8\u201314). Icnet for real-time semantic segmentation on high-resolution images. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01219-9_25"},{"key":"ref_34","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, PMLR, Lille, France."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016, January 27\u201330). Rethinking the inception architecture for computer vision. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. (2017, January 4\u20139). Inception-v4, inception-resnet and the impact of residual connections on learning. Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation applied to handwritten zip code recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zeiler, M.D., and Fergus, R. (2014, January 6\u201312). Visualizing and understanding convolutional networks. Proceedings of the Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland. Proceedings, Part I 13.","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"ref_39","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). 
You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_43","unstructured":"Mirza, M., and Osindero, S. (2014). Conditional generative adversarial nets. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_45","unstructured":"Zhang, T., Qi, G.-J., Xiao, B., and Wang, J. (2017, January 22\u201329). Interleaved group convolutions for deep neural networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1038\/35016072","article-title":"Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit","volume":"405","author":"Hahnloser","year":"2000","journal-title":"Nature"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Brostow, G.J., Shotton, J., Fauqueur, J., and Cipolla, R. (2008, January 12\u201318). Segmentation and recognition using structure from motion point clouds. Proceedings of the Computer Vision\u2013ECCV 2008: 10th European Conference on Computer Vision, Marseille, France. Proceedings, Part I 10.","DOI":"10.1007\/978-3-540-88682-2_5"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1016\/j.patrec.2008.04.005","article-title":"Semantic object classes in video: A high-definition ground truth database","volume":"30","author":"Brostow","year":"2009","journal-title":"Pattern Recognit. Lett."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The KITTI vision benchmark suite. 
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Sturgess, P., Alahari, K., Ladicky, L., and Torr, P.H. (2009, January 7\u201310). Combining appearance and structure from motion features for road scene understanding. Proceedings of the BMVC-British Machine Vision Conference, London, UK.","DOI":"10.5244\/C.23.62"},{"key":"ref_52","unstructured":"Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"1183","DOI":"10.1109\/TII.2018.2849348","article-title":"Fast semantic segmentation for scene perception","volume":"15","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Ind. 
Inform."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/10\/2649\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:38:20Z","timestamp":1760125100000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/10\/2649"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,19]]},"references-count":54,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2023,5]]}},"alternative-id":["rs15102649"],"URL":"https:\/\/doi.org\/10.3390\/rs15102649","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,19]]}}}