{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T15:45:12Z","timestamp":1772639112300,"version":"3.50.1"},"reference-count":36,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2023,4,10]],"date-time":"2023-04-10T00:00:00Z","timestamp":1681084800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62263004"],"award-info":[{"award-number":["62263004"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["PF20069"],"award-info":[{"award-number":["PF20069"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Guangxi Natural Science Foundation","award":["62263004"],"award-info":[{"award-number":["62263004"]}]},{"name":"Guangxi Natural Science Foundation","award":["PF20069"],"award-info":[{"award-number":["PF20069"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Vision-based target detection and segmentation has been an important research content for environment perception in autonomous driving, but the mainstream target detection and segmentation algorithms have the problems of low detection accuracy and poor mask segmentation quality for multi-target detection and segmentation in complex traffic scenes. To address this problem, this paper improved the Mask R-CNN by replacing the backbone network ResNet with the ResNeXt network with group convolution to further improve the feature extraction capability of the model. 
Furthermore, a bottom-up path enhancement strategy was added to the Feature Pyramid Network (FPN) to achieve feature fusion, while an efficient channel attention module (ECA) was added to the backbone feature extraction network to optimize the high-level low resolution semantic information graph. Finally, the bounding box regression loss function smooth L1 loss was replaced by CIoU loss to speed up the model convergence and minimize the error. The experimental results showed that the improved Mask R-CNN algorithm achieved 62.62% mAP for target detection and 57.58% mAP for segmentation accuracy on the publicly available CityScapes autonomous driving dataset, which were 4.73% and 3.96% better than the original Mask R-CNN algorithm, respectively. The migration experiments showed that it has good detection and segmentation effects in each traffic scenario of the publicly available BDD autonomous driving dataset.<\/jats:p>","DOI":"10.3390\/s23083853","type":"journal-article","created":{"date-parts":[[2023,4,10]],"date-time":"2023-04-10T03:24:18Z","timestamp":1681097058000},"page":"3853","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":74,"title":["Improved Mask R-CNN Multi-Target Detection and Segmentation for Autonomous Driving in Complex Scenes"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8200-5259","authenticated-orcid":false,"given":"Shuqi","family":"Fang","sequence":"first","affiliation":[{"name":"School of Electronic and Automation, Guilin University of Electronic Technology, Guilin 541004, China"}]},{"given":"Bin","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Electronic and Automation, Guilin University of Electronic Technology, Guilin 541004, China"}]},{"given":"Jingyu","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Electronic and Automation, Guilin University of Electronic Technology, Guilin 541004, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"362","DOI":"10.1002\/rob.21918","article-title":"A survey of deep learning techniques for autonomous driving","volume":"37","author":"Grigorescu","year":"2020","journal-title":"J. Field Robot."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/0600000079","article-title":"Computer vision for autonomous vehicles: Problems, datasets and state of the art","volume":"12","author":"Janai","year":"2020","journal-title":"Found. Trends\u00ae Comput. Graph. Vis."},{"key":"ref_3","first-page":"16","article-title":"A survey of instance segmentation research based on deep learning","volume":"17","author":"Su","year":"2022","journal-title":"CAAI Trans. Intell. Syst."},{"key":"ref_4","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A.C. (2016, January 11\u201314). SSD: Single shot multibox detector. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Part I.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_9","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Bai, M., and Urtasun, R. (2017, January 21\u201326). Deep watershed transform for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.305"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Gao, N.-Y., Shan, Y., Wang, Y., Zhao, X., Yu, Y., Yang, M., and Huang, K. (2019, January 27\u201328). SSAP: Single-shot instance segmentation with affinity pyramid. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00073"},{"key":"ref_12","unstructured":"Dai, J.-F., He, K., and Sun, J. (2016, June 26\u2013July 1). Instance-aware semantic segmentation via multi-task network cascades. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Li, Y., Qi, H., Dai, J., Ji, X., and Wei, Y. (2017, January 21\u201326). Fully convolutional instance-aware semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.472"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bolya, D., Zhou, C., Xiao, F., and Lee, Y.J. (2019, January 27\u201328). YOLACT: Real-time instance segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00925"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wang, X., Kong, T., Shen, C., Jiang, Y., and Li, L. (2020, January 23\u201328). SOLO: Segmenting objects by locations. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK. Part XVIII.","DOI":"10.1007\/978-3-030-58523-5_38"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ke, L., Tai, Y.-W., and Tang, C.-K. (2021, January 20\u201325). Deep occlusion-aware instance segmentation with overlapping bilayers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00401"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, T., Wei, S., and Ji, S. (2022, January 18\u201324). E2EC: An end-to-end contour-based method for high-quality high-speed instance segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00440"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"He, J.-J., Li, P., Geng, Y., and Xie, X. (2023). FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation. 
arXiv.","DOI":"10.1109\/CVPR52729.2023.02266"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhang, H., Li, F., Xu, H., Huang, S., Liu, S., Ni, L.M., and Zhang, L. (2023). MP-Former: Mask-Piloted Transformer for Image Segmentation. arXiv.","DOI":"10.1109\/CVPR52729.2023.01733"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"58443","DOI":"10.1109\/ACCESS.2020.2983149","article-title":"A survey of autonomous driving: Common practices and emerging technologies","volume":"8","author":"Yurtsever","year":"2020","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Peng, Y., Liu, X., Shen, C., Huang, H., Zhao, D., Cao, H., and Guo, X. (2019). An improved optical flow algorithm based on Mask-R-CNN and K-means for velocity calculation. Appl. Sci., 9.","DOI":"10.3390\/app9142808"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"032124","DOI":"10.1088\/1742-6596\/1802\/3\/032124","article-title":"Analysis and Comparison of Three Classical Color Image Interpolation Algorithms","volume":"1802","author":"Lu","year":"2021","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_26","unstructured":"Nair, V., and Hinton, G.E. (2010, January 21\u201324). Rectified linear units improve restricted Boltzmann machines. 
Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel."},{"key":"ref_27","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Doll\u00e1r, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, Q.-L., Wu, B., Zhu, P., Li, P., Zuo, W., and Hu, Q. (2020, January 13\u201319). ECA-Net: Efficient channel attention for deep convolutional neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1007\/s41095-022-0271-y","article-title":"Attention mechanisms in computer vision: A survey","volume":"8","author":"Guo","year":"2022","journal-title":"Comput. Vis. Media"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4795396","DOI":"10.1155\/2021\/4795396","article-title":"Research on Surface Defect Detection of Rare-Earth Magnetic Materials Based on Improved SSD","volume":"2021","author":"Zhang","year":"2021","journal-title":"Complexity"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"8574","DOI":"10.1109\/TCYB.2021.3095305","article-title":"Enhancing geometric factors in model learning and inference for object detection and instance segmentation","volume":"52","author":"Zheng","year":"2022","journal-title":"IEEE Trans. Cybern."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., and Doll\u00e1r, P. (2014, January 6\u201312). Microsoft COCO: Common objects in context. Proceedings of the Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland. Part V.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The Cityscapes dataset for semantic urban scene understanding. 
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/8\/3853\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:12:59Z","timestamp":1760123579000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/8\/3853"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,10]]},"references-count":36,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["s23083853"],"URL":"https:\/\/doi.org\/10.3390\/s23083853","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,10]]}}}