{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T08:25:33Z","timestamp":1769847933231,"version":"3.49.0"},"reference-count":37,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2020,12,23]],"date-time":"2020-12-23T00:00:00Z","timestamp":1608681600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100013058","name":"Jiangsu Provincial Key Research and Development Program","doi-asserted-by":"publisher","award":["BE2019106"],"award-info":[{"award-number":["BE2019106"]}],"id":[{"id":"10.13039\/501100013058","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61801227"],"award-info":[{"award-number":["61801227"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Qing Lan Project of Jiangsu Province","award":["QLGC2020"],"award-info":[{"award-number":["QLGC2020"]}]},{"name":"the Natural Science Foundation of the Jiangsu Higher Education Institutions of China","award":["18KJB413007"],"award-info":[{"award-number":["18KJB413007"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>As an effective means of solving collision problems caused by the limited perspective on board, the cooperative roadside system is gaining popularity. To improve the vehicle detection abilities in such online safety systems, in this paper, we propose a novel multi-sensor multi-level enhanced convolutional network model, called multi-sensor multi-level enhanced convolutional network architecture (MME-YOLO), with consideration of hybrid realistic scene of scales, illumination, and occlusion. 
MME-YOLO consists of two tightly coupled structures, i.e., the enhanced inference head and the LiDAR-Image composite module. More specifically, the enhanced inference head first equips the network with stronger inference abilities over redundant visual cues through attention-guided feature-selection blocks and an anchor-based\/anchor-free ensemble head. Furthermore, the LiDAR-Image composite module cascades the multi-level feature maps from the LiDAR subnet to the image subnet, which strengthens the generalization of the detector in complex scenarios. Compared with YOLOv3, the enhanced inference head achieves mAP improvements of 5.83% and 4.88% on the visual datasets LVSH and UA-DETRAC, respectively. Integrated with the composite module, the overall architecture attains 91.63% mAP on the collected Road-side Dataset. Experiments show that even under the abnormal lighting and inconsistent scales of evening rush hours, the proposed MME-YOLO maintains reliable recognition accuracy and robust detection performance.<\/jats:p>","DOI":"10.3390\/s21010027","type":"journal-article","created":{"date-parts":[[2020,12,23]],"date-time":"2020-12-23T12:19:51Z","timestamp":1608725991000},"page":"27","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":39,"title":["MME-YOLO: Multi-Sensor Multi-Level Enhanced YOLO for Robust Vehicle Detection in Traffic Surveillance"],"prefix":"10.3390","volume":"21","author":[{"given":"Jianxiao","family":"Zhu","sequence":"first","affiliation":[{"name":"School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China"}]},{"given":"Xu","family":"Li","sequence":"additional","affiliation":[{"name":"School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China"}]},{"given":"Peng","family":"Jin","sequence":"additional","affiliation":[{"name":"School of Instrument Science and Engineering, Southeast University, Nanjing 210096, 
China"}]},{"given":"Qimin","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China"}]},{"given":"Zhengliang","family":"Sun","sequence":"additional","affiliation":[{"name":"Traffic Management Research Institute, Ministry of Public Security, Wuxi 214151, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1704-4339","authenticated-orcid":false,"given":"Xiang","family":"Song","sequence":"additional","affiliation":[{"name":"School of Electronic Engineering, Nanjing Xiaozhuang University, Nanjing 211171, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,23]]},"reference":[{"key":"ref_1","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_2","first-page":"1","article-title":"Robust principal component analysis?","volume":"58","author":"Li","year":"2011","journal-title":"J. ACM."},{"key":"ref_3","unstructured":"Bakti, R.Y., Areni, I.S., and Prayogi, A.A. (2016, January 22\u201324). Vehicle detection and tracking using gaussian mixture model and kalman filter. Proceedings of the International Conference on Computational Intelligence and Cybernetics, Makassar, Indonesia."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1016\/j.ins.2014.12.033","article-title":"Probabilistic neural networks based moving vehicles extraction algorithm for intelligent traffic surveillance systems","volume":"299","author":"Chen","year":"2015","journal-title":"Inf. Sci."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision meets robotics: The kitti dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Robot. 
Res."},{"key":"ref_6","first-page":"2702","article-title":"The apolloscape open dataset for autonomous driving and its application","volume":"42","author":"Wang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","unstructured":"Yu, F., Xian, W., Chen, Y., Liu, F., Liao, M., Madhavan, V., and Darrell, T. (2018). Bdd100k: A diverse driving video database with scalable annotation tooling. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lin, T., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Lin, T., and Le, Q.V. (2019, January 16\u201320). Nas-fpn: Learning scalable feature pyramid architecture for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00720"},{"key":"ref_11","unstructured":"Liu, S., Huang, D., and Wang, Y. (2019). Learning spatial fusion for single-shot object detection. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Tan, M., Pang, R., and Le, Q.V. (2020, January 13\u201319). Efficientdet: Scalable and efficient object detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVF), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3d object detection network for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Minemura, K., Liau, H., Monrroy, A., and Kato, S. (2018, January 21\u201323). LMNet: Real-time multiclass object detection on CPU using 3D LiDAR. Proceedings of the 3rd Asia-Pacific Conference on Intelligent Robot Systems (ACIRS), Singapore.","DOI":"10.1109\/ACIRS.2018.8467245"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201323). Frustum pointnets for 3d object detection from rgb-d data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., and Tang, X. (2017, January 21\u201326). Residual attention network for image classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.683"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J., and So Kweon, I. (2018, January 8\u201314). Cbam: Convolutional block attention module. 
Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Li, X., Wang, W., Hu, X., and Yang, J. (2019, January 16\u201320). Selective kernel networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00060"},{"key":"ref_20","unstructured":"Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Zhang, Z., Lin, H., Sun, Y., He, T., Mueller, J., and Manmatha, R. (2020). Resnest: Split-attention networks. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhu, C., He, Y., and Savvides, M. (2019, January 16\u201320). Feature selective anchor-free module for single-shot object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00093"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lin, T., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_23","unstructured":"Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, January 7\u201312). Distance-IoU loss: Faster and better learning for bounding box regression. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Li, B., Zhang, T., and Xia, T. (2016). Vehicle detection from 3d lidar using fully convolutional network. arXiv.","DOI":"10.15607\/RSS.2016.XII.042"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1007\/BF02187695","article-title":"Generalized Delaunay triangulation for planar graphs","volume":"1","author":"Lee","year":"1986","journal-title":"Discrete Comput. 
Geom."},{"key":"ref_26","unstructured":"Liu, Y., Wang, Y., Wang, S., Liang, T., Zhao, Q., Tang, Z., and Ling, H. (2020, January 7\u201312). CBNet: A Novel Composite Backbone Network Architecture for Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-20), New York, NY, USA."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1010","DOI":"10.1109\/TITS.2018.2838132","article-title":"SINet: A scale-insensitive convolutional neural network for fast vehicle detection","volume":"20","author":"Hu","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","unstructured":"Wen, L., Du, D., Cai, Z., Lei, Z., Chang, M., Qi, H., Lim, J., Yang, M., and Lyu, S. (2015). UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. arXiv."},{"key":"ref_29","unstructured":"Bochkovskiy, A., Wang, C., and Liao, H.M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhang, F., Li, C., and Yang, F. (2019). Vehicle detection in urban traffic surveillance images based on convolutional neural networks with feature concatenation. 
Sensors, 19.","DOI":"10.3390\/s19030594"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"72660","DOI":"10.1109\/ACCESS.2019.2919103","article-title":"CMNet: A connect-and-merge convolutional neural network for fast vehicle detection in urban traffic surveillance","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object detection with discriminatively trained part-based models","volume":"32","author":"Felzenszwalb","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.1109\/TPAMI.2014.2300479","article-title":"Fast feature pyramids for object detection","volume":"36","author":"Appel","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C., and Berg, A.C. (2016, January 8\u201316). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhang, S., Wen, L., Bian, X., Lei, Z., and Li, S.Z. (2018, January 18\u201323). Single-shot refinement neural network for object detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00442"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/1\/27\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:48:43Z","timestamp":1760179723000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/1\/27"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,23]]},"references-count":37,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1]]}},"alternative-id":["s21010027"],"URL":"https:\/\/doi.org\/10.3390\/s21010027","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,23]]}}}