{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T05:04:32Z","timestamp":1775451872959,"version":"3.50.1"},"reference-count":68,"publisher":"EDP Sciences","license":[{"start":{"date-parts":[[2024,3,18]],"date-time":"2024-03-18T00:00:00Z","timestamp":1710720000000},"content-version":"vor","delay-in-days":77,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62133011"],"award-info":[{"award-number":["62133011"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Security and Safety"],"accepted":{"date-parts":[[2024,2,4]]},"published-print":{"date-parts":[[2024]]},"abstract":"<jats:p>Deep learning based on labeled data has brought massive success in computer vision, speech recognition, and natural language processing. Nevertheless, labeled data is just a drop in the ocean compared with unlabeled data. How can people utilize the unlabeled data effectively? Research has focused on unsupervised and semi-supervised learning to solve such a problem. Some theoretical and empirical studies have proved that unlabeled data can help boost the generalization ability and robustness under adversarial attacks. However, current theoretical research on the relationship between robustness and unlabeled data limits its scope to toy datasets. Meanwhile, the visual models in autonomous driving need a significant improvement in robustness to guarantee security and safety. This paper proposes a semi-supervised learning framework for object detection in autonomous vehicles, improving the robustness with unlabeled data. 
Firstly, we build a baseline with the transfer learning of an unsupervised contrastive learning method\u2014Momentum Contrast (MoCo). Secondly, we propose a semi-supervised co-training method to label the unlabeled data for retraining, which improves generalization on the autonomous driving dataset. Thirdly, we apply the unsupervised Bounding Box data augmentation (BBAug) method based on a search algorithm, which uses reinforcement learning to improve the robustness of object detection for autonomous driving. We present an empirical study on the KITTI dataset with diverse adversarial attack methods. Our proposed method realizes the state-of-the-art generalization and robustness under white-box attacks (DPatch and Contextual Patch) and black-box attacks (Gaussian noise, Rain, Fog, and so on). Our proposed method and empirical study show that using more unlabeled data benefits the robustness of perception systems in autonomous driving.<\/jats:p>","DOI":"10.1051\/sands\/2024002","type":"journal-article","created":{"date-parts":[[2024,2,4]],"date-time":"2024-02-04T19:58:52Z","timestamp":1707076732000},"page":"2024002","source":"Crossref","is-referenced-by-count":2,"title":["Robust object detection for autonomous driving based on semi-supervised 
learning"],"prefix":"10.1051","volume":"3","author":[{"given":"Wenwen","family":"Chen","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Yan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weiquan","family":"Huang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wancheng","family":"Ge","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huaping","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huilin","family":"Yin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"250","published-online":{"date-parts":[[2024,3,18]]},"reference":[{"key":"R1","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc IEEE"},{"key":"R2","unstructured":"Krizhevsky A, Sutskever I and Hinton GE. Imagenet classification with deep convolutional neural networks. In: Annual Conference on Neural Information Processing Systems (NeurIPS), 2012, 1097\u20131105."},{"key":"R3","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, 770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"R4","doi-asserted-by":"crossref","unstructured":"Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017, 4700\u20134708.","DOI":"10.1109\/CVPR.2017.243"},{"key":"R5","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L and Sun G. Squeeze-and-excitation networks. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, 7132\u20137141.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"R6","unstructured":"Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks, 2015, 91\u201399."},{"key":"R7","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector. In: European conference on computer vision (ECCV), 2016, 21\u201337.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"R8","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, 779\u2013788.","DOI":"10.1109\/CVPR.2016.91"},{"key":"R9","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Doll\u00e1r P, et al. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, 2961\u20132969.","DOI":"10.1109\/ICCV.2017.322"},{"key":"R10","doi-asserted-by":"crossref","unstructured":"Redmon J and Farhadi A. YOLO9000: Better, faster, stronger. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017, 7263\u20137271.","DOI":"10.1109\/CVPR.2017.690"},{"key":"R11","unstructured":"Redmon J and Farhadi A. YOLOv3: An incremental improvement, arXiv preprint https:\/\/arxiv.org\/abs\/1804.02767, 2018."},{"key":"R12","unstructured":"Bochkovskiy A, Wang CY and Mark Liao HY. YOLOv4: Optimal speed and accuracy of object detection, arXiv preprint https:\/\/arxiv.org\/abs\/2004.10934, 2020."},{"key":"R13","doi-asserted-by":"crossref","unstructured":"Long J, Shelhamer E and Darrell T. Fully convolutional networks for semantic segmentation. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2015, 3431\u20133440.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"R14","doi-asserted-by":"crossref","unstructured":"Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017, 2881\u20132890.","DOI":"10.1109\/CVPR.2017.660"},{"key":"R15","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"R16","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, et al. Identity mappings in deep residual networks. In: European Conference on Computer Vision (ECCV), Springer, 2016, 630\u2013645.","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"R17","unstructured":"Simonyan K and Zisserman A. Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations (ICLR), 2015."},{"key":"R18","unstructured":"Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. In: 2nd International Conference on Learning Representations (ICLR), 2014."},{"key":"R19","unstructured":"Goodfellow IJ, Shlens J and Szegedy C. Explaining and harnessing adversarial examples. In: 3rd International Conference on Learning Representations (ICLR), 2015."},{"key":"R20","unstructured":"Zhai R, Cai T, He D, et al. Adversarially robust generalization just requires more unlabeled data, arXiv preprint https:\/\/arxiv.org\/abs\/1906.00555, 2019."},{"key":"R21","unstructured":"Alayrac JB, Uesato J, Huang PS, et al. Are labels required for improving adversarial robustness? In: Annual Conference on Neural Information Processing Systems (NeurIPS), 2019, 12192\u201312202."},{"key":"R22","unstructured":"Najafi A, Maeda SI, Koyama M, et al. 
Robustness to adversarial perturbations in learning from incomplete data. In: Annual Conference on Neural Information Processing Systems (NeurIPS), 2019, 5542\u20135552."},{"key":"R23","unstructured":"Carmon Y, Raghunathan A, Schmidt L, et al. Unlabeled data improves adversarial robustness. In: Annual Conference on Neural Information Processing Systems (NeurIPS), 2019, 11190\u201311201."},{"key":"R24","unstructured":"Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning (ICML), PMLR, 2020, 1597\u20131607."},{"key":"R25","unstructured":"Chen T, Kornblith S, Swersky K, et al. Big self-supervised models are strong semi-supervised learners. In: Annual Conference on Neural Information Processing Systems (NeurIPS), 2020."},{"key":"R26","doi-asserted-by":"crossref","unstructured":"He K, Fan H, Wu Y, et al. Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 9726\u20139735.","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"R27","unstructured":"Chen X, Fan H, Girshick RB, et al. Improved baselines with momentum contrastive learning, arXiv preprint https:\/\/arxiv.org\/abs\/2003.04297, 2020."},{"key":"R28","unstructured":"ISO. Road vehicles \u2013 safety of the intended functionality. In: International Organization for Standardization: ISO\/DIS 21448, 2021."},{"key":"R29","unstructured":"Krizhevsky A and Hinton G. A Learning Multiple Layers of Features from Tiny Images, 2009, http:\/\/www.cs.toronto.edu\/~kriz\/cifar.html"},{"key":"R30","unstructured":"LeCun Y and Cortes C. MNIST Handwritten Digit Database, 2010."},{"key":"R31","doi-asserted-by":"crossref","unstructured":"Caesar H, Bankiti V, Lang AH, et al. nuScenes: A multimodal dataset for autonomous driving. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 11621\u201311631.","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"R32","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","volume":"32","author":"Geiger","year":"2013","journal-title":"Int J Robot Res"},{"key":"R33","unstructured":"Zou Z, Shi Z, Guo Y, et al. Object detection in 20 years: A survey, arXiv preprint https:\/\/arxiv.org\/abs\/1905.05055, 2019."},{"key":"R34","doi-asserted-by":"crossref","unstructured":"Zhang S, Wen L, Bian X, et al. Single-shot refinement neural network for object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, 4203\u20134212.","DOI":"10.1109\/CVPR.2018.00442"},{"key":"R35","doi-asserted-by":"crossref","unstructured":"Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2014, 580\u2013587.","DOI":"10.1109\/CVPR.2014.81"},{"key":"R36","doi-asserted-by":"crossref","unstructured":"Lin TY, Goyal P, Girshick R, et al. Focal loss for dense object detection. In: Proceedings of the IEEE international Conference on Computer Vision (ICCV), 2017, 2980\u20132988.","DOI":"10.1109\/ICCV.2017.324"},{"key":"R37","unstructured":"Tanay T and Griffin LD. A boundary tilting persepective on the phenomenon of adversarial examples, arXiv preprint https:\/\/arxiv.org\/abs\/1608.07690, 2016."},{"key":"R38","doi-asserted-by":"crossref","first-page":"14410","DOI":"10.1109\/ACCESS.2018.2807385","volume":"6","author":"Akhtar","year":"2018","journal-title":"IEEE Access"},{"key":"R39","doi-asserted-by":"crossref","unstructured":"Papernot N, McDaniel P, Jha S, et al. The limitations of deep learning in adversarial settings. 
In: IEEE European Symposium on Security and Privacy (EuroS&P), IEEE, 2016, 372\u2013387.","DOI":"10.1109\/EuroSP.2016.36"},{"key":"R40","doi-asserted-by":"crossref","unstructured":"Carlini N and Wagner D. Towards evaluating the robustness of neural networks. In: IEEE Symposium on Security and Privacy (S&P), IEEE, 2017, 39\u201357.","DOI":"10.1109\/SP.2017.49"},{"key":"R41","doi-asserted-by":"crossref","unstructured":"Moosavi-Dezfooli SM, Fawzi A and Frossard P. Deepfool: A simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, 2574\u20132582.","DOI":"10.1109\/CVPR.2016.282"},{"key":"R42","unstructured":"Baluja S and Fischer I. Adversarial transformation networks: learning to generate adversarial examples, arXiv preprint https:\/\/arxiv.org\/abs\/1703.09387, 2017."},{"key":"R43","unstructured":"Liu X, Yang H, Liu Z, et al. DPATCH: An adversarial patch attack on object detectors. In: Workshop on Artificial Intelligence Safety co-located with the Thirty-Third AAAI Conference on Artificial Intelligence, Volume 2301 of CEUR Workshop Proceedings, 2019."},{"key":"R44","doi-asserted-by":"crossref","unstructured":"Saha A, Subramanya A, Patil K, et al. Role of spatial context in adversarial robustness for object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, 784\u2013785.","DOI":"10.1109\/CVPRW50498.2020.00400"},{"key":"R45","unstructured":"Hendrycks D and Dietterich TG. Benchmarking neural network robustness to common corruptions and perturbations. In: 7th International Conference on Learning Representations (ICLR), 2019."},{"key":"R46","unstructured":"Michaelis C, Mitzkus B, Geirhos R, et al. 
Benchmarking robustness in object detection: Autonomous driving when winter is coming, arXiv preprint https:\/\/arxiv.org\/abs\/1907.07484, 2019."},{"key":"R47","unstructured":"Caron M, Misra I, Mairal J, et al. Unsupervised learning of visual features by contrasting cluster assignments. In: Annual Conference on Neural Information Processing Systems (NeurIPS), 2020."},{"key":"R48","unstructured":"Grill JB, Strub F, Altch\u00e9 F, et al. Bootstrap your own latent \u2013 A new approach to self-supervised learning. In: Annual Conference on Neural Information Processing Systems (NeurIPS), 2020."},{"key":"R49","doi-asserted-by":"crossref","first-page":"1979","DOI":"10.1109\/TPAMI.2018.2858821","volume":"41","author":"Miyato","year":"2018","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"R50","unstructured":"Madry A, Makelov A, Schmidt L, et al. Towards deep learning models resistant to adversarial attacks. In: 6th International Conference on Learning Representations (ICLR), 2018."},{"key":"R51","doi-asserted-by":"crossref","unstructured":"Blum A and Mitchell T. Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT), 1998, 92\u2013100.","DOI":"10.1145\/279943.279962"},{"key":"R52","unstructured":"Lee DH, et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In: Workshop on Challenges in Representation Learning of International Conference on Machine Learning ICML, 2013, 896."},{"key":"R53","unstructured":"van den Oord A, Li Y and Vinyals O. Representation learning with contrastive predictive coding, arXiv preprint https:\/\/arxiv.org\/abs\/1807.03748, 2018."},{"key":"R54","doi-asserted-by":"crossref","unstructured":"Deng J, Dong W, Socher R, et al. Imagenet: A large-scale hierarchical image database. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2009, 248\u2013255.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"R55","doi-asserted-by":"crossref","unstructured":"He T, Zhang Z, Zhang H, et al. Bag of tricks for image classification with convolutional neural networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, 558\u2013567.","DOI":"10.1109\/CVPR.2019.00065"},{"key":"R56","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1016\/j.isprsjprs.2017.11.004","volume":"145","author":"Han","year":"2018","journal-title":"ISPRS J Photogr Remote Sens"},{"key":"R57","doi-asserted-by":"crossref","unstructured":"Zoph B, Cubuk ED, Ghiasi G, et al. Learning data augmentation strategies for object detection. In: European Conference on Computer Vision (ECCV), 2020, 566\u2013583.","DOI":"10.1007\/978-3-030-58583-9_34"},{"key":"R58","doi-asserted-by":"crossref","unstructured":"Cubuk ED, Zoph B, Man\u00e9 D, et al. Autoaugment: Learning augmentation policies from data, arXiv preprint https:\/\/arxiv.org\/abs\/1805.09501, 2018.","DOI":"10.1109\/CVPR.2019.00020"},{"key":"R59","unstructured":"Everingham M, Van Gool L, Williams CKI, et al. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results, http:\/\/www.pascal-network.org\/challenges\/VOC\/voc2007\/workshop\/index.html"},{"key":"R60","doi-asserted-by":"crossref","unstructured":"Tian Z, Shen C, Chen H, et al. Fcos: Fully convolutional one-stage object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, 9627\u20139636.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"R61","doi-asserted-by":"crossref","unstructured":"Duan K, Bai S, Xie L, et al. Centernet: Keypoint triplets for object detection. 
In: Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), 2019, 6569\u20136578.","DOI":"10.1109\/ICCV.2019.00667"},{"key":"R62","unstructured":"Terven J and Cordova-Esparza D. A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond, arXiv preprint https:\/\/arxiv.org\/abs\/2304.00501, 2023."},{"key":"R63","unstructured":"Tarvainen A and Valpola H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: Annual Conference on Neural Information Processing Systems (NeurIPS), 2017, 1195\u20131204."},{"key":"R64","unstructured":"Liu YC, Ma CY, He Z, et al. Unbiased teacher for semi-supervised object detection. In: 9th International Conference on Learning Representations (ICLR), 2021."},{"key":"R65","doi-asserted-by":"crossref","unstructured":"Kirillov A, Mintun E, Ravi N, et al. Segment anything, arXiv preprint https:\/\/arxiv.org\/abs\/2304.02643, 2023.","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"R66","unstructured":"Tsipras D, Santurkar S, Engstrom L, et al. Robustness may be at odds with accuracy. In: 7th International Conference on Learning Representations (ICLR), 2019."},{"key":"R67","unstructured":"Ilyas A, Santurkar S, Tsipras D, et al. Adversarial examples are not bugs, they are features. 
In: Annual Conference on Neural Information Processing Systems (NeurIPS), 2019, 125\u2013136."},{"key":"R68","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3592433","volume":"42","author":"Kerbl","year":"2023","journal-title":"ACM Trans Graph"}],"container-title":["Security and Safety"],"original-title":[],"link":[{"URL":"https:\/\/sands.edpsciences.org\/10.1051\/sands\/2024002\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,20]],"date-time":"2024-03-20T09:17:20Z","timestamp":1710926240000},"score":1,"resource":{"primary":{"URL":"https:\/\/sands.edpsciences.org\/10.1051\/sands\/2024002"}},"subtitle":[],"editor":[{"given":"Hao","family":"Zhang","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]},{"given":"Yu-Gang","family":"Jiang","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]},{"given":"Claudio","family":"Melchiorri","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]},{"given":"Gerhard","family":"Rigoll","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":68,"alternative-id":["sands20230024"],"URL":"https:\/\/doi.org\/10.1051\/sands\/2024002","relation":{},"ISSN":["2826-1275"],"issn-type":[{"value":"2826-1275","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024]]}}}