{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T05:38:19Z","timestamp":1772602699317,"version":"3.50.1"},"reference-count":61,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2022,7,28]],"date-time":"2022-07-28T00:00:00Z","timestamp":1658966400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key R&D Program of China","award":["2018YFC0809700"],"award-info":[{"award-number":["2018YFC0809700"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Pedestrian and vehicle detection plays a key role in the safe driving of autonomous vehicles. Although transformer-based object detection algorithms have made great progress, the accuracy of detection in rainy scenarios is still challenging. Based on the Swin Transformer, this paper proposes an end-to-end pedestrian and vehicle detection algorithm (PVformer) with deraining module, which improves the image quality and detection accuracy in rainy scenes. Based on Transformer blocks, a four-branch feature mapping model was introduced to achieve deraining from a single image, thereby mitigating the influence of rain streak occlusion on the detector performance. According to the trouble of small object detection only by visual transformer, we designed a local enhancement perception block based on CNN and Transformer. In addition, the deraining module and the detection module were combined to train the PVformer model through transfer learning. 
The experimental results show that the algorithm performed well on rainy days and significantly improved the accuracy of pedestrian and vehicle detection.<\/jats:p>","DOI":"10.3390\/s22155667","type":"journal-article","created":{"date-parts":[[2022,7,28]],"date-time":"2022-07-28T22:43:26Z","timestamp":1659048206000},"page":"5667","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["PVformer: Pedestrian and Vehicle Detection Algorithm Based on Swin Transformer in Rainy Scenes"],"prefix":"10.3390","volume":"22","author":[{"given":"Zaiming","family":"Sun","sequence":"first","affiliation":[{"name":"School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chang\u2019an","family":"Liu","sequence":"additional","affiliation":[{"name":"Information College, North China University of Technology, Beijing 100144, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongquan","family":"Qu","sequence":"additional","affiliation":[{"name":"Information College, North China University of Technology, Beijing 100144, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guangda","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, X., Wu, J., Lin, Z., Liu, H., and Zha, H. (2018, January 8\u201314). Recurrent Squeeze-and-Excitation Context Aggregation Net for Single Image Deraining. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_16"},{"key":"ref_2","unstructured":"Redmon, J., and Farhadi, A. (2018). 
YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"662","DOI":"10.1016\/j.apm.2018.03.001","article-title":"A directional global sparse model for single image rain removal","volume":"59","author":"Deng","year":"2018","journal-title":"Appl. Math. Model."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3936","DOI":"10.1109\/TIP.2017.2708502","article-title":"A hierarchical approach for rain or snow removing in a single color image","volume":"26","author":"Wang","year":"2017","journal-title":"IEEE Trans. Image Processing"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhu, L., Fu, C.-W., Lischinski, D., and Heng, P.-A. (2017, January 22\u201329). Joint bi-layer optimization for single-image rain streak removal. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.276"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wang, C., Wu, Y., Su, Z., and Chen, J. (2020, January 12\u201316). Joint self-attention and scale-aggregation for self-calibrated deraining network. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413559"},{"key":"ref_7","unstructured":"Shen, Y., Feng, Y., Deng, S., Liang, D., Qin, J., Xie, H., and Wei, M. (2020). Mba-raingan: Multi-branch attention generative adversarial network for mixture of rain removal from single images. arXiv."},{"key":"ref_8","unstructured":"Park, Y., Jeon, M., Lee, J., and Kang, M. (2020). MARA-Net: Single Image Deraining Network with Multi-level connections and Adaptive Regional Attentions. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Chang, Y., Yan, L., and Zhong, S. (2017, January 22\u201329). Transformed low-rank model for line pattern noise removal. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.191"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Li, Y., Tan, R.T., Guo, X., Lu, J., and Brown, M.S. (2016, January 27\u201330). Rain streak removal using layer priors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.299"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ketkar, N., and Moolayil, J. (2021). Convolutional neural networks. Deep Learning with Python, Springer.","DOI":"10.1007\/978-1-4842-5364-9"},{"key":"ref_12","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_13","first-page":"84","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Processing Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A.C. (2016, January 8\u201316). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1109\/TPAMI.2015.2437384","article-title":"Region-based convolutional networks for accurate object detection and segmentation","volume":"38","author":"Girshick","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_19","first-page":"1137","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"28","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Processing Syst."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"3703","DOI":"10.1109\/TIP.2018.2818018","article-title":"Too far to see? Not really!\u2014Pedestrian detection with scale-aware localization policy","volume":"27","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Image Processing"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"25985","DOI":"10.1007\/s11042-021-10954-5","article-title":"On-road vehicle detection in varying weather conditions using faster R-CNN with several region proposal networks","volume":"80","author":"Ghosh","year":"2021","journal-title":"Multimed. 
Tools Appl."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"795","DOI":"10.1166\/jno.2017.2138","article-title":"A Hybrid Equivalent Model of High Frequency Planar Transformer","volume":"12","author":"Ma","year":"2017","journal-title":"J. Nanoelectron. Optoelectron."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1166\/jno.2021.2971","article-title":"Evaluation of breakdown voltage and water content in transformer oil using multi frequency ultrasonic and generalized regression neural network","volume":"16","author":"Su","year":"2021","journal-title":"J. Nanoelectron. Optoelectron."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1168","DOI":"10.1166\/jno.2019.2664","article-title":"The Method for On-Line Monitoring of tan\u03b4 of Transformer Bushing Based on Conductor Temperature Distribution","volume":"14","author":"Yang","year":"2019","journal-title":"J. Nanoelectron. Optoelectron."},{"key":"ref_26","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Processing Syst."},{"key":"ref_27","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_28","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Processing Syst."},{"key":"ref_29","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_30","unstructured":"Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., and Tran, D. (2018, January 10\u201315). Image transformer. 
Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_31","unstructured":"Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., and Sutskever, I. (2020, January 12\u201318). Generative pretraining from pixels. Proceedings of the International Conference on Machine Learning, Vienna, Austria."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_33","unstructured":"Tan, F., Kong, Y., Fan, Y., Liu, F., Zhou, D., Chen, L., Gao, L., and Qian, Y. (2021). SDNet: Mutil-branch for single image deraining using swin. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2009, January 14\u201318). Online dictionary learning for sparse coding. Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553463"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1109\/TPAMI.2012.88","article-title":"Robust recovery of subspace structures by low-rank representation","volume":"35","author":"Liu","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1006\/dspr.1999.0361","article-title":"Speaker verification using adapted Gaussian mixture models","volume":"10","author":"Reynolds","year":"2000","journal-title":"Digit. Signal Processing"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1742","DOI":"10.1109\/TIP.2011.2179057","article-title":"Automatic single-image-based rain streaks removal via image decomposition","volume":"21","author":"Kang","year":"2011","journal-title":"IEEE Trans. 
Image Processing"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Luo, Y., Xu, Y., and Ji, H. (2015, January 7\u201313). Removing rain from a single image via discriminative sparse coding. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.388"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"3943","DOI":"10.1109\/TCSVT.2019.2920407","article-title":"Image de-raining using a conditional generative adversarial network","volume":"30","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"4544","DOI":"10.1109\/TIP.2020.2973802","article-title":"Confidence measure guided single image de-raining","volume":"29","author":"Yasarla","year":"2020","journal-title":"IEEE Trans. Image Processing"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Yasarla, R., Sindagi, V.A., and Patel, V.M. (2020, January 13\u201319). Syn2real transfer learning for image deraining using gaussian processes. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00280"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hu, J., Sun, Y., and Xiong, S. (2021). Research on the Cascade Vehicle Detection Method Based on CNN. Electronics, 10.","DOI":"10.3390\/electronics10040481"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Cai, Z., Fan, Q., Feris, R.S., and Vasconcelos, N. (2016, January 8\u201316). A unified multi-scale deep convolutional neural network for fast object detection. 
Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_22"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1049\/ipr2.12297","article-title":"Accurate and efficient vehicle detection framework based on SSD algorithm","volume":"15","author":"Zhao","year":"2021","journal-title":"IET Image Processing"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1007\/s11082-019-1977-7","article-title":"Vehicle detection based on visual attention mechanism and adaboost cascade classifier in intelligent transportation systems","volume":"51","author":"Chen","year":"2019","journal-title":"Opt. Quantum Electron."},{"key":"ref_46","unstructured":"Shao, S., Zhao, Z., Li, B., Xiao, T., Yu, G., Zhang, X., and Sun, J. (2018). Crowdhuman: A benchmark for detecting human in a crowd. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Wanchaitanawong, N., Tanaka, M., Shibata, T., and Okutomi, M. (2021, January 25\u201327). Multi-Modal Pedestrian Detection with Large Misalignment Based on Modal-Wise Regression and Multi-Modal IoU. Proceedings of the 2021 17th International Conference on Machine Vision and Applications (MVA), Nagoya, Japan.","DOI":"10.23919\/MVA51890.2021.9511366"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"14593","DOI":"10.1007\/s11042-018-7143-6","article-title":"Pedestrian object detection with fusion of visual attention mechanism and semantic computation","volume":"79","author":"Xiao","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"1965","DOI":"10.1007\/s11554-021-01074-2","article-title":"An improved one-stage pedestrian detection method based on multi-scale attention feature extraction","volume":"18","author":"Ma","year":"2021","journal-title":"J. 
Real-Time Image Processing"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 20\u201325). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Tan, F., Qian, Y., Kong, Y., Zhang, H., Zhou, D., Fan, Y., and Chen, L. (2021). DBSwin: Transformer based dual branch network for single image deraining. J. Intell. Fuzzy Syst., 1\u201315.","DOI":"10.2139\/ssrn.3993046"},{"key":"ref_52","unstructured":"Valanarasu, J.M.J., Yasarla, R., and Patel, V.M. (2022, January 19\u201320). Transweather: Transformer-based restoration of images degraded by adverse weather conditions. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Xu, X., Feng, Z., Cao, C., Li, M., Wu, J., Wu, Z., Shang, Y., and Ye, S. (2021). An improved swin transformer-based model for remote sensing object detection and instance segmentation. Remote Sens., 13.","DOI":"10.3390\/rs13234779"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Yang, W., Tan, R.T., Feng, J., Liu, J., Guo, Z., and Yan, S. (2017, January 21\u201326). Deep joint rain detection and removal from a single image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.183"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Qian, R., Tan, R.T., Yang, W., Su, J., and Liu, J. (2018, January 18\u201323). Attentive generative adversarial network for raindrop removal from a single image. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00263"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Li, S., Araujo, I.B., Ren, W., Wang, Z., Tokuda, E.K., Junior, R.H., Cesar-Junior, R., Zhang, J., Guo, X., and Cao, X. (2019, January 15\u201320). Single image deraining: A comprehensive benchmark analysis. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00396"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision meets robotics: The kitti dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"102907","DOI":"10.1016\/j.cviu.2020.102907","article-title":"UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking","volume":"193","author":"Wen","year":"2020","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_61","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). 
Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/15\/5667\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:58:36Z","timestamp":1760140716000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/15\/5667"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,28]]},"references-count":61,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["s22155667"],"URL":"https:\/\/doi.org\/10.3390\/s22155667","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,28]]}}}