{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T08:27:06Z","timestamp":1770452826798,"version":"3.49.0"},"reference-count":34,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2019,3,3]],"date-time":"2019-03-03T00:00:00Z","timestamp":1551571200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1564211"],"award-info":[{"award-number":["U1564211"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Key R&D Program of China","award":["2016YFB0100904"],"award-info":[{"award-number":["2016YFB0100904"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Region proposal network (RPN) based object detection, such as Faster Regions with CNN (Faster R-CNN), has gained considerable attention due to its high accuracy and fast speed. However, it has room for improvements when used in special application situations, such as the on-board vehicle detection. Original RPN locates multiscale anchors uniformly on each pixel of the last feature map and classifies whether an anchor is part of the foreground or background with one pixel in the last feature map. The receptive field of each pixel in the last feature map is fixed in the original faster R-CNN and does not coincide with the anchor size. Hence, only a certain part can be seen for large vehicles and too much useless information is contained in the feature for small vehicles. This reduces detection accuracy. 
Furthermore, the perspective projection results in the vehicle bounding box size becoming related to the bounding box position, thereby reducing the effectiveness and accuracy of the uniform anchor generation method. This reduces both detection accuracy and computing speed. After the region proposal stage, many regions of interest (ROI) are generated. The ROI pooling layer projects an ROI to the last feature map and forms a new feature map with a fixed size for final classification and box regression. The number of feature map pixels in the projected region can also influence the detection performance but this is not accurately controlled in former works. In this paper, the original faster R-CNN is optimized, especially for the on-board vehicle detection. This paper tries to solve these above-mentioned problems. The proposed method is tested on the KITTI dataset and the result shows a significant improvement without too many tricky parameter adjustments and training skills. The proposed method can also be used on other objects with obvious foreshortening effects, such as on-board pedestrian detection. 
The basic idea of the proposed method does not rely on concrete implementation and thus, most deep learning based object detectors with multiscale feature maps can be optimized with it.<\/jats:p>","DOI":"10.3390\/s19051089","type":"journal-article","created":{"date-parts":[[2019,3,4]],"date-time":"2019-03-04T05:45:36Z","timestamp":1551678336000},"page":"1089","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Anchor Generation Optimization and Region of Interest Assignment for Vehicle Detection"],"prefix":"10.3390","volume":"19","author":[{"given":"Ye","family":"Wang","sequence":"first","affiliation":[{"name":"State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130025, China"}]},{"given":"Zhenyi","family":"Liu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130025, China"}]},{"given":"Weiwen","family":"Deng","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130025, China"},{"name":"Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,3,3]]},"reference":[{"key":"ref_1","unstructured":"Tzomakas, C., and Seelen, W. (1998). Vehicle Detection in Traffic Scenes Using Shadows, Institut fur Neuroninformatic, Ruht-Universitat. Technical Report 98-06."},{"key":"ref_2","unstructured":"Mori, H., and Charkai, N. (1993, January 1\u20133). Shadow and Rhythm as Sign Patterns of Obstacle Detection. 
Proceedings of the Industrial Electronics Conference Proceedings, Budapest, Hungary."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/0167-8655(91)90039-O","article-title":"Symmetry-Based Recognition for Vehicle Rears","volume":"12","author":"Kuehnle","year":"1991","journal-title":"Pattern Recognit. Lett."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1006\/ciun.1993.1037","article-title":"Intensity and Edge Based Symmetry Detection with an Application to Car-Following","volume":"58","author":"Zielke","year":"1993","journal-title":"Cvgip Image Underst."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1016\/0967-0661(96)00028-7","article-title":"Vehicle Detection and Recognition in Greyscale Imagery","volume":"4","author":"Matthews","year":"1996","journal-title":"Control Eng. Pract."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1016\/S0262-8856(99)00032-3","article-title":"An Image Processing System for Driver Assistance","volume":"18","author":"Handmann","year":"2000","journal-title":"Image Vis. Comput."},{"key":"ref_7","unstructured":"Khammari, A., Lacroix, E., and Nashashibi, F. (2005, January 16). Vehicle detection combining gradient analysis and AdaBoost classification. Proceedings of the Intelligent Transportation Systems, Vienna, Austria."},{"key":"ref_8","unstructured":"Sun, Z., Bebis, G., and Miller, R. (2002, January 2\u20135). Quantized Wavelet Features and Support Vector Machines for On-Road Vehicle Detection. Proceedings of the IEEE Int. Conf. on Control, Automation, Robotics, and Vision, Singapore."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The PASCAL Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., and Ramos, S. (2016, January 27\u201330). The Cityscapes Dataset for Semantic Urban Scene Understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_12","unstructured":"(2018, February 05). Caffe Deep Learning Framework. Available online: http:\/\/caffe.berkeleyvision.org\/."},{"key":"ref_13","unstructured":"(2018, November 09). Tensorflow Deep Learning Framework. Available online: https:\/\/www.tensorflow.org\/."},{"key":"ref_14","unstructured":"(2018, October 01). Pytorch Deep Learning Framework. Available online: https:\/\/pytorch.org\/2018.10.1."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., and Darrell, T. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_19","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (arXiv, 2016). R-FCN: Object Detection via Region-based Fully Convolutional Networks, arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., and Erhan, D. (2016, January 8\u201316). SSD: Single Shot MultiBox Detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Singh, B., and Davis, L.S. (arXiv, 2017). An Analysis of Scale Invariance in Object Detection\u2013SNIP, arXiv.","DOI":"10.1109\/CVPR.2018.00377"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Meng, Z., Fan, X., and Chen, X. (2017, January 4\u20136). Detecting Small Signs from Large Images. Proceedings of the IEEE International Conference on Information Reuse and Integration, San Diego, CA, USA.","DOI":"10.1109\/IRI.2017.57"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Cai, Z., Fan, Q., Feris, R.S., and Vasconcelos, N. 
(2016, January 11\u201314). A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_22"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., and Jia, Y. (2015, January 7\u201312). Going Deeper with Convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely Connected Convolutional Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhou, P., Ni, B., and Geng, C. (2018, January 18\u201322). Scale-Transferrable Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00062"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lin, T., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Kong, T., Sun, F., Yao, A., and Liu, H. (2017, January 21\u201326). Ron: Reverse connection with objectness prior networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.557"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Dollar, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Lin, T., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (arXiv, 2017). Focal Loss for Dense Object Detection, arXiv.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_33","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_34","unstructured":"Hosang, J., Benenson, R., Doll\u00e1r, P., and Schiele, B. (arXiv, 2015). What makes for effective detection proposals?, arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/5\/1089\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:35:59Z","timestamp":1760186159000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/5\/1089"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,3]]},"references-count":34,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["s19051089"],"URL":"https:\/\/doi.org\/10.3390\/s19051089","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,3,3]]}}}