{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:58:24Z","timestamp":1760241504663,"version":"build-2065373602"},"reference-count":50,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2018,4,2]],"date-time":"2018-04-02T00:00:00Z","timestamp":1522627200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61771107"],"award-info":[{"award-number":["61771107"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The standard pipeline in pedestrian detection is sliding a pedestrian model on an image feature pyramid to detect pedestrians of different scales. In this pipeline, feature pyramid construction is time consuming and becomes the bottleneck for fast detection. Recently, a method called multiresolution filtered channels (MRFC) was proposed which only used single scale feature maps to achieve fast detection. However, there are two shortcomings in MRFC which limit its accuracy. One is that the receptive field correspondence in different scales is weak. Another is that the features used are not scale invariance. In this paper, two solutions are proposed to tackle with the two shortcomings respectively. Specifically, scale-aware pooling is proposed to make a better receptive field correspondence, and soft decision tree is proposed to relive scale variance problem. When coupled with efficient sliding window classification strategy, our detector achieves fast detecting speed at the same time with state-of-the-art accuracy.<\/jats:p>","DOI":"10.3390\/s18041063","type":"journal-article","created":{"date-parts":[[2018,4,2]],"date-time":"2018-04-02T12:32:20Z","timestamp":1522672340000},"page":"1063","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Delving Deep into Multiscale Pedestrian Detection via Single Scale Feature Maps"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0294-7957","authenticated-orcid":false,"given":"Xinchuan","family":"Fu","sequence":"first","affiliation":[{"name":"National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"given":"Rui","family":"Yu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University College London, London WC1E 6BT, UK"}]},{"given":"Weinan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science &amp; Engineering, Shanghai Jiao Tong University, Shanghai 200240, China"}]},{"given":"Jie","family":"Wu","sequence":"additional","affiliation":[{"name":"Department of MOE Research Center for Software\/Hardware Co-Design Engineering and Application, East China Normal University, Shanghai 200062, China"}]},{"given":"Shihai","family":"Shao","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu 611731, China"}]}],"member":"1968","published-online":{"date-parts":[[2018,4,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Doll\u00e1r, P., Wojek, C., Schiele, B., and Perona, P. (2009, January 20\u201325). Pedestrian detection: A benchmark. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA.","DOI":"10.1109\/CVPRW.2009.5206631"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The KITTI vision benchmark suite. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_3","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of Oriented Gradients for Human Detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ess, A., Leibe, B., Schindler, K., and Gool, L.J.V. (2008, January 24\u201326). A mobile vision system for robust multi-person tracking. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587581"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.1109\/TPAMI.2014.2300479","article-title":"Fast Feature Pyramids for Object Detection","volume":"36","author":"Appel","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhang, S., Benenson, R., and Schiele, B. (2015, January 7\u201312). Filtered channel features for pedestrian detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298784"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1109\/TITS.2016.2594816","article-title":"Fast and Efficient Pedestrian Detection via the Cascade Implementation of an Additive Kernel Support Vector Machine","volume":"18","author":"Baek","year":"2017","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Sun, R., Zhang, G., Yan, X., and Gao, J. (2016). Robust Pedestrian Classification Based on Hierarchical Kernel Sparse Representation. Sensors, 16.","DOI":"10.3390\/s16081296"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Kim, J.H., Hong, H.G., and Park, K.R. (2017). Convolutional Neural Network-Based Human Detection in Nighttime Images Using Visible Light Camera Sensors. Sensors, 17.","DOI":"10.3390\/s17051065"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"He, M., Luo, H., Chang, Z., and Hui, B. (2017). Pedestrian Detection with Semantic Regions of Interest. Sensors, 17.","DOI":"10.3390\/s17112699"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1239","DOI":"10.1109\/TPAMI.2009.122","article-title":"Survey of Pedestrian Detection for Advanced Driver Assistance Systems","volume":"32","author":"Sappa","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Du, X., El-Khamy, M., Lee, J., and Davis, L.S. (2017, January 24\u201331). Fused DNN: A Deep Neural Network Fusion Approach to Fast and Robust Pedestrian Detection. Proceedings of the IEEE Winter Conference on Applications of Computer Vision, WACV 2017, Santa Rosa, CA, USA.","DOI":"10.1109\/WACV.2017.111"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Brazil, G., Yin, X., and Liu, X. (2017, January 22\u201329). Illuminating Pedestrians via Simultaneous Detection and Segmentation. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.530"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Costea, A.D., and Nedevschi, S. (2016, January 27\u201330). Semantic Channels for Fast Pedestrian Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.259"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ohn-Bar, E., and Trivedi, M.M. (2016, January 4\u20138). To boost or not to boost? On the limits of boosted trees for object detection. Proceedings of the 23rd International Conference on Pattern Recognition, ICPR 2016, Canc\u00fan, Mexico.","DOI":"10.1109\/ICPR.2016.7900151"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Costea, A.D., Vesa, A.V., and Nedevschi, S. (2015, January 15\u201318). Fast Pedestrian Detection for Mobile Devices. Proceedings of the IEEE 18th International Conference on Intelligent Transportation Systems, ITSC 2015, Gran Canaria, Spain.","DOI":"10.1109\/ITSC.2015.382"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Costea, A.D., and Nedevschi, S. (2014, January 23\u201328). Word Channel Based Multiscale Pedestrian Detection without Image Resizing and Using Only One Classifier. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.307"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Fu, X., Yu, R., and Shao, S. (2018, January 11\u201314). Fast Pedestrian Detection Using Scale-aware Pooling. Proceedings of the International Conference on Digital Image Processing, ICDIP 2018, Shanghai, China. Accepted.","DOI":"10.1117\/12.2503002"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Benenson, R., Omran, M., Hosang, J.H., and Schiele, B. (12, January 6\u20137). Ten Years of Pedestrian Detection, What Have We Learned?. Proceedings of the European Conference on Computer Vision\u2014ECCV 2014 Workshops, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-16181-5_47"},{"key":"ref_20","unstructured":"Zhu, Q., Yeh, M., Cheng, K., and Avidan, S. (2006, January 17\u201322). Fast Human Detection Using a Cascade of Histograms of Oriented Gradients. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), New York, NY, USA."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Maji, S., Berg, A.C., and Malik, J. (2008, January 24\u201326). Classification using intersection kernel support vector machines is efficient. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587630"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object Detection with Discriminatively Trained Part-Based Models","volume":"32","author":"Felzenszwalb","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Mar\u00edn, J., V\u00e1zquez, D., L\u00f3pez, A.M., Amores, J., and Leibe, B. (2013, January 1\u20138). Random Forests of Local Experts for Pedestrian Detection. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.322"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Doll\u00e1r, P., Tu, Z., Perona, P., and Belongie, S.J. (2009, January 7\u201310). Integral Channel Features. Proceedings of the British Machine Vision Conference, BMVC 2009, London, UK.","DOI":"10.5244\/C.23.91"},{"key":"ref_25","unstructured":"Nam, W., Doll\u00e1r, P., and Han, J.H. (2014, January 8\u201313). Local Decorrelation For Improved Pedestrian Detection. Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhang, S., Bauckhage, C., and Cremers, A.B. (2014, January 23\u201328). Informed Haar-Like Features Improve Pedestrian Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.126"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Cao, J., Pang, Y., and Li, X. (2016, January 27\u201330). Pedestrian Detection Inspired by Appearance Constancy and Shape Symmetry. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.147"},{"key":"ref_28","first-page":"985","article-title":"Scale-aware Fast R-CNN for Pedestrian Detection","volume":"20","author":"Li","year":"2017","journal-title":"IEEE Trans. Multimed."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hu, Q., Wang, P., Shen, C., van den Hengel, A., and Porikli, F.M. (2017). Pushing the Limits of Deep CNNs for Pedestrian Detection. IEEE Trans. Circuits Syst. Video Technol.","DOI":"10.1109\/TCSVT.2017.2648850"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"3210","DOI":"10.1109\/TIP.2017.2694224","article-title":"Learning Multilayer Channel Features for Pedestrian Detection","volume":"26","author":"Cao","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1023\/B:VISI.0000013087.49260.fb","article-title":"Robust Real-Time Face Detection","volume":"57","author":"Viola","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Doll\u00e1r, P., Belongie, S.J., and Perona, P. (September, January 31). The Fastest Pedestrian Detector in the West. Proceedings of the British Machine Vision Conference, BMVC 2010, Aberystwyth, UK.","DOI":"10.5244\/C.24.68"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Benenson, R., Mathias, M., Timofte, R., and Gool, L.J.V. (2012, January 16\u201321). Pedestrian detection at 100 frames per second. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248017"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_35","unstructured":"Lazebnik, S., Schmid, C., and Ponce, J. (2006, January 17\u201322). Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), New York, NY, USA."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Girshick, R.B. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_37","unstructured":"Ren, S., He, K., Girshick, R.B., and Sun, J. (2015, January 7\u201312). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"3565","DOI":"10.1109\/TITS.2016.2561262","article-title":"Looking at Pedestrians at Different Scales: A Multiresolution Approach and Evaluations","volume":"17","author":"Rajaram","year":"2016","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Yan, J., Zhang, X., Lei, Z., Liao, S., and Li, S.Z. (2013, January 23\u201328). Robust Multi-resolution Pedestrian Detection in Traffic Scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.390"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Park, D., Ramanan, D., and Fowlkes, C.C. (2010, January 5\u201311). Multiresolution Models for Object Detection. Proceedings of the Computer Vision\u2014ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece.","DOI":"10.1007\/978-3-642-15561-1_18"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Park, D., Zitnick, C.L., Ramanan, D., and Doll\u00e1r, P. (2013, January 23\u201328). Exploring Weak Stabilization for Motion Feature Extraction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.371"},{"key":"ref_42","unstructured":"Irsoy, O., Yildiz, O.T., and Alpaydin, E. (2012, January 11\u201315). Soft decision trees. Proceedings of the 21st International Conference on Pattern Recognition, ICPR 2012, Tsukuba, Japan."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1214\/aos\/1016218223","article-title":"Additive logistic regression: A statistical view of boosting","volume":"28","author":"Friedman","year":"2000","journal-title":"Ann. Stat."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Doll\u00e1r, P., Appel, R., and Kienzle, W. (2012, January 7\u201313). Crosstalk Cascades for Frame-Rate Pedestrian Detection. Proceedings of the Computer Vision\u2014ECCV 2012\u201412th European Conference on Computer Vision, Florence, Italy.","DOI":"10.1007\/978-3-642-33709-3_46"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Gualdi, G., Prati, A., and Cucchiara, R. (2010, January 5\u201311). Multi-stage Sampling with Boosting Cascades for Pedestrian Detection in Images and Videos. Proceedings of the Computer Vision\u2014ECCV 2010\u201411th European Conference on Computer Vision, Heraklion, Crete, Greece.","DOI":"10.1007\/978-3-642-15567-3_15"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s11263-008-0137-5","article-title":"Putting Objects in Perspective","volume":"80","author":"Hoiem","year":"2008","journal-title":"Int. J. Comput. Vis."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.imavis.2017.02.007","article-title":"Robust pedestrian detection under deformation using simple boosted features","volume":"61","author":"Kim","year":"2017","journal-title":"Image Vis. Comput."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"14223","DOI":"10.1109\/ACCESS.2018.2803160","article-title":"Pedestrian Detection by Feature Selectedf Self-Similarity Features","volume":"6","author":"Fu","year":"2018","journal-title":"IEEE Access"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1109\/TPAMI.2011.155","article-title":"Pedestrian Detection: An Evaluation of the State of the Art","volume":"34","author":"Wojek","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Paisitkriangkrai, S., Shen, C., and van den Hengel, A. (2014). Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features. European Conference on Computer Vision (ECCV), Springer.","DOI":"10.1007\/978-3-319-10593-2_36"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/18\/4\/1063\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T14:59:20Z","timestamp":1760194760000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/18\/4\/1063"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,4,2]]},"references-count":50,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2018,4]]}},"alternative-id":["s18041063"],"URL":"https:\/\/doi.org\/10.3390\/s18041063","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2018,4,2]]}}}