{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T18:42:03Z","timestamp":1755801723027,"version":"3.44.0"},"reference-count":87,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2018,9,27]],"date-time":"2018-09-27T00:00:00Z","timestamp":1538006400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2018,9,27]],"date-time":"2018-09-27T00:00:00Z","timestamp":1538006400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Grant-in-Aid for JSPS Fellows","award":["JP16J04552"],"award-info":[{"award-number":["JP16J04552"]}]},{"name":"JSPS KAKENHI","award":["JP16K16083"],"award-info":[{"award-number":["JP16K16083"]}]},{"name":"Ministry of the Environment"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["IPSJ T Comput Vis Appl"],"published-print":{"date-parts":[[2018,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motion information can be important for detecting objects, but it has been used less for pedestrian detection, particularly with deep-learning-based methods. We propose a method that uses deep motion features as well as deep still-image features, following the success of two-stream convolutional networks, each of which are trained separately for spatial and temporal streams. To extract motion clues for detection differentiated from other background motions, the temporal stream takes as input the difference in frames that are weakly stabilized by optical flow. To make the networks applicable to bounding-box-level detection, the mid-level features are concatenated and combined with a sliding-window detector. 
We also introduce transfer learning from multiple sources in the two-stream networks, which can transfer still image and motion features from ImageNet and an action recognition dataset respectively, to overcome the insufficiency of training data for convolutional neural networks in pedestrian datasets. We conducted an evaluation on two popular large-scale pedestrian benchmarks, namely the Caltech Pedestrian Detection Benchmark and Daimler Mono Pedestrian Detection Benchmark. We observed 10% improvement compared to the same method but without motion features.<\/jats:p>","DOI":"10.1186\/s41074-018-0048-5","type":"journal-article","created":{"date-parts":[[2018,9,27]],"date-time":"2018-09-27T07:57:26Z","timestamp":1538035046000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Pedestrian detection with motion features via two-stream ConvNets"],"prefix":"10.1186","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1194-9663","authenticated-orcid":false,"given":"Ryota","family":"Yoshihashi","sequence":"first","affiliation":[]},{"given":"Tu Tuan","family":"Trinh","sequence":"additional","affiliation":[]},{"given":"Rei","family":"Kawakami","sequence":"additional","affiliation":[]},{"given":"Shaodi","family":"You","sequence":"additional","affiliation":[]},{"given":"Makoto","family":"Iida","sequence":"additional","affiliation":[]},{"given":"Takeshi","family":"Naemura","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2018,9,27]]},"reference":[{"issue":"8","key":"48_CR1","doi-asserted-by":"publisher","first-page":"1847","DOI":"10.1109\/TPAMI.2012.272","volume":"35","author":"N Kr\u00fcger","year":"2013","unstructured":"Kr\u00fcger N, Janssen P, Kalkan S, Lappe M, Leonardis A, Piater J, Rodr\u00edguez-S\u00e1nchez AJ, Wiskott L (2013) Deep hierarchies in the primate visual cortex: what can we learn for computer vision?PAMI 
35(8):1847\u20131871.","journal-title":"PAMI"},{"key":"48_CR2","doi-asserted-by":"crossref","unstructured":"Viola P, Jones MJ, Snow D (2003) Detecting pedestrians using patterns of motion and appearance In: CVPR.","DOI":"10.1109\/ICCV.2003.1238422"},{"key":"48_CR3","doi-asserted-by":"crossref","unstructured":"Dalal N, Triggs B, Schmid C (2006) Human detection using oriented histograms of flow and appearance In: ECCV, 428\u2013441.","DOI":"10.1007\/11744047_33"},{"key":"48_CR4","doi-asserted-by":"crossref","unstructured":"Andriluka M, Roth S, Schiele B (2008) People-tracking-by-detection and people-detection-by-tracking In: CVPR, 1\u20138.","DOI":"10.1109\/CVPR.2008.4587583"},{"key":"48_CR5","doi-asserted-by":"crossref","unstructured":"Jones M, Snow D (2008) Pedestrian detection using boosted features over many frames In: ICPR, 1\u20134.","DOI":"10.1109\/ICPR.2008.4761703"},{"key":"48_CR6","doi-asserted-by":"crossref","unstructured":"Walk S, Majer N, Schindler K, Schiele B (2010) New features and insights for pedestrian detection In: CVPR, 1030\u20131037.","DOI":"10.1109\/CVPR.2010.5540102"},{"key":"48_CR7","doi-asserted-by":"crossref","unstructured":"Park D, Zitnick C, Ramanan D, Doll\u00e1r P (2013) Exploring weak stabilization for motion feature extraction In: CVPR.","DOI":"10.1109\/CVPR.2013.371"},{"issue":"4","key":"48_CR8","doi-asserted-by":"publisher","first-page":"743","DOI":"10.1109\/TPAMI.2011.155","volume":"34","author":"P Doll\u00e1r","year":"2012","unstructured":"Doll\u00e1r P, Wojek C, Schiele B, Perona P (2012) Pedestrian detection: an evaluation of the state of the art. 
PAMI 34(4):743\u2013761.","journal-title":"PAMI"},{"key":"48_CR9","doi-asserted-by":"crossref","unstructured":"Jhuang H, Serre T, Wolf L, Poggio T (2007) A biologically inspired system for action recognition In: ICCV.","DOI":"10.1109\/ICCV.2007.4408988"},{"key":"48_CR10","doi-asserted-by":"crossref","unstructured":"Taylor GW, Fergus R, LeCun Y, Bregler C (2010) Convolutional learning of spatio-temporal features In: ECCV.","DOI":"10.1007\/978-3-642-15567-3_11"},{"key":"48_CR11","doi-asserted-by":"crossref","unstructured":"Le QV, Zou WY, Yeung SY, Ng AY (2011) Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis In: CVPR.","DOI":"10.1109\/CVPR.2011.5995496"},{"issue":"1","key":"48_CR12","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","volume":"35","author":"S Ji","year":"2013","unstructured":"Ji S, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. PAMI 35(1):221\u2013231.","journal-title":"PAMI"},{"key":"48_CR13","doi-asserted-by":"crossref","unstructured":"Hasan M, Roy-Chowdhury AK (2014) Continuous learning of human activity models using deep nets In: ECCV, 705\u2013720.","DOI":"10.1007\/978-3-319-10578-9_46"},{"key":"48_CR14","doi-asserted-by":"crossref","unstructured":"Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks In: CVPR.","DOI":"10.1109\/CVPR.2014.223"},{"key":"48_CR15","unstructured":"Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos In: NIPS, 568\u2013576."},{"key":"48_CR16","doi-asserted-by":"crossref","unstructured":"Yang B, Yan J, Lei Z, Li S (2015) Convolutional channel features for pedestrian, face and edge detection In: ICCV.","DOI":"10.1109\/ICCV.2015.18"},{"key":"48_CR17","doi-asserted-by":"crossref","unstructured":"Tian Y, Luo P, Wang X, Tang X (2015) Pedestrian detection aided 
by deep learning semantic tasks In: CVPR.","DOI":"10.1109\/CVPR.2015.7299143"},{"key":"48_CR18","doi-asserted-by":"crossref","unstructured":"Tian Y, Luo P, Wang X, Tang X (2015) Deep learning strong parts for pedestrian detection In: ICCV.","DOI":"10.1109\/ICCV.2015.221"},{"issue":"8","key":"48_CR19","doi-asserted-by":"publisher","first-page":"1532","DOI":"10.1109\/TPAMI.2014.2300479","volume":"36","author":"P Doll\u00e1r","year":"2014","unstructured":"Doll\u00e1r P, Appel R, Belongie S, Perona P (2014) Fast feature pyramids for object detection. PAMI 36(8):1532\u20131545.","journal-title":"PAMI"},{"key":"48_CR20","doi-asserted-by":"crossref","unstructured":"Zhang S, Benenson R, Schiele B (2015) Filtered feature channels for pedestrian detection In: CVPR.","DOI":"10.1109\/CVPR.2015.7298784"},{"key":"48_CR21","unstructured":"Soomro K, Zamir A, Shah M (2012) Ucf101: A dataset of 101 human action classes from videos in the wild. CRCV-TR-12-01."},{"key":"48_CR22","doi-asserted-by":"crossref","unstructured":"Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features In: CVPR, 511\u2013518.","DOI":"10.1109\/CVPR.2001.990517"},{"key":"48_CR23","doi-asserted-by":"crossref","unstructured":"Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection In: CVPR, 886\u2013893.","DOI":"10.1109\/CVPR.2005.177"},{"key":"48_CR24","doi-asserted-by":"crossref","unstructured":"Lowe DG (1999) Object recognition from local scale-invariant features In: ICCV, 1150\u20131157.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"48_CR25","doi-asserted-by":"crossref","unstructured":"Wu B, Nevatia R (2005) Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors In: ICCV, 90\u201397.","DOI":"10.1109\/ICCV.2005.74"},{"issue":"10","key":"48_CR26","doi-asserted-by":"publisher","first-page":"1713","DOI":"10.1109\/TPAMI.2008.75","volume":"30","author":"O Tuzel","year":"2008","unstructured":"Tuzel O, 
Porikli F, Meer P (2008) Pedestrian detection via classification on Riemannian manifolds. PAMI 30(10):1713\u20131727.","journal-title":"PAMI"},{"key":"48_CR27","doi-asserted-by":"crossref","unstructured":"Ren H, Li Z-N (2015) Object detection using generalization and efficiency balanced co-occurrence features In: ICCV, 46\u201354.","DOI":"10.1109\/ICCV.2015.14"},{"key":"48_CR28","doi-asserted-by":"crossref","unstructured":"Ojala T, Pietik\u00e4inen M, M\u00e4enp\u00e4\u00e4 T (2000) Gray scale and rotation invariant texture classification with local binary patterns In: ECCV, 404\u2013420.","DOI":"10.1007\/3-540-45054-8_27"},{"key":"48_CR29","doi-asserted-by":"crossref","unstructured":"Wang X, Han T, Yan S (2009) An HOG-LBP human detector with partial occlusion handling In: ICCV, 32\u201339.","DOI":"10.1109\/ICCV.2009.5459207"},{"key":"48_CR30","doi-asserted-by":"crossref","unstructured":"Ohn-Bar E, Trivedi MM (2016) To boost or not to boost? On the limits of boosted trees for object detection In: ICPR, 3350\u20133355.. 
IEEE.","DOI":"10.1109\/ICPR.2016.7900151"},{"key":"48_CR31","unstructured":"Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks In: NIPS."},{"key":"48_CR32","unstructured":"Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition In: ICLR."},{"key":"48_CR33","doi-asserted-by":"crossref","unstructured":"Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions In: CVPR.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"48_CR34","doi-asserted-by":"crossref","unstructured":"Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation In: CVPR, 580\u2013587.","DOI":"10.1109\/CVPR.2014.81"},{"key":"48_CR35","doi-asserted-by":"crossref","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation In: CVPR.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"48_CR36","doi-asserted-by":"crossref","unstructured":"Szarvas M, Yoshizawa A, Yamamoto M, Ogata J (2005) Pedestrian detection with convolutional neural networks In: Intelligent Vehicles Symposium, 224\u2013229.","DOI":"10.1109\/IVS.2005.1505106"},{"key":"48_CR37","doi-asserted-by":"crossref","unstructured":"Sermanet P, Kavukcuoglu K, Chintala S, LeCun Y (2013) Pedestrian detection with unsupervised multi-stage feature learning In: CVPR, 3626\u20133633.","DOI":"10.1109\/CVPR.2013.465"},{"key":"48_CR38","doi-asserted-by":"crossref","unstructured":"Ouyang W, Wang X (2013) Joint deep learning for pedestrian detection In: CVPR, 2056\u20132063.","DOI":"10.1109\/ICCV.2013.257"},{"key":"48_CR39","doi-asserted-by":"crossref","unstructured":"Luo P, Tian Y, Wang X, Tang X (2014) Switchable deep network for pedestrian detection In: CVPR, 899\u2013906.","DOI":"10.1109\/CVPR.2014.120"},{"key":"48_CR40","doi-asserted-by":"crossref","unstructured":"Fukui H, 
Yamashita T, Yamauchi Y, Fujiyoshi H, Murase H (2015) Pedestrian detection based on deep convolutional neural network with ensemble inference network In: IEEE Intelligent Vehicle Symposium.","DOI":"10.1109\/IVS.2015.7225690"},{"key":"48_CR41","doi-asserted-by":"crossref","unstructured":"Yamashita T, Fukui H, Yamauchi Y, Fujiyoshi H (2016) Pedestrian and part position detection using a regression-based multiple task deep convolutional neural network In: International Conference on Pattern Recognition.","DOI":"10.1109\/ICPR.2016.7900176"},{"key":"48_CR42","doi-asserted-by":"crossref","unstructured":"Hosang J, Omran M, Benenson R, Schiele B (2015) Taking a deeper look at pedestrians In: CVPR.","DOI":"10.1109\/CVPR.2015.7299034"},{"issue":"3","key":"48_CR43","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1023\/A:1022627411411","volume":"20","author":"C Corinna","year":"1995","unstructured":"Corinna C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273\u2013297.","journal-title":"Mach Learn"},{"key":"48_CR44","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1007\/3-540-59119-2_166","volume":"904","author":"Y Freund","year":"1995","unstructured":"Freund Y, Schapire R (1995) A decision-theoretic generalization of on-line learning and an application to boosting. Comput Learn Theory 904:23\u201337.","journal-title":"Comput Learn Theory"},{"issue":"1","key":"48_CR45","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45(1):5\u201332.","journal-title":"Mach Learn"},{"key":"48_CR46","doi-asserted-by":"crossref","unstructured":"Narasimhan H, Agarwal S (2013) SVM pAUC tight: a new support vector method for optimizing partial AUC based on a tight convex upper bound In: SIGKDD, 167\u2013175.. 
ACM.","DOI":"10.1145\/2487575.2487674"},{"key":"48_CR47","unstructured":"Narasimhan H, Agarwal S (2013) A structural SVM based approach for optimizing partial AUC In: ICML, 516\u2013524."},{"key":"48_CR48","unstructured":"Paisitkriangkrai S, Shen C, van den Hengel A (2014) Pedestrian detection with spatially pooled features and structured ensemble learning. arXiv preprint arXiv:1409.5209."},{"key":"48_CR49","doi-asserted-by":"crossref","unstructured":"Felzenszwalb P, McAllester D, Ramanan D (2008) A discriminatively trained, multiscale, deformable part model In: CVPR, 1\u20138.","DOI":"10.1109\/CVPR.2008.4587597"},{"issue":"1","key":"48_CR50","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","volume":"111","author":"M Everingham","year":"2015","unstructured":"Everingham M, Eslami SMA, Van Gool L, Williams C, Winn J, Zisserman A (2015) The PASCAL Visual Object Classes challenge: a retrospective. IJCV 111(1):98\u2013136.","journal-title":"IJCV"},{"key":"48_CR51","doi-asserted-by":"crossref","unstructured":"Girshick R, Iandola F, Darrell T, Malik J (2015) Deformable part models are convolutional neural networks In: CVPR.","DOI":"10.1109\/CVPR.2015.7298641"},{"issue":"10","key":"48_CR52","doi-asserted-by":"publisher","first-page":"2005","DOI":"10.1109\/TPAMI.2011.281","volume":"34","author":"MJ Saberian","year":"2012","unstructured":"Saberian MJ, Vasconcelos N (2012) Learning optimal embedded cascades. PAMI 34(10):2005\u20132012.","journal-title":"PAMI"},{"key":"48_CR53","doi-asserted-by":"crossref","unstructured":"Cai Z, Saberian M, Vasconcelos N (2015) Learning complexity-aware cascades for deep pedestrian detection In: ICCV.","DOI":"10.1109\/ICCV.2015.384"},{"key":"48_CR54","doi-asserted-by":"crossref","unstructured":"Mao J, Xiao T, Jiang Y, Cao Z (2017) What can help pedestrian detection? In: CVPR.","DOI":"10.1109\/CVPR.2017.639"},{"key":"48_CR55","unstructured":"Piccardi M 
(2004) Background subtraction techniques: a review In: Systems, man and cybernetics, 2004 IEEE international conference on, 3099\u20133104.. IEEE."},{"key":"48_CR56","doi-asserted-by":"crossref","unstructured":"Wang L, Qiao Y, Tang X (2015) Action recognition with trajectory-pooled deep-convolutional descriptors In: CVPR.","DOI":"10.1109\/CVPR.2015.7299059"},{"key":"48_CR57","doi-asserted-by":"crossref","unstructured":"Gkioxari G, Malik J (2015) Finding action tubes In: CVPR.","DOI":"10.1109\/CVPR.2015.7298676"},{"key":"48_CR58","doi-asserted-by":"crossref","unstructured":"Shao J, Kang K, Loy CC, Wang X (2015) Deeply learned attributes for crowded scene understanding In: CVPR.","DOI":"10.1109\/CVPR.2015.7299097"},{"key":"48_CR59","doi-asserted-by":"crossref","unstructured":"Cheron G, Laptev I, Schmid C (2015) P-CNN: pose-based CNN features for action recognition In: ICCV.","DOI":"10.1109\/ICCV.2015.368"},{"issue":"3","key":"48_CR60","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","volume":"115","author":"O Russakovsky","year":"2015","unstructured":"Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, et al. (2015) ImageNet large scale visual recognition challenge. 
Int J Comput Vis 115(3):211\u2013252.","journal-title":"Int J Comput Vis"},{"key":"48_CR61","doi-asserted-by":"crossref","unstructured":"Zhu X, Xiong Y, Dai J, Yuan L, Wei Y (2017) Deep feature flow for video recognition In: CVPR.","DOI":"10.1109\/CVPR.2017.441"},{"key":"48_CR62","doi-asserted-by":"crossref","unstructured":"Zhu X, Wang Y, Dai J, Yuan L, Wei Y (2017) Flow-guided feature aggregation for video object detection In: ICCV.","DOI":"10.1109\/ICCV.2017.52"},{"key":"48_CR63","doi-asserted-by":"crossref","unstructured":"Feichtenhofer C, Pinz A, Zisserman A (2017) Detect to track and track to detect In: ICCV.","DOI":"10.1109\/ICCV.2017.330"},{"key":"48_CR64","unstructured":"Trinh TT, Yoshihashi R, Kawakami R, Iida M, Naemura T (2016) Bird detection near wind turbines from high-resolution video using LSTM networks In: World Wind Energy Conference (WWEC)."},{"key":"48_CR65","unstructured":"Yoshihashi R, Trinh TT, Kawakami R, You S, Iida M, Naemura T (2017) Differentiating objects by motion: joint detection and tracking of small flying objects. arXiv preprint arXiv:1709.04666."},{"key":"48_CR66","unstructured":"Pascanu R, Mikolov T, Bengio Y (2013) On the difficulty of training recurrent neural networks In: ICML, 1310\u20131318."},{"issue":"1","key":"48_CR67","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1016\/0166-2236(92)90344-8","volume":"15","author":"MA Goodale","year":"1992","unstructured":"Goodale MA, Milner AD (1992) Separate visual pathways for perception and action. Trends Neurosci. 15(1):20\u201325.","journal-title":"Trends Neurosci."},{"key":"48_CR68","doi-asserted-by":"crossref","unstructured":"Gladh S, Danelljan M, Khan FS, Felsberg M (2016) Deep motion features for visual tracking In: ICPR, 1243\u20131248.. IEEE.","DOI":"10.1109\/ICPR.2016.7899807"},{"issue":"2","key":"48_CR69","doi-asserted-by":"publisher","first-page":"201","DOI":"10.3758\/BF03212378","volume":"14","author":"G. Johansson","year":"1973","unstructured":"Johansson G. 
(1973) Visual perception of biological motion and a model for its analysis. Percept. Psychophys. 14(2):201\u2013211.","journal-title":"Percept. Psychophys."},{"key":"48_CR70","doi-asserted-by":"crossref","unstructured":"Bottou L (2012) Stochastic gradient descent tricks In: Neural networks: Tricks of the trade, 421\u2013436.. Springer.","DOI":"10.1007\/978-3-642-35289-8_25"},{"key":"48_CR71","unstructured":"Appel R, Fuchs T, Doll\u00e1r P, Perona P (2013) Quickly boosting decision trees-pruning underachieving features early In: ICML, 594\u2013602."},{"key":"48_CR72","unstructured":"Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T (2014) DeCAF: a deep convolutional activation feature for generic visual recognition In: ICML."},{"key":"48_CR73","unstructured":"Lucas B, Kanade T (1981) An iterative image registration technique with an application to stereo vision In: IJCAI, 674\u2013679."},{"key":"48_CR74","doi-asserted-by":"crossref","unstructured":"Liu C, Yuen J, Torralba A, Sivic J, Freeman WT (2008) Sift flow: dense correspondence across different scenes In: ECCV, 28\u201342.","DOI":"10.1007\/978-3-540-88690-7_3"},{"key":"48_CR75","doi-asserted-by":"crossref","unstructured":"Weinzaepfel P, Revaud J, Harchaoui Z, Schmid C (2013) Deepflow: large displacement optical flow with deep matching In: ICCV, 1385\u20131392.","DOI":"10.1109\/ICCV.2013.175"},{"key":"48_CR76","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition In: ECCV, 346\u2013361.","DOI":"10.1007\/978-3-319-10578-9_23"},{"key":"48_CR77","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks In: Advances in Neural Information Processing Systems, 91\u201399."},{"key":"48_CR78","doi-asserted-by":"crossref","unstructured":"Dubout C, Fleuret F (2012) Exact acceleration of linear object detectors In: ECCV, 
301\u2013311.","DOI":"10.1007\/978-3-642-33712-3_22"},{"key":"48_CR79","doi-asserted-by":"crossref","unstructured":"Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: Convolutional architecture for fast feature embedding In: ACMMM, 675\u2013678.. ACM.","DOI":"10.1145\/2647868.2654889"},{"key":"48_CR80","doi-asserted-by":"crossref","unstructured":"Doll\u00e1r P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: a benchmark In: CVPR, 304\u2013311.","DOI":"10.1109\/CVPR.2009.5206631"},{"issue":"12","key":"48_CR81","doi-asserted-by":"publisher","first-page":"2179","DOI":"10.1109\/TPAMI.2008.260","volume":"31","author":"M Enzweiler","year":"2009","unstructured":"Enzweiler M, Gavrila DM (2009) Monocular pedestrian detection: survey and experiments. PAMI 31(12):2179\u20132195.","journal-title":"PAMI"},{"key":"48_CR82","unstructured":"Nam W, Doll\u00e1r P, Han JH (2014) Local decorrelation for improved pedestrian detection In: NIPS, 424\u2013432."},{"key":"48_CR83","doi-asserted-by":"crossref","unstructured":"Marin J, V\u00e1zquez D, L\u00f3pez AM, Amores J, Leibe B (2013) Random forests of local experts for pedestrian detection In: Proceedings of the IEEE International Conference on Computer Vision, 2592\u20132599.","DOI":"10.1109\/ICCV.2013.322"},{"key":"48_CR84","doi-asserted-by":"crossref","unstructured":"Nam W, Han B, Han JH (2011) Improving object localization using macrofeature layout selection In: ICCVW, 1801\u20131808.. IEEE.","DOI":"10.1109\/ICCVW.2011.6130467"},{"key":"48_CR85","unstructured":"Li J, Liang X, Shen S, Xu T, Yan S (2015) Scale-aware fast R-CNN for pedestrian detection. arXiv preprint arXiv:1510.08160."},{"key":"48_CR86","doi-asserted-by":"crossref","unstructured":"Zhang L, Lin L, Liang X, He K (2016) Is faster R-CNN doing well for pedestrian detection? In: ECCV, 443\u2013457.. 
Springer.","DOI":"10.1007\/978-3-319-46475-6_28"},{"key":"48_CR87","doi-asserted-by":"crossref","unstructured":"Cai Z, Fan Q, Feris RS, Vasconcelos N (2016) A unified multi-scale deep convolutional neural network for fast object detection In: ECCV, 354\u2013370.. Springer.","DOI":"10.1007\/978-3-319-46493-0_22"}],"container-title":["IPSJ Transactions on Computer Vision and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s41074-018-0048-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s41074-018-0048-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s41074-018-0048-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,20]],"date-time":"2025-08-20T18:20:41Z","timestamp":1755714041000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1186\/s41074-018-0048-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,9,27]]},"references-count":87,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,12]]}},"alternative-id":["48"],"URL":"https:\/\/doi.org\/10.1186\/s41074-018-0048-5","relation":{},"ISSN":["1882-6695"],"issn-type":[{"type":"electronic","value":"1882-6695"}],"subject":[],"published":{"date-parts":[[2018,9,27]]},"assertion":[{"value":"12 February 2018","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 August 2018","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 September 2018","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"value":"Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Publisher\u2019s Note"}}],"article-number":"12"}}