{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T12:21:59Z","timestamp":1777897319355,"version":"3.51.4"},"reference-count":49,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2015,4,20]],"date-time":"2015-04-20T00:00:00Z","timestamp":1429488000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Driver assistance systems and autonomous robotics rely on the deployment of several sensors for environment perception. Compared to LiDAR systems, the inexpensive vision sensors can capture the 3D scene as perceived by a driver in terms of appearance and depth cues. Indeed, providing 3D image understanding capabilities to vehicles is an essential target in order to infer scene semantics in urban environments. One of the challenges that arises from the navigation task in naturalistic urban scenarios is the detection of road participants (e.g., cyclists, pedestrians and vehicles). In this regard, this paper tackles the detection and orientation estimation of cars, pedestrians and cyclists, employing the challenging and naturalistic KITTI images. This work proposes 3D-aware features computed from stereo color images in order to capture the appearance and depth peculiarities of the objects in road scenes. The successful part-based object detector, known as DPM, is extended to learn richer models from the 2.5D data (color and disparity), while also carrying out a detailed analysis of the training pipeline. A large set of experiments evaluate the proposals, and the best performing approach is ranked on the KITTI website. 
Indeed, this is the first work that reports results with stereo data for the KITTI object challenge, achieving increased detection ratios for the classes car and cyclist compared to a baseline DPM.<\/jats:p>","DOI":"10.3390\/s150409228","type":"journal-article","created":{"date-parts":[[2015,4,21]],"date-time":"2015-04-21T06:26:47Z","timestamp":1429597607000},"page":"9228-9250","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["Visual Object Recognition with 3D-Aware Features in KITTI Urban Scenes"],"prefix":"10.3390","volume":"15","author":[{"given":"J.","family":"Yebes","sequence":"first","affiliation":[{"name":"Department of Electronics, University of Alcal\u00e1, Alcal\u00e1 de Henares 28871, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Luis","family":"Bergasa","sequence":"additional","affiliation":[{"name":"Department of Electronics, University of Alcal\u00e1, Alcal\u00e1 de Henares 28871, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Miguel","family":"Garc\u00eda-Garrido","sequence":"additional","affiliation":[{"name":"Department of Electronics, University of Alcal\u00e1, Alcal\u00e1 de Henares 28871, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2015,4,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1773","DOI":"10.1109\/TITS.2013.2266661","article-title":"Looking at Vehicles on the Road: A Survey of Vision-Based Vehicle Detection, Tracking, and Behavior Analysis","volume":"14","author":"Sivaraman","year":"2013","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_2","first-page":"228","article-title":"Text Detection and Recognition on Traffic Panels from Street-Level Imagery Using Visual Appearance","volume":"15","author":"Bergasa","year":"2013","journal-title":"IEEE Trans. Intell. Transp. 
Syst."},{"key":"ref_3","unstructured":"Daza, I.G., Bronte, S., Bergasa, L.M., Almaz\u00e1n, J., and Yebes, J.J. (2012, January 3\u20137). Vision-based drowsiness detector for real driving conditions. Alcala de Henares, Spain."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Buehler, M., Iagnemma, K., and Singh, S. (2009). The DARPA Urban Challenge: Autonomous Vehicles in City Traffic, Springer.","DOI":"10.1007\/978-3-642-03991-1"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1012","DOI":"10.1109\/TPAMI.2013.185","article-title":"3D Traffic Scene Understanding from Movable Platforms","volume":"36","author":"Geiger","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object Detection with Discriminatively Trained Part-Based Models","volume":"32","author":"Felzenszwalb","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_8","unstructured":"ICCV Workshop Reconstruction Meets Recognition Challenge. Available online: http:\/\/ttic.uchicago.edu\/rurtasun\/rmrc\/index.php."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_10","unstructured":"KITTI Object Detection and Orientation Estimation Benchmark. Available online: http:\/\/www.cvlibs.net\/datasets\/kitti\/eval_object.php."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Pepik, B., Stark, M., Gehler, P., and Schiele, B. (2013, January 25\u201327). 
Occlusion Patterns for Object Class Detection. Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.422"},{"key":"ref_12","unstructured":"Pepik, B., Gehler, P., Stark, M., and Schiele, B. (2012). Computer Vision\u2014ECCV 2012, Springer."},{"key":"ref_13","unstructured":"Park, D., Ramanan, D., and Fowlkes, C. (2010). Computer Vision\u2014ECCV 2010, Springer."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1109\/TPAMI.2012.174","article-title":"Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes","volume":"35","author":"Wojek","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Milford, M., and Wyeth, G. (2012, January 14\u201318). SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights. St. Paul, MN, USA.","DOI":"10.1109\/ICRA.2012.6224623"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhu, X., Vondrick, C., Ramanan, D., and Fowlkes, C.C. (2012, January 3\u20137). Do we need more training data or better models for object detection? Surrey, UK.","DOI":"10.5244\/C.26.80"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Behley, J., Steinhage, V., and Cremers, A.B. (2013, January 3\u20138). Laser-based segment classification using a mixture of bag-of-words. Tokyo, Japan.","DOI":"10.1109\/IROS.2013.6696957"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Premebida, C., Carreira, J., Batista, J., and Nunes, U. (2014, January 14\u201318). Pedestrian Detection Combining RGB and Dense LIDAR Data. Chicago, IL, USA.","DOI":"10.1109\/IROS.2014.6943141"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ohn-Bar, E., and Trivedi, M.M. (2014, January 23\u201328). Fast and Robust Object Detection Using Visual Subcategories. Columbus, OH, USA.","DOI":"10.1109\/CVPRW.2014.32"},{"key":"ref_20","unstructured":"Long, C., Wang, X., Hua, G., Yang, M., and Lin, Y. 
(2014, January 1\u20135). Accurate Object Detection with Location Relaxation and Regionlets Relocalization. Singapore."},{"key":"ref_21","first-page":"1467","article-title":"Joint 3D Estimation of Objects and Scene Layout","volume":"24","author":"Geiger","year":"2011","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/T-C.1973.223602","article-title":"The Representation and Matching of Pictorial Structures","volume":"C-22","author":"Fischler","year":"1973","journal-title":"IEEE Trans. Comput."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Yebes, J., Alcantarilla, P.F., and Bergasa, L.M. (2011, January 5\u20139). Occupant Monitoring System for Traffic Control Based on Visual Categorization. Baden-Baden, Germany.","DOI":"10.1109\/IVS.2011.5940420"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Vedaldi, A., Gulshan, V., Varma, M., and Zisserman, A. (2009, January 1\u20134). Multiple kernels for object detection. Kyoto, Japan.","DOI":"10.1109\/ICCV.2009.5459183"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Malisiewicz, T., Gupta, A., and Efros, A.A. (2011, January 6\u201313). Ensemble of exemplar-SVMs for object detection and beyond. Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126229"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1109\/TPAMI.2011.155","article-title":"Pedestrian Detection: An Evaluation of the State of the Art","volume":"34","author":"Wojek","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yebes, J., Bergasa, L.M., Arroyo, R., and L\u00e1zaro, A. (2014, January 8\u201311). Supervised learning and evaluation of KITTI's cars detector with DPM. Dearborn, MI, USA.","DOI":"10.1109\/IVS.2014.6856452"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"L\u00f3pez-Sastre, R.J., Tuytelaars, T., and Savarese, S. 
(2011, January 6\u201313). Deformable part models revisited: A performance evaluation for object category pose estimation. Barcelona, Spain.","DOI":"10.1109\/ICCVW.2011.6130367"},{"key":"ref_29","unstructured":"Hejrati, M., and Ramanan, D. (2012). Advances in Neural Information Processing Systems 25 (NIPS 2012), Neural Information Processing Systems Foundation, Inc."},{"key":"ref_30","first-page":"620","article-title":"3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model","volume":"25","author":"Fidler","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wang, T., He, X., and Barnes, N. (2013, January 25\u201327). Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning. Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.234"},{"key":"ref_32","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of Oriented Gradients for Human Detection. San Diego, CA, USA."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Kl\u00e4ser, A., Marszalek, M., and Schmid, C. (2008, January 1\u20134). A Spatio-Temporal Descriptor Based on 3D-Gradients. Leeds, UK.","DOI":"10.5244\/C.22.99"},{"key":"ref_34","unstructured":"Walk, S., Schindler, K., and Schiele, B. Computer Vision\u2014ECCV 2010, Springer."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Rohrbach, M., Enzweiler, M., and Gavrila, D.M. (2009, January 9\u201311). High-level fusion of depth and intensity for pedestrian classification. Jena, Germany.","DOI":"10.1007\/978-3-642-03798-6_11"},{"key":"ref_36","unstructured":"INRIA Visual Recognition and Machine Learning Summer School. Available online: http:\/\/www.di.ens.fr\/willow\/events\/cvml2012\/."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Ren, X., and Ramanan, D. (2013, January 25\u201327). Histograms of Sparse Codes for Object Detection. 
Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.417"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Benenson, R., Mathias, M., Tuytelaars, T., and Gool, L.J.V. (2013, January 25\u201327). Seeking the Strongest Rigid Detector. Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.470"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/TPAMI.2007.1166","article-title":"Stereo Processing by Semiglobal Matching and Mutual Information","volume":"30","author":"Hirschmuller","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Pepik, B., Stark, M., Gehler, P., and Schiele, B. (2012, January 16\u201321). Teaching 3D Geometry to Deformable Part Models. Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248075"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1896","DOI":"10.1109\/TITS.2013.2271113","article-title":"Probabilistic Integration of Intensity and Depth Information for Part-Based Vehicle Detection","volume":"14","author":"Makris","year":"2013","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision Meets Robotics: The KITTI Dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Dubout, C., and Fleuret, F. (2013, January 25\u201327). Accelerated Training of Linear Object Detectors. Portland, OR, USA.","DOI":"10.1109\/CVPRW.2013.156"},{"key":"ref_44","unstructured":"Kokkinos, I. (2012). Computer Vision\u2014ECCV 2012, Springer."},{"key":"ref_45","unstructured":"Ng, A.Y. (1997). 
Proceedings of the Fourteenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1096","DOI":"10.1109\/TITS.2011.2143410","article-title":"The Benefits of Dense Stereo for Pedestrian Detection","volume":"12","author":"Keller","year":"2011","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Benenson, R., Mathias, M., Timofte, R., and Gool, L.J.V. (2012, January 16\u201321). Pedestrian detection at 100 frames per second. Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248017"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Sermanet, P., Kavukcuoglu, K., Chintala, S., and LeCun, Y. (2013, January 25\u201327). Pedestrian Detection with Unsupervised Multi-Stage Feature Learning. Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.465"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Maji, S., and Shakhnarovich, G. (2013, January 25\u201327). Part Discovery from Partial Correspondence. Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.125"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/15\/4\/9228\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T20:44:58Z","timestamp":1760215498000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/15\/4\/9228"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,4,20]]},"references-count":49,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2015,4]]}},"alternative-id":["s150409228"],"URL":"https:\/\/doi.org\/10.3390\/s150409228","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,4,20]]}}}