{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T00:51:20Z","timestamp":1783039880087,"version":"3.54.6"},"reference-count":36,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2017,10,28]],"date-time":"2017-10-28T00:00:00Z","timestamp":1509148800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we introduce the so-called DEEP-SEE framework that jointly exploits computer vision algorithms and deep convolutional neural networks (CNNs) to detect, track and recognize in real time objects encountered during navigation in the outdoor environment. A first feature concerns an object detection technique designed to localize both static and dynamic objects without any a priori knowledge about their position, type or shape. The methodological core of the proposed approach relies on a novel object tracking method based on two convolutional neural networks trained offline. The key principle consists of alternating between tracking using motion information and predicting the object location in time based on visual similarity. The validation of the tracking technique is performed on standard benchmark VOT datasets, and shows that the proposed approach returns state-of-the-art results while minimizing the computational complexity. Then, the DEEP-SEE framework is integrated into a novel assistive device, designed to improve cognition of VI people and to increase their safety when navigating in crowded urban scenes. The validation of our assistive device is performed on a video dataset with 30 elements acquired with the help of VI users. The proposed system shows high accuracy (&gt;90%) and robustness (&gt;90%) scores regardless on the scene dynamics.<\/jats:p>","DOI":"10.3390\/s17112473","type":"journal-article","created":{"date-parts":[[2017,10,30]],"date-time":"2017-10-30T12:16:23Z","timestamp":1509365783000},"page":"2473","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":75,"title":["DEEP-SEE: Joint Object Detection, Tracking and Recognition with Application to Visually Impaired Navigational Assistance"],"prefix":"10.3390","volume":"17","author":[{"given":"Ruxandra","family":"Tapu","sequence":"first","affiliation":[{"name":"Advanced Research and TEchniques for Multidimensional Imaging Systems Department, Institut Mines-T\u00e9l\u00e9com\/T\u00e9l\u00e9com SudParis, UMR CNRS MAP5 8145 and 5157 SAMOVAR, 9 rue Charles Fourier, 91000 \u00c9vry, France"},{"name":"Telecommunication Department, Faculty of ETTI, University \u201cPolitehnica\u201d of Bucharest, SplaiulIndependentei 313, 060042 Bucharest, Romania"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bogdan","family":"Mocanu","sequence":"additional","affiliation":[{"name":"Advanced Research and TEchniques for Multidimensional Imaging Systems Department, Institut Mines-T\u00e9l\u00e9com\/T\u00e9l\u00e9com SudParis, UMR CNRS MAP5 8145 and 5157 SAMOVAR, 9 rue Charles Fourier, 91000 \u00c9vry, France"},{"name":"Telecommunication Department, Faculty of ETTI, University \u201cPolitehnica\u201d of Bucharest, SplaiulIndependentei 313, 060042 Bucharest, Romania"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Titus","family":"Zaharia","sequence":"additional","affiliation":[{"name":"Advanced Research and TEchniques for Multidimensional Imaging Systems Department, Institut Mines-T\u00e9l\u00e9com\/T\u00e9l\u00e9com SudParis, UMR CNRS MAP5 8145 and 5157 SAMOVAR, 9 rue Charles Fourier, 91000 \u00c9vry, France"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2017,10,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1016\/j.patrec.2014.03.025","article-title":"Robust scale-adaptive mean-shift for tracking","volume":"49","author":"Vojir","year":"2014","journal-title":"Pattern Recognit. Lett."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Becker, S., Krah, S., Hubner, W., and Arens, M. (2016, January 26\u201329). Mad for visual tracker fusion. Proceedings of the Optics and Photonics for Counterterrorism, Crime Fighting, and Defence XIII, Edinburgh, UK.","DOI":"10.1117\/12.2243473"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wang, X., Valstar, M., Martinez, B., Khan, H., and Pridmore, T. (2015, January 11\u201318). Tric-track: Tracking by regression with incrementally learned cascades. Proceedings of the International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.493"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/TPAMI.2014.2345390","article-title":"High-speed tracking with kernelized correlation filters","volume":"37","author":"Henriques","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wang, L., Ouyang, W., Wang, X., and Lu, H. (2016, January 27\u201330). Stct: Sequentially training convolutional networks for visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.153"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Gundogdu, E., and Alatan, A. (2016, January 25\u201328). Spatial windowing for correlation filter based visual tracking. Proceedings of the International Conference on Image Processing, Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7532645"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Xiao, J., Stolkin, R., and Leonardis, A. (2015, January 7\u201312). Single target tracking using adaptive clustered decision trees and dynamic multi-level appearance models. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299132"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Possegger, H., Mauthner, T., and Bischof, H. (2015, January 7\u201312). In defense of color-based model-free tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298823"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhang, K., Zhang, L., Liu, Q., Zhang, D., and Yang, M. (2014, January 6\u201312). Fast visual tracking via dense spatio-temporal context learning. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_9"},{"key":"ref_11","unstructured":"Nam, H., Baek, M., and Han, B. (arXiv, 2016). Modeling and propagating CNNs in a tree structure for visual tracking, arXiv."},{"key":"ref_12","unstructured":"Cehovin, L., Leonardis, A., and Kristan, M. (arXiv, 2015). Visual object tracking performance measures revisited, arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., and Torr, P.H.S. (2016, January 27\u201330). Staple: Complementary learners for real-time tracking. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.156"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Robinson, A., Khan, F.K.S., and Felsberg, M. (2016, January 8\u201316). Beyond correlation filters: Learning continuous convolution operators for visual tracking. Proceedings of the European Conference on Computer Vision, Amsterdam, Netherlands.","DOI":"10.1007\/978-3-319-46454-1_29"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Held, D., Thrun, S., and Savarese, S. (2016, January 8\u201316). Learning to track at 100 fps with deep regression net-works. Proceedings of the European Conference on Computer Vision, Amsterdam, Netherlands.","DOI":"10.1007\/978-3-319-46448-0_45"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Tapu, R., Mocanu, B., Bursuc, A., and Zaharia, T. (2013, January 2\u20138). A Smartphone-Based Obstacle Detection and Classification System for Assisting Visually Impaired People. Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, Sydney, Australia.","DOI":"10.1109\/ICCVW.2013.65"},{"key":"ref_17","first-page":"1442","article-title":"Visual Tracking: An Experimental Survey","volume":"36","author":"Smeulders","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","unstructured":"Jia, Y. (2017, October 25). Caffe: An Open Source Convolutional Architecture for Fast Feature Embedding. Available online: http:\/\/caffe.berkeleyvision.org\/."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zagoruyko, S., and Komodakis, N. (2015, January 7\u201312). Learning to compare image patches via convolutional neural networks. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299064"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1442","DOI":"10.1109\/TIP.2002.806251","article-title":"Adaptive rood pattern search for fast block-matching motion estimation","volume":"11","author":"Nie","year":"2002","journal-title":"IEEE Trans. Image Process."},{"key":"ref_22","unstructured":"(2017, October 25). A World Health Organization (WHO)\u2014Visual Impairment and Blindness. Available online: http:\/\/www.who.int\/mediacentre\/factsheets\/fs282\/en\/."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"17476","DOI":"10.3390\/s121217476","article-title":"Assisting the Visually Impaired: Obstacle Detection and Warning System by Acoustic Feedback","volume":"12","author":"Yebes","year":"2012","journal-title":"Sensors"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Tapu, R., Mocanu, B., and Tapu, E. (2014, January 14\u201315). A survey on wearable devices used to assist the visual impaired user navigation in outdoor environments. Proceedings of the 11th International Symposium on Electronics and Telecommunications (ISETC), Timisoara, Romania.","DOI":"10.1109\/ISETC.2014.7010793"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Croce, D., Giarr\u00e9, L., Rosa, F.G.L., Montana, E., and Tinnirello, I. (2016, January 21\u201324). Enhancing tracking performance in a smartphone-based navigation system for visually impaired people. Proceedings of the 24th Mediterranean Conference on Control and Automation (MED), Athens, Greece.","DOI":"10.1109\/MED.2016.7535871"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Manduchi, R. (2012, January 11\u201313). Vision as assistive technology for the blind: An experimental study. Proceedings of the 13th International Conference on Computers Helping People with Special Needs, Linz, Austria.","DOI":"10.1007\/978-3-642-31534-3_2"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Everding, L., Walger, L., Ghaderi, V.S., and Conradt, J. (2016, January 14\u201316). A mobility device for the blind with improved vertical resolution using dynamic vision sensors. Proceedings of the IEEE 18th International Conference on E-Health Networking, Applications and Services (Healthcom), Munich, Germany.","DOI":"10.1109\/HealthCom.2016.7749459"},{"key":"ref_28","unstructured":"Cloix, S., Weiss, V., Bologna, G., Pun, T., and Hasler, D. (2014, January 5\u20138). Obstacle and planar object detection using sparse 3D information for a smart walker. Proceedings of the 2014 International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal."},{"key":"ref_29","first-page":"361","article-title":"The SmartVision navigation prototype for blind users","volume":"5","author":"Buf","year":"2011","journal-title":"Int. J. Digital Content Technol. Appl."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Mocanu, B., Tapu, R., and Zaharia, T. (2016). When Ultrasonic Sensors and Computer Vision Join Forces for Efficient Obstacle Detection and Recognition. Sensors, 16.","DOI":"10.3390\/s16111807"},{"key":"ref_31","unstructured":"Lucas, B., and Kanade, T. (1981, January 24\u201328). An iterative technique of image registration and its application to stereo. Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI\u201981), Vancouver, BC, Canada."},{"key":"ref_32","unstructured":"Lee, J.J., and Kim, G. (2007, January 26\u201329). Robust estimation of camera homography using fuzzy RANSAC. Proceedings of the International Conference on Computational Science and Its Applications, Kuala Lumpur, Malaysia."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Pradeep, V., Medioni, G., and Weiland, J. (2010, January 13\u201318). Robot vision for the visually impaired. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition\u2014Workshops, San Francisco, CA, USA.","DOI":"10.1109\/CVPRW.2010.5543579"},{"key":"ref_34","first-page":"52","article-title":"A Kinect-Based Wearable Face Recognition System to Aid Visually Impaired Users","volume":"47","author":"Neto","year":"2017","journal-title":"IEEE Trans. Hum. Mach. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Li, B., M\u0169noz, J.P., Rong, X., Xiao, J., Tian, Y., and Arditi, A. (and, January 8\u201310). ISANA: Wearable Context-Aware Indoor Assistive Navigation with Obstacle Avoidance for the Blind. Proceedings of the Computer Vision\u2014European Conference on Computer Vision 2016 Workshops, Amsterdam, Netherlands.","DOI":"10.1007\/978-3-319-48881-3_31"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Elmannai, W., and Elleithy, K. (2017). Sensor-Based Assistive Devices for Visually-Impaired People: Current Status, Challenges, and Future Directions. Sensors, 17.","DOI":"10.3390\/s17030565"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/17\/11\/2473\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:48:47Z","timestamp":1760208527000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/17\/11\/2473"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,10,28]]},"references-count":36,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2017,11]]}},"alternative-id":["s17112473"],"URL":"https:\/\/doi.org\/10.3390\/s17112473","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,10,28]]}}}