{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T15:50:16Z","timestamp":1778082616472,"version":"3.51.4"},"reference-count":28,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,2,11]],"date-time":"2019-02-11T00:00:00Z","timestamp":1549843200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>State-of-the-art human detection methods focus on deep network architectures to achieve higher recognition performance, at the expense of huge computation. However, computational efficiency and real-time performance are also important evaluation indicators. This paper presents a fast real-time human detection and flow estimation method using depth images captured by a top-view TOF camera. The proposed algorithm mainly consists of head detection based on local pooling and searching, classification refinement based on human morphological features, and tracking assignment filter based on dynamic multi-dimensional feature. A depth image dataset record with more than 10k entries and departure events with detailed human location annotations is established. Taking full advantage of the distance information implied in the depth image, we achieve high-accuracy human detection and people counting with accuracy of 97.73% and significantly reduce the running time. Experiments demonstrate that our algorithm can run at 23.10 ms per frame on a CPU platform. In addition, the proposed robust approach is effective in complex situations such as fast walking, occlusion, crowded scenes, etc.<\/jats:p>","DOI":"10.3390\/s19030729","type":"journal-article","created":{"date-parts":[[2019,2,12]],"date-time":"2019-02-12T03:18:20Z","timestamp":1549941500000},"page":"729","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["A High-Computational Efficiency Human Detection and Flow Estimation Method Based on TOF Measurements"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0675-7402","authenticated-orcid":false,"given":"Weihang","family":"Wang","sequence":"first","affiliation":[{"name":"Brain-inspired Application Technology Center, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peilin","family":"Liu","sequence":"additional","affiliation":[{"name":"Brain-inspired Application Technology Center, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rendong","family":"Ying","sequence":"additional","affiliation":[{"name":"Brain-inspired Application Technology Center, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Wang","sequence":"additional","affiliation":[{"name":"Brain-inspired Application Technology Center, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiuchao","family":"Qian","sequence":"additional","affiliation":[{"name":"Brain-inspired Application Technology Center, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jialu","family":"Jia","sequence":"additional","affiliation":[{"name":"Brain-inspired Application Technology Center, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiefeng","family":"Gao","sequence":"additional","affiliation":[{"name":"Brain-inspired Application Technology Center, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,2,11]]},"reference":[{"key":"ref_1","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of oriented gradients for human detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA."},{"key":"ref_2","unstructured":"Zhu, Q., Yeh, M.-C., Cheng, K.-T., and Avidan, S. (2006, January 17\u201322). Fast human detection using a cascade of histograms of oriented gradients. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, NY, USA."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Beleznai, C., and Bischof, H. (2009, January 20\u201325). Fast human detection in crowded scenes by contour integration and local shape estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206564"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"D\u2019Orazio, T., Leo, M., Spagnolo, P., Mazzeo, P.L., Mosca, N., and Nitti, M. (2007, January 6\u20138). A visual tracking algorithm for real time people detection. Proceedings of the International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS\u201907), Santorini, Greece.","DOI":"10.1109\/WIAMIS.2007.14"},{"key":"ref_5","unstructured":"Demirkus, M., Wang, L., Eschey, M., Kaestle, H., and Galasso, F. (March, January 27). People Detection in Fish-eye Top-views. Proceedings of the VISIGRAPP (5: VISAPP), Porto, Portugal."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective search for object recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vision"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 13\u201316). Fast r-CNN. Proceedings of the IEEE International Conference on Computer Vision, Boston, MA, USA.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1007\/s11263-012-0605-9","article-title":"A machine-learning approach to keypoint detection and landmarking on 3D meshes","volume":"102","author":"Creusot","year":"2013","journal-title":"Int. J. Comput. Vision"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2270","DOI":"10.1109\/TPAMI.2014.2316828","article-title":"3D object recognition in cluttered scenes with local surface features: A survey","volume":"36","author":"Guo","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Lai, K., Bo, L., Ren, X., and Fox, D. (2011, January 9\u201313). A large-scale hierarchical multi-view rgb-d object dataset. Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China.","DOI":"10.1109\/ICRA.2011.5980382"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wu, C.-J., Houben, S., and Marquardt, N. (2017, January 6\u201311). Eaglesense: Tracking people and devices in interactive spaces using real-time top-view depth-sensing. Proceedings of the ACM CHI Conference on Human Factors in Computing Systems, Denver, CO, USA.","DOI":"10.1145\/3025453.3025562"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhang, X., Yan, J., Feng, S., Lei, Z., Yi, D., and Li, S.Z. (2012, January 18\u201321). Water filling: Unsupervised people counting via vertical kinect sensor. Proceedings of the IEEE International Conference on Advanced Video & Signal-based Surveillance, Beijing, China.","DOI":"10.1109\/AVSS.2012.82"},{"key":"ref_15","unstructured":"Fu, H., Ma, H., and Xiao, H. (October, January 30). Real-time accurate crowd counting based on RGB-D information. Proceedings of the IEEE International Conference on Image Processing, Orlando, FL, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Hsieh, C.-T., Wang, H.-C., Wu, Y.-K., Chang, L.-C., and Kuo, T.-K. (2012, January 4\u20137). A kinect-based people-flow counting system. Proceedings of the International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), Tamsui, New Taipei City, Taiwan.","DOI":"10.1109\/ISPACS.2012.6473470"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tseng, T.E., Liu, A.S., Hsiao, P.H., and Huang, C.M. (2014, January 14\u201318). Real-time people detection and tracking for indoor surveillance using multiple top-view depth cameras. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots & Systems, Chicago, IL, USA.","DOI":"10.1109\/IROS.2014.6943136"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yu, S., Wu, S., and Liang, W. (2012, January 18\u201321). SLTP: A Fast Descriptor for People Detection in Depth Images. Proceedings of the IEEE International Conference on Advanced Video & Signal-based Surveillance, Beijing, China.","DOI":"10.1109\/AVSS.2012.67"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Rauter, M. (2013, January 23\u201328). Reliable Human Detection and Tracking in Top-View Depth Images. Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition Workshops, Portland, OR, USA.","DOI":"10.1109\/CVPRW.2013.84"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1016\/j.eswa.2016.11.019","article-title":"Robust people detection using depth information from an overhead Time-of-Flight camera","volume":"71","author":"Luna","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_21","unstructured":"Pizzo, L.D., Foggia, P., Greco, A., Percannella, G., and Vento, M. (July, January 29). A versatile and effective method for counting people on either RGB or depth overhead cameras. Proceedings of the IEEE International Conference on Multimedia & Expo Workshops, Torino, Italy."},{"key":"ref_22","first-page":"1","article-title":"Applications for a people detection and tracking algorithm using a time-of-flight camera","volume":"75","author":"Stahlschmidt","year":"2014","journal-title":"Multimedia Tools Appl."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"9315","DOI":"10.1007\/s11042-016-3344-z","article-title":"People-flow counting in complex environments by combining depth and color information","volume":"75","author":"Gao","year":"2016","journal-title":"Multimedia Tools Appl."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wu, K., Otoo, E., and Shoshani, A. (2005, January 12\u201317). Optimizing Connected Component Labeling Algorithms. Proceedings of the Medical Imaging 2005: Image Processing, San Diego, CA, USA.","DOI":"10.1117\/12.596105"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liciotti, D., Paolanti, M., Pietrini, R., Frontoni, E., and Zingaretti, P. (2018, January 20\u201324). Convolutional Networks for semantic Heads Segmentation using Top-View Depth Data in Crowded Environment. Proceedings of the International Conference on Pattern Recognition, Beijing, China.","DOI":"10.1109\/ICPR.2018.8545397"},{"key":"ref_26","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Pham, T.Q. (2010, January 13\u201316). Non-maximum Suppression Using Fewer than Two Comparisons per Pixel. Proceedings of the Advanced Concepts for Intelligent Vision Systems, Sydney, Australia.","DOI":"10.1007\/978-3-642-17688-3_41"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Bondi, E., Seidenari, L., Bagdanov, A.D., and Bimbo, A.D. (2014, January 26\u201329). Real-time people counting from depth imagery of crowded environments. Proceedings of the International Conference on Advanced Video & Signal Based Surveillance, Seoul, Korea.","DOI":"10.1109\/AVSS.2014.6918691"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/3\/729\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:31:08Z","timestamp":1760185868000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/3\/729"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,2,11]]},"references-count":28,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,2]]}},"alternative-id":["s19030729"],"URL":"https:\/\/doi.org\/10.3390\/s19030729","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,2,11]]}}}