{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T22:21:49Z","timestamp":1774736509991,"version":"3.50.1"},"reference-count":101,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2020,9,11]],"date-time":"2020-09-11T00:00:00Z","timestamp":1599782400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Recently, our world witnessed major events that attracted a lot of attention towards the importance of automatic crowd scene analysis. For example, the COVID-19 outbreak and public events require an automatic system to manage, count, secure, and track a crowd that shares the same area. However, analyzing crowd scenes is very challenging due to heavy occlusion, complex behaviors, and posture changes. This paper surveys deep learning-based methods for analyzing crowded scenes. The reviewed methods are categorized as (1) crowd counting and (2) crowd action recognition. Moreover, crowd scene datasets are surveyed. In addition to the above surveys, this paper proposes an evaluation metric for crowd scene analysis methods. 
This metric estimates the difference between the calculated crowd count and the actual count in crowd scene videos.<\/jats:p>","DOI":"10.3390\/jimaging6090095","type":"journal-article","created":{"date-parts":[[2020,9,11]],"date-time":"2020-09-11T09:05:16Z","timestamp":1599815116000},"page":"95","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":29,"title":["Deep Learning-Based Crowd Scene Analysis Survey"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7614-7795","authenticated-orcid":false,"given":"Sherif","family":"Elbishlawi","sequence":"first","affiliation":[{"name":"The University of British Columbia, 3333 University Way, Kelowna, BC V1V 1V7, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4753-8380","authenticated-orcid":false,"given":"Mohamed H.","family":"Abdelpakey","sequence":"additional","affiliation":[{"name":"Memorial University of Newfoundland, St. John\u2019s, NL A1C 5S7, Canada"}]},{"given":"Agwad","family":"Eltantawy","sequence":"additional","affiliation":[{"name":"The University of British Columbia, 3333 University Way, Kelowna, BC V1V 1V7, Canada"}]},{"given":"Mohamed S.","family":"Shehata","sequence":"additional","affiliation":[{"name":"The University of British Columbia, 3333 University Way, Kelowna, BC V1V 1V7, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9086-0180","authenticated-orcid":false,"given":"Mostafa M.","family":"Mohamed","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering Department, University of Calgary, AB T2N 1N4, Canada"}]}],"member":"1968","published-online":{"date-parts":[[2020,9,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Musse, S.R., and Thalmann, D. (1997). A model of human crowd behavior: Group inter-relationship and collision detection analysis. 
Computer Animation and Simulation\u201997, Springer.","DOI":"10.1007\/978-3-7091-6874-5_3"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Watkins, J. (2012, May 08). Preventing a Covid-19 Pandemic. Available online: https:\/\/www.bmj.com\/content\/368\/bmj.m810.full.","DOI":"10.1136\/bmj.m810"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1080\/14775085.2011.568089","article-title":"The importance of tourism motivations among sport event volunteers at the 2007 world artistic gymnastics championships, stuttgart, germany","volume":"16","author":"Jarvis","year":"2011","journal-title":"J. Sport Tour."},{"key":"ref_4","unstructured":"Da Matta, R. (1991). Carnivals, Rogues, and Heroes: An Interpretation of the Brazilian Dilemma, University of Notre Dame Press Notre Dame."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1080\/13683500408667989","article-title":"Landscape, memory and heritage: New year celebrations at angkor, cambodia","volume":"7","author":"Winter","year":"2004","journal-title":"Curr. Issues Tour."},{"key":"ref_6","unstructured":"Peters, F.E. (1996). The Hajj: The Muslim Pilgrimage to Mecca and the Holy Places, Princeton University Press."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Cui, X., Liu, Q., Gao, M., and Metaxas, D.N. (2011, January 20). Abnormal detection using interaction energy potentials. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995558"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Mehran, R., Moore, B.E., and Shah, M. (2010). A streakline representation of flow in crowded scenes. 
European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-642-15558-1_32"},{"key":"ref_9","first-page":"163682","article-title":"Motion pattern extraction and event detection for automatic visual surveillance","volume":"7","author":"Benabbas","year":"2011","journal-title":"J. Image Video Process."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"844","DOI":"10.1016\/j.ssci.2007.01.015","article-title":"Waiting time in emergency evacuation of crowded public transport terminals","volume":"46","author":"Chow","year":"2008","journal-title":"Saf. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/0925-7535(96)81011-3","article-title":"Crowd psychology and engineering","volume":"21","author":"Sime","year":"1995","journal-title":"Saf. Sci."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.patrec.2017.07.007","article-title":"A survey of recent advances in cnn-based single image crowd counting and density estimation","volume":"107","author":"Sindagi","year":"2018","journal-title":"Pattern Recognit. Lett."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1109\/MSP.2008.930649","article-title":"Mean squared error: Love it or leave it? a new look at signal fidelity measures","volume":"26","author":"Wang","year":"2009","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"79","DOI":"10.3354\/cr030079","article-title":"Advantages of the mean absolute error (mae) over the root mean square error (rmse) in assessing average model performance","volume":"30","author":"Willmott","year":"2005","journal-title":"Clim. 
Res."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1109\/TPAMI.2011.155","article-title":"Pedestrian detection: An evaluation of the state of the art","volume":"34","author":"Dollar","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Li, M., Zhang, Z., Huang, K., and Tan, T. (2008, January 8). Estimating the number of people in crowded scenes by mid based foreground segmentation and head-shoulder detection. Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), Tampa, FL, USA.","DOI":"10.1109\/ICPR.2008.4761705"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Brox, T., Bruhn, A., Papenberg, N., and Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-540-24673-2_3"},{"key":"ref_19","unstructured":"Dalal, N., and Triggs, B. (2005, January June). Histograms of oriented gradients for human detection. Proceedings of the Computer Vision and Pattern Recognition, CVPR 2005, San Diego, CA, USA."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1023\/B:VISI.0000013087.49260.fb","article-title":"Robust real-time face detection","volume":"57","author":"Viola","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_21","unstructured":"Wu, B., and Nevatia, R. (2005, January 17). Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV\u201905), Beijing, China."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Ali, S., and Shah, M. (2007, January 22). A lagrangian particle dynamics approach for crowd flow segmentation and stability analysis. 
Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA.","DOI":"10.1109\/CVPR.2007.382977"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Sabzmeydani, P., and Mori, G. (2007, January 17). Detecting pedestrians by learning shapelet features. Proceedings of the Computer Vision and Pattern Recognition (CVPR\u201907), Minneapolis, MN, USA.","DOI":"10.1109\/CVPR.2007.383134"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1961189.1961199","article-title":"LIBSVM: A library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans. Intell. Syst. Technol. (TIST)"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"2188","DOI":"10.1109\/TPAMI.2011.70","article-title":"Hough forests for object detection, tracking, and action recognition","volume":"33","author":"Gall","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1007\/s11263-005-6644-8","article-title":"Detecting pedestrians using patterns of motion and appearance","volume":"63","author":"Viola","year":"2005","journal-title":"Int. J. Comput. Vis."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, T., Jia, K., Xu, C., Ma, Y., and Ahuja, N. (2014, January 24). Partial occlusion handling for visual tracking via robust part matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.164"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1016\/j.cviu.2007.02.003","article-title":"Estimating pedestrian counts in groups","volume":"110","author":"Kilambi","year":"2008","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Whitt, W. (2002). 
Stochastic-Process Limits: An Introduction to Stochastic-Process Limits and Their Application to Queues, Springer Science & Business Media.","DOI":"10.1007\/b97479"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ge, W., and Collins, R.T. (2009, January 20). Marked point processes for crowd counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA.","DOI":"10.1109\/CVPRW.2009.5206621"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Chatelain, F., Costard, A., and Michel, O.J. (2011, January 22). A bayesian marked point process for object detection: Application to muse hyperspectral data. Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic.","DOI":"10.1109\/ICASSP.2011.5947136"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Juan, A., and Vidal, E. (2004, January 26\u201326). Bernoulli mixture models for binary images. Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Cambridge, UK.","DOI":"10.1109\/ICPR.2004.1334543"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1198","DOI":"10.1109\/TPAMI.2007.70770","article-title":"Segmentation and tracking of multiple humans in crowded environments","volume":"30","author":"Zhao","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Geyer, C.J. (1991). Markov Chain Monte Carlo Maximum Likelihood, Interface Foundation of North America.","DOI":"10.1214\/ss\/1177011137"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.cosrev.2018.01.004","article-title":"On the role and the importance of features for background modeling and foreground detection","volume":"28","author":"Bouwmans","year":"2018","journal-title":"Comput. Sci. 
Rev."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Tuceryan, M., and Jain, A.K. (1993). Texture analysis. Handbook of Pattern Recognition and Computer Vision, World Scientific.","DOI":"10.1142\/9789814343138_0010"},{"key":"ref_37","unstructured":"Mikolajczyk, K., Zisserman, A., and Schmid, C. (2020, September 11). Shape rEcognition With Edge-Based Features. Available online: https:\/\/hal.inria.fr\/inria-00548226\/."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1109\/LSP.2003.821718","article-title":"Adaptive image interpolation based on local gradient features","volume":"11","author":"Hwang","year":"2004","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Chan, A.B., Liang, Z.S.J., and Vasconcelos, N. (2008, January 24). Privacy preserving crowd monitoring: Counting people without people models or tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587569"},{"key":"ref_40","unstructured":"Paragios, N., and Ramesh, V. (2001, January 8). A mrf-based approach for real-time subway monitoring. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA."},{"key":"ref_41","first-page":"3","article-title":"Feature mining for localised crowd counting","volume":"Volume 1","author":"Chen","year":"2012","journal-title":"Proceedings of the British Machine Vision Conference"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Idrees, H., Saleemi, I., Seibert, C., and Shah, M. (2013, January 23). Multi-source multi-scale counting in extremely dense crowd images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.329"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Vu, T.H., Osokin, A., and Laptev, I. (2015, January 7). 
Context-aware cnns for person head detection. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.331"},{"key":"ref_44","unstructured":"Lindeberg, T. (2020, September 11). Scale Invariant Feature Transform. Available online: https:\/\/www.diva-portal.org\/smash\/get\/diva2:480321\/FULLTEXT02."},{"key":"ref_45","unstructured":"Li, S.Z. (2012). Markov Random Field Modeling in Computer Vision, Springer Science & Business Media."},{"key":"ref_46","unstructured":"Lempitsky, V., and Zisserman, A. Learning to count objects in images. Advances in Neural Information Processing Systems, Proceedings of the Neural Information Processing Systems 2010, Vancouver, BC, Canada, 6 December 2010."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Loy, C.C., Chen, K., Gong, S., and Xiang, T. (2013). Crowd counting and profiling: Methodology and evaluation. Modeling, Simulation and Visual Analysis of Crowds, Springer.","DOI":"10.1007\/978-1-4614-8483-7_14"},{"key":"ref_48","first-page":"311","article-title":"Bundle methods for regularized risk minimization","volume":"11","author":"Teo","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"805","DOI":"10.1080\/1055678021000060829a","article-title":"Convex nondifferentiable optimization: A survey focused on the analytic center cutting plane method","volume":"17","author":"Goffin","year":"2002","journal-title":"Optim. Methods Softw."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Pham, V.Q., Kozakaya, T., Yamaguchi, O., and Okada, R. (2015, January 3\u201317). Count forest: Co-voting uncertain number of targets using random forest for crowd density estimation. 
Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.372"},{"key":"ref_51","first-page":"18","article-title":"Classification and regression by randomforest","volume":"2","author":"Liaw","year":"2002","journal-title":"News"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Sirmacek, B., and Reinartz, P. (2011, January 6\u201311). Automatic crowd density and motion analysis in airborne image sequences based on a probabilistic framework. Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain.","DOI":"10.1109\/ICCVW.2011.6130347"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1080\/10485250310001624819","article-title":"Density estimation using inverse and reciprocal inverse gaussian kernels","volume":"16","author":"Scaillet","year":"2004","journal-title":"Nonparametric Stat."},{"key":"ref_54","first-page":"1","article-title":"Comprehensive survey on distance\/similarity measures between probability density functions","volume":"1","author":"Cha","year":"2007","journal-title":"City"},{"key":"ref_55","first-page":"111","article-title":"Performance analysis of various activation functions in generalized mlp architectures of neural networks","volume":"1","author":"Karlik","year":"2011","journal-title":"Int. J. Artif. Intell. Expert Syst."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Wang, C., Zhang, H., Yang, L., Liu, S., and Cao, X. (2015). Deep people counting in extremely dense crowds. Proceedings of the 23rd ACM International Conference on Multimedia, ACM.","DOI":"10.1145\/2733373.2806337"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.engappai.2015.04.006","article-title":"Fast crowd density estimation with convolutional neural networks","volume":"43","author":"Fu","year":"2015","journal-title":"Eng. Appl. Artif. 
Intell."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Sermanet, P., Kavukcuoglu, K., Chintala, S., and LeCun, Y. (2013, January 23\u201328). Pedestrian detection with unsupervised multi-stage feature learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.465"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1109\/TSMCC.2005.848169","article-title":"Improving iris recognition accuracy via cascaded classifiers","volume":"35","author":"Sun","year":"2005","journal-title":"IEEE Trans. Syst. Man Cybern. Part Appl. Rev."},{"key":"ref_60","unstructured":"Zhang, C., Li, H., Wang, X., and Yang, X. (2015, January 7\u201312). Cross-scene crowd counting via deep convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Ledig, C., Theis, L., Husz\u00e1r, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., and Wang, Z. (2017, January 21). Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.19"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Shen, Z., Xu, Y., Ni, B., Wang, M., Hu, J., and Yang, X. (2018, January 18\u201322). Crowd counting via adversarial cross-scale consistency pursuit. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00550"},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Onoro-Rubio, D., and L\u00f3pez-Sastre, R.J. (2016). Towards perspective-free object counting with deep learning. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46478-7_38"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Liu, N., Long, Y., Zou, C., Niu, Q., Pan, L., and Wu, H. (2019, January 16\u201320). Adcrowdnet: An attention-injective deformable convolutional network for crowd understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00334"},{"key":"ref_66","unstructured":"Zhang, Y., Zhou, D., Chen, S., Gao, S., and Ma, Y. (July, January 26). Single-image crowd counting via multi-column convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Caesars Palace, Las Vegas, NV, USA."},{"key":"ref_67","unstructured":"Oh, M.H., Olsen, P.A., and Ramamurthy, K.N. (2020, January 7\u201312). Crowd counting with decomposed uncertainty. Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI), New York, NY, USA."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Amirgholipour, S., He, X., Jia, W., Wang, D., and Liu, L. (2020). PDANet: Pyramid Density-aware Attention Net for Accurate Crowd Counting. arXiv Preprint.","DOI":"10.1016\/j.neucom.2021.04.037"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Liu, L., Qiu, Z., Li, G., Liu, S., Ouyang, W., and Lin, L. (2019, January 27). Crowd counting with deep structured scale integration network. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00186"},{"key":"ref_70","unstructured":"Reddy, M.K.K., Hossain, M., Rochan, M., and Wang, Y. (2020, January 1\u20135). Few-shot scene adaptive crowd counting using meta-learning. 
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA."},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Liu, W., Salzmann, M., and Fua, P. (2019, January June). Context-aware crowd counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00524"},{"key":"ref_72","unstructured":"Andersson, M., Rydell, J., and Ahlberg, J. (2009, January 6\u20139). Estimation of crowd behavior using sensor networks and sensor fusion. Proceedings of the 12th International Conference on Information Fusion, Seattle, WA, USA."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Beal, M.J., Ghahramani, Z., and Rasmussen, C.E. (2002, January 9\u201314). The infinite hidden markov model. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada.","DOI":"10.7551\/mitpress\/1120.003.0079"},{"key":"ref_74","unstructured":"Siva, P., and Xiang, T. (September, January 31). Action detection in crowd. Proceedings of the British Machine Vision Conference (BMVC), Aberystwyth, Wales, UK."},{"key":"ref_75","unstructured":"Li, B., Yu, S., and Lu, Q. (2003). An improved k-nearest neighbor algorithm for text categorization. arXiv."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Hassner, T., Itcher, Y., and Kliper-Gross, O. (2012, January 21). Violent flows: Real-time detection of violent crowd behavior. Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA.","DOI":"10.1109\/CVPRW.2012.6239348"},{"key":"ref_77","unstructured":"Shao, J., Loy, C.C., Kang, K., and Wang, X. (July, January 26). Slicing convolutional neural network for crowd video understanding. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Wang, J., Zhu, X., Gong, S., and Li, W. (2017, January 22\u201329). Attribute recognition by joint recurrent learning of context and correlation. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.65"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Lazaridis, L., Dimou, A., and Daras, P. (2018, January 3\u20137). Abnormal behavior detection in crowded scenes using density heatmaps and optical flow. Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), A Coru\u00f1a, Spain.","DOI":"10.23919\/EUSIPCO.2018.8553620"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"You, Q., and Jiang, H. (2019, January 16\u201320). Action4d: Online action recognition in the crowd and clutter. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01213"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Ke, Y., Sukthankar, R., and Hebert, M. (2007, January 8\u201311). Event detection in crowded videos. Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Setubal, Portugal.","DOI":"10.1109\/ICCV.2007.4409011"},{"key":"ref_82","first-page":"615","article-title":"The action similarity labeling challenge","volume":"34","author":"Hassner","year":"2011","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Deng, Y., Luo, P., Loy, C.C., and Tang, X. (2014, January 3\u20137). Pedestrian attribute recognition at far distance. 
Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2654966"},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Rabiee, H., Haddadnia, J., Mousavi, H., Kalantarzadeh, M., Nabi, M., and Murino, V. (2016, January 23\u201326). Novel dataset for fine-grained abnormal behavior understanding in crowd. Proceedings of the 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Colorado Springs, CO, USA.","DOI":"10.1109\/AVSS.2016.7738074"},{"key":"ref_85","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1016\/j.patrec.2010.05.009","article-title":"Dyntex: A comprehensive database of dynamic textures","volume":"31","author":"Fazekas","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1007\/s11263-008-0184-y","article-title":"Dynamic texture detection based on motion analysis","volume":"82","author":"Fazekas","year":"2009","journal-title":"Int. J. Comput. Vis."},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Ghanem, B., and Ahuja, N. (2010). Maximum margin distance learning for dynamic texture recognition. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-642-15552-9_17"},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"El Gamal, A., and Kim, Y.H. (2011). Network Information Theory, Cambridge University Press.","DOI":"10.1017\/CBO9781139030687"},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"2910","DOI":"10.1109\/TIT.2003.819324","article-title":"Kullback-leibler approximation of spectral density functions","volume":"49","author":"Georgiou","year":"2003","journal-title":"IEEE Trans. Inf. 
Theory"},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Shang, C., Ai, H., and Bai, B. (2016, January 25\u201328). End-to-end crowd counting via joint learning local and global count. Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7532551"},{"key":"ref_92","doi-asserted-by":"crossref","unstructured":"Chen, K., Gong, S., Xiang, T., and Change Loy, C. (2013, January 23). Cumulative attribute space for age and crowd density estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.319"},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Wang, Y., and Zou, Y. (2016, January 25\u201328). Fast visual object counting via example-based density estimation. Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7533041"},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Xu, B., and Qiu, G. (2016, January 7\u20139). Crowd density estimation based on rich features and random projection forest. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), New York, NY, USA.","DOI":"10.1109\/WACV.2016.7477682"},{"key":"ref_95","unstructured":"Boominathan, L., Kruthiventi, S.S., and Babu, R.V. Crowdnet: A deep convolutional network for dense crowd counting. Proceedings of the 24th ACM International Conference on Multimedia."},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Walach, E., and Wolf, L. (2016). Learning to count with cnn boosting. 
European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46475-6_41"},{"key":"ref_97","doi-asserted-by":"crossref","unstructured":"Kumagai, S., Hotta, K., and Kurita, T. (2017). Mixture of counting cnns: Adaptive integration of cnns specialized to specific appearance for crowd counting. arXiv.","DOI":"10.1007\/s00138-018-0955-6"},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Marsden, M., McGuinness, K., Little, S., and O\u2019Connor, N.E. (2016). Fully convolutional crowd counting on highly congested scenes. arXiv.","DOI":"10.5220\/0006097300270033"},{"key":"ref_99","doi-asserted-by":"crossref","first-page":"1408","DOI":"10.1109\/TCSVT.2018.2837153","article-title":"Beyond counting: Comparisons of density maps for crowd analysis tasks\u2014Counting, detection, and tracking","volume":"29","author":"Kang","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_100","doi-asserted-by":"crossref","first-page":"1788","DOI":"10.1109\/TCSVT.2016.2637379","article-title":"Crowd counting via weighted vlad on a dense attribute feature map","volume":"28","author":"Sheng","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_101","doi-asserted-by":"crossref","unstructured":"Sam, D.B., Surya, S., and Babu, R.V. (2017, January 22\u201325). Switching convolutional neural network for crowd counting. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.429"}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/6\/9\/95\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:09:02Z","timestamp":1760177342000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/6\/9\/95"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,11]]},"references-count":101,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2020,9]]}},"alternative-id":["jimaging6090095"],"URL":"https:\/\/doi.org\/10.3390\/jimaging6090095","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,11]]}}}