{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:24:35Z","timestamp":1760145875085,"version":"build-2065373602"},"reference-count":72,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2024,9,14]],"date-time":"2024-09-14T00:00:00Z","timestamp":1726272000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["52071349","2020MDJC08","SZKY2024027"],"award-info":[{"award-number":["52071349","2020MDJC08","SZKY2024027"]}]},{"name":"Interdisciplinary Research Program of Minzu University of China","award":["52071349","2020MDJC08","SZKY2024027"],"award-info":[{"award-number":["52071349","2020MDJC08","SZKY2024027"]}]},{"name":"Graduate Research and Practice Program, Minzu University of China","award":["52071349","2020MDJC08","SZKY2024027"],"award-info":[{"award-number":["52071349","2020MDJC08","SZKY2024027"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>This study focuses on the problem of dense object counting. In dense scenes, variations in object scales and uneven distributions greatly hinder counting accuracy. The current methods, whether CNNs with fixed convolutional kernel sizes or Transformers with fixed attention sizes, struggle to handle such variability effectively. Lower-resolution features are more sensitive to larger objects closer to the camera, while higher-resolution features are more efficient for smaller objects further away. Thus, preserving features that carry the most relevant information at each scale is crucial for improving counting precision. Motivated by this, we propose a multi-resolution scale feature fusion-based universal density counting network (MRSNet). It utilizes independent modules to process high- and low-resolution features, adaptively adjusts receptive field sizes, and incorporates dynamic sparse attention mechanisms to optimize feature information at each resolution, by integrating optimal features across multiple scales into density maps for counting evaluation. Our proposed network effectively mitigates issues caused by large variations in object scales, thereby enhancing counting accuracy. Furthermore, extensive quantitative analyses on six public datasets demonstrate the algorithm\u2019s strong generalization ability in handling diverse object scale variations.<\/jats:p>","DOI":"10.3390\/s24185974","type":"journal-article","created":{"date-parts":[[2024,9,16]],"date-time":"2024-09-16T11:36:37Z","timestamp":1726486597000},"page":"5974","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["MRSNet: Multi-Resolution Scale Feature Fusion-Based Universal Density Counting Network"],"prefix":"10.3390","volume":"24","author":[{"given":"Yi","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Information and Engineering, Minzu University of China, Beijing 100081, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2324-4302","authenticated-orcid":false,"given":"Wei","family":"Song","sequence":"additional","affiliation":[{"name":"School of Information and Engineering, Minzu University of China, Beijing 100081, China"},{"name":"Language Information Security Research Center, Institute of National Security MUC, Minzu University of China, Beijing 100081, China"},{"name":"National Language Resource Monitoring and Research Center of Minority Languages, Minzu University of China, Beijing 100081, China"},{"name":"Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8556-8096","authenticated-orcid":false,"given":"Mingyue","family":"Shao","sequence":"additional","affiliation":[{"name":"School of Information and Engineering, Minzu University of China, Beijing 100081, China"}]},{"given":"Xiangchun","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Information and Engineering, Minzu University of China, Beijing 100081, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,9,14]]},"reference":[{"key":"ref_1","unstructured":"Handte, M., Iqbal, M.U., Wagner, S., Apolinarski, W., Marr\u00f3n, P.J., Navarro, E.M.M., Martinez, S., Barthelemy, S.I., and Fern\u00e1ndez, M.G. (2014, January 24\u201328). Crowd Density Estimation for Public Transport Vehicles. Proceedings of the EDBT\/ICDT Workshops, Athens, Greece."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1016\/j.ssci.2011.01.005","article-title":"CDES: A pixel-based crowd density estimation system for Masjid al-Haram","volume":"49","author":"Hussain","year":"2011","journal-title":"Saf. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Wang, Z., Liu, H., Qian, Y., and Xu, T. (2012, January 9\u201313). Crowd density estimation based on local binary pattern co-occurrence matrix. Proceedings of the 2012 IEEE International Conference on Multimedia and Expo Workshops, Melbourne, VIC, Australia.","DOI":"10.1109\/ICMEW.2012.71"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Yuan, Y., Qiu, C., Xi, W., and Zhao, J. (2011, January 16\u201318). Crowd density estimation using wireless sensor networks. Proceedings of the 2011 Seventh International Conference on Mobile Ad-Hoc and Sensor Networks, Beijing, China.","DOI":"10.1109\/MSN.2011.31"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Marsden, M., McGuinness, K., Little, S., Keogh, C.E., and O\u2019Connor, N.E. (2018, January 18\u201323). People, penguins and petri dishes: Adapting object counting models to new visual domains and object types without forgetting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00842"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Mundhenk, T.N., Konjevod, G., Sakla, W.A., and Boakye, K. (2016, January 11\u201314). A large contextual dataset for classification, detection and counting of cars with deep learning. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Part III 14.","DOI":"10.1007\/978-3-319-46487-9_48"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Arteta, C., Lempitsky, V., and Zisserman, A. (2016, January 11\u201314). Counting in the wild. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Part VII 14.","DOI":"10.1007\/978-3-319-46478-7_30"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Barbedo, J.G.A., Koenigkan, L.V., Santos, P.M., and Ribeiro, A.R.B.J.S. (2020). Counting cattle in UAV images\u2014Dealing with clustered animals and animal\/background contrast changes. Sensors, 20.","DOI":"10.3390\/s20072126"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., and Schmidt, M. (2018, January 8\u201314). Where are the blobs: Counting by localization with point supervision. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01216-8_34"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1186\/s13007-017-0224-0","article-title":"TasselNet: Counting maize tassels in the wild via local counts regression network","volume":"13","author":"Lu","year":"2017","journal-title":"Plant Methods"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ma, Y., Sanchez, V., and Guha, T. (2022, January 16\u201319). Fusioncount: Efficient crowd counting via multiscale feature fusion. Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France.","DOI":"10.1109\/ICIP46576.2022.9897322"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"9957","DOI":"10.1002\/int.23023","article-title":"Hierarchical feature aggregation network with semantic attention for counting large-scale crowd","volume":"37","author":"Meng","year":"2022","journal-title":"Int. J. Intell. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2074","DOI":"10.1109\/TMM.2022.3142398","article-title":"STNet: Scale tree network with multi-level auxiliator for crowd counting","volume":"25","author":"Wang","year":"2022","journal-title":"IEEE Trans. Multimed."},{"key":"ref_14","unstructured":"Cheng, Z.-Q., Li, J.-X., Dai, Q., Wu, X., and Hauptmann, A.G. (November, January 27). Learning spatial awareness to improve crowd counting. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"3664","DOI":"10.1109\/TIP.2023.3289290","article-title":"Redesigning multi-scale neural network for crowd counting","volume":"32","author":"Du","year":"2023","journal-title":"IEEE Trans. Image Process."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ma, Z., Wei, X., Hong, X., and Gong, Y. (2020, January 12\u201316). Learning scales from points: A scale-aware probabilistic model for crowd counting. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413642"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Newell, A., Yang, K., and Deng, J. (2016, January 11\u201314). Stacked hourglass networks for human pose estimation. Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands. Part VIII 14.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Xiao, B., Wu, H., and Wei, Y. (2018, January 8\u201314). Simple baselines for human pose estimation and tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01231-1_29"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, X., Li, G., Han, Z., Zhang, W., Yang, Y., Huang, Q., and Sebe, N. (2021, January 11\u201317). Exploiting sample correlation for crowd counting with multi-expert network. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00320"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Song, Q., Wang, C., Wang, Y., Tai, Y., Wang, C., Li, J., Wu, J., and Ma, J. (2021, January 2\u20139). To choose or to fuse? Scale selection for crowd counting. Proceedings of the AAAI Conference on Artificial Intelligence, Virtually.","DOI":"10.1609\/aaai.v35i3.16360"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Subburaman, V.B., Descamps, A., and Carincotte, C. (2012, January 18\u201321). Counting people in the crowd using a generic head detector. Proceedings of the 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance, Beijing, China.","DOI":"10.1109\/AVSS.2012.87"},{"key":"ref_22","unstructured":"(2003, January 13\u201316). Detecting pedestrians using patterns of motion and appearance. Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France."},{"key":"ref_23","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of oriented gradients for human detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2179","DOI":"10.1109\/TPAMI.2008.260","article-title":"Monocular pedestrian detection: Survey and experiments","volume":"31","author":"Enzweiler","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_25","unstructured":"Leibe, B., Seemann, E., and Schiele, B. (2005, January 20\u201325). Pedestrian detection in crowded scenes. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1713","DOI":"10.1109\/TPAMI.2008.75","article-title":"Pedestrian detection via classification on riemannian manifolds","volume":"30","author":"Tuzel","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object detection with discriminatively trained part-based models","volume":"32","author":"Felzenszwalb","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1109\/3468.983420","article-title":"Estimation of number of people in crowded scenes using perspective transformation","volume":"31","author":"Lin","year":"2001","journal-title":"IEEE Trans. Syst. Man Cybern. Part A Syst. Humans"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1007\/s11263-006-0027-7","article-title":"Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors","volume":"75","author":"Wu","year":"2007","journal-title":"Int. J. Comput. Vis."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Li, M., Zhang, Z., Huang, K., and Tan, T. (2008, January 8\u201311). Estimating the number of people in crowded scenes by mid based foreground segmentation and head-shoulder detection. Proceedings of the 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA.","DOI":"10.1109\/ICPR.2008.4761705"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1198","DOI":"10.1109\/TPAMI.2007.70770","article-title":"Segmentation and tracking of multiple humans in crowded environments","volume":"30","author":"Zhao","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ge, W., and Collins, R.T. (2009, January 20\u201325). Marked point processes for crowd counting. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPRW.2009.5206621"},{"key":"ref_33","unstructured":"Zhao, T., and Nevatia, R. (2003, January 18\u201320). Bayesian human segmentation in crowded situations. Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison, WI, USA."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1016\/j.neucom.2014.01.019","article-title":"Recognizing human group action by layered model with multiple cues","volume":"136","author":"Cheng","year":"2014","journal-title":"Neurocomputing"},{"key":"ref_35","unstructured":"Paragios, N., and Ramesh, V. (2001, January 8\u201314). A MRF-based approach for real-time subway monitoring. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Chan, A.B., Liang, Z.-S.J., and Vasconcelos, N. (2008, January 23\u201328). Privacy preserving crowd monitoring: Counting people without people models or tracking. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587569"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1109\/MSP.2005.1511827","article-title":"Graphical model architectures for speech recognition","volume":"22","author":"Bilmes","year":"2005","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_38","first-page":"1324","article-title":"Learning to count objects in images","volume":"23","author":"Lempitsky","year":"2010","journal-title":"Proc. Adv. Neural Inf. Process. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Rodriguez, M., Laptev, I., Sivic, J., and Audibert, J.-Y. (2011, January 6\u201313). Density-aware person detection and tracking in crowds. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126526"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Pham, V.-Q., Kozakaya, T., Yamaguchi, O., and Okada, R. (2015, January 7\u201313). Count forest: Co-voting uncertain number of targets using random forest for crowd density estimation. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.372"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wang, Y., and Zou, Y. (2016, January 25\u201328). Fast visual object counting via example-based density estimation. Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7533041"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhu, L., Wang, X., Ke, Z., Zhang, W., and Lau, R.W. (2023, January 17\u201324). Biformer: Vision transformer with bi-level routing attention. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00995"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, January 15\u201320). Deep high-resolution representation learning for human pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, D., Chen, S., Gao, S., and Ma, Y. (2016, January 27\u201330). Single-image crowd counting via multi-column convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.70"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Idrees, H., Tayyab, M., Athrey, K., Zhang, D., Al-Maadeed, S., Rajpoot, N., and Shah, M. (2018, January 8\u201314). Composition loss for counting, density map estimation and localization in dense crowds. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01216-8_33"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2141","DOI":"10.1109\/TPAMI.2020.3013269","article-title":"NWPU-crowd: A large-scale benchmark for crowd counting and localization","volume":"43","author":"Wang","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Guerrero-G\u00f3mez-Olmedo, R., Torre-Jim\u00e9nez, B., L\u00f3pez-Sastre, R., Maldonado-Basc\u00f3n, S., and Onoro-Rubio, D. (2015, January 17\u201319). Extremely overlapping vehicle counting. Proceedings of the Pattern Recognition and Image Analysis: 7th Iberian Conference, IbPRIA 2015, Santiago de Compostela, Spain. Proceedings 7.","DOI":"10.1007\/978-3-319-19390-8_48"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Liu, W., Salzmann, M., and Fua, P. (2019, January 15\u201320). Context-aware crowd counting. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00524"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Wang, Q., Gao, J., Lin, W., and Yuan, Y. (2019, January 15\u201320). Learning from synthetic data for crowd counting in the wild. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00839"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Xiong, H., Lu, H., Liu, C., Liu, L., Cao, Z., and Shen, C. (2019, January 15\u201320). From open set to closed set: Counting objects by spatial divide-and-conquer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00845"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Ma, Z., Wei, X., Hong, X., and Gong, Y. (2019, January 15\u201320). Bayesian loss for crowd count estimation with point supervision. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00624"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Sindagi, V.A., and Patel, V.M. (2019, January 15\u201320). Multi-level bottom-top and top-bottom feature fusion for crowd counting. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00109"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"1357","DOI":"10.1109\/TPAMI.2020.3022878","article-title":"Kernel-based density map generation for dense object counting","volume":"44","author":"Wan","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_54","first-page":"2739","article-title":"Locate, size, and count: Accurately resolving people in dense crowds via detection","volume":"43","author":"Sam","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Jiang, X., Zhang, L., Xu, M., Zhang, T., Lv, P., Zhou, B., Yang, X., and Pang, Y. (2020, January 13\u201319). Attention scaling for crowd counting. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00476"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Liu, X., Yang, J., Ding, W., Wang, T., Wang, Z., and Xiong, J. (2020, January 23\u201328). Adaptive mixture regression network with local counting map for crowd counting. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK. Part XXIV 16.","DOI":"10.1007\/978-3-030-58586-0_15"},{"key":"ref_57","first-page":"3386","article-title":"Modeling noisy annotations for crowd counting","volume":"33","author":"Wan","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_58","first-page":"1595","article-title":"Distribution matching for crowd counting","volume":"33","author":"Wang","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_59","first-page":"3602","article-title":"Locality-aware crowd counting","volume":"44","author":"Zhou","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1007\/s11263-021-01542-z","article-title":"Autoscale: Learning to scale for crowd counting","volume":"130","author":"Xu","year":"2022","journal-title":"Int. J. Comput. Vis."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Wan, J., Liu, Z., and Chan, A.B. (2021, January 20\u201325). A generalized loss function for crowd counting and localization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00201"},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"2862","DOI":"10.1109\/TIP.2021.3055631","article-title":"Decoupled two-stage crowd counting and beyond","volume":"30","author":"Cheng","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Song, Q., Wang, C., Jiang, Z., Wang, Y., Tai, Y., Wang, C., Li, J., Huang, F., and Wu, Y. (2021, January 11\u201317). Rethinking counting and localization in crowds: A purely point-based framework. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00335"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Ma, Z., Hong, X., Wei, X., Qiu, Y., and Gong, Y. (2021, January 11\u201317). Towards a universal model for cross-dataset crowd counting. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00319"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Shu, W., Wan, J., Tan, K.C., Kwong, S., and Chan, A.B. (2022, January 18\u201324). Crowd counting in the frequency domain. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01900"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Cheng, Z.-Q., Dai, Q., Li, H., Song, J., Wu, X., and Hauptmann, A.G. (2022, January 18\u201324). Rethinking spatial invariance of convolutional networks for object counting. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01902"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Wang, M., Cai, H., Dai, Y., and Gong, M. (2023, January 2\u20137). Dynamic mixture of counter network for location-agnostic crowd counting. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00025"},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1016\/j.future.2023.05.013","article-title":"Crowd counting in smart city via lightweight ghost attention pyramid network","volume":"147","author":"Guo","year":"2023","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Alhawsawi, A.N., Khan, S.D., and Ur Rehman, F.J.I. (2024). Crowd Counting in Diverse Environments Using a Deep Routing Mechanism Informed by Crowd Density Levels. Information, 15.","DOI":"10.3390\/info15050275"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Zhang, S., Wu, G., Costeira, J.P., and Moura, J.M. (2017, January 22\u201329). Fcn-rlstm: Deep spatio-temporal neural networks for vehicle counting in city cameras. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.396"},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1186\/s13007-019-0537-2","article-title":"TasselNetv2: In-field counting of wheat spikes with context-augmented local regression networks","volume":"15","author":"Xiong","year":"2019","journal-title":"Plant Methods"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Li, Y., Zhang, X., and Chen, D. (2018, January 18\u201323). Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00120"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/18\/5974\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:56:29Z","timestamp":1760111789000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/18\/5974"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,14]]},"references-count":72,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["s24185974"],"URL":"https:\/\/doi.org\/10.3390\/s24185974","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2024,9,14]]}}}