{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:26:20Z","timestamp":1760239580423,"version":"build-2065373602"},"reference-count":41,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T00:00:00Z","timestamp":1607040000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Crowd counting is not simply a matter of counting the numbers of people, but also requires that one obtains people\u2019s spatial distribution in a picture. It is still a challenging task for crowded scenes, occlusion, and scale variation. This paper proposes a global and local attention network (GLANet) for efficient crowd counting, which applies an attention mechanism to enhance the features. Firstly, the feature extractor module (FEM) uses the pretrained VGG-16 to parse out a simple feature map. Secondly, the global and local attention module (GLAM) effectively captures the local and global attention information to enhance features. Thirdly, the feature fusing module (FFM) applies a series of convolutions to fuse various features, and generate density maps. 
Finally, we conduct some experiments on a mainstream dataset and compare them with state-of-the-art methods\u2019 performances.<\/jats:p>","DOI":"10.3390\/info11120567","type":"journal-article","created":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T11:59:00Z","timestamp":1607083140000},"page":"567","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Crowd Counting Guided by Attention Network"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7694-5375","authenticated-orcid":false,"given":"Pei","family":"Nie","sequence":"first","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4973-6444","authenticated-orcid":false,"given":"Cien","family":"Fan","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lian","family":"Zou","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5058-0523","authenticated-orcid":false,"given":"Liqiong","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaopeng","family":"Li","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,4]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial pyramid 
pooling in deep convolutional networks for visual recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, D., Chen, S., Gao, S., and Ma, Y. (2016, January 27\u201330). Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.70"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Cao, X., Wang, Z., Zhao, Y., and Su, F. (2018, January 8\u201314). Scale aggregation network for accurate and efficient crowd counting. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_45"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Sindagi, V.A., and Patel, V.M. (2017, January 22\u201329). Generating high-quality crowd density maps using contextual pyramid cnns. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.206"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhao, M., Zhang, J., Zhang, C., and Zhang, W. (2019, January 15\u201320). Leveraging heterogeneous auxiliary tasks to assist crowd counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01302"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, J., Gao, C., Meng, D., and Hauptmann, A.G. (2018, January 18\u201322). Decidenet: Counting varying density crowds through attention guided detection and density estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00545"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, C., Chang, F., Kot, A.C., and Zhang, W. 
(2019, January 23\u201325). Attention to head locations for crowd counting. Proceedings of the International Conference on Image and Graphics. Springer, Beijing, China.","DOI":"10.1007\/978-3-030-34110-7_61"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Onoro-Rubio, D., and L\u00f3pez-Sastre, R.J. (2016, January 11\u201314). Towards perspective-free object counting with deep learning. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46478-7_38"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Shi, M., Yang, Z., Xu, C., and Chen, Q. (2019, January 15\u201320). Revisiting perspective information for efficient crowd counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00745"},{"key":"ref_10","unstructured":"Zhang, C., Li, H., Wang, X., and Yang, X. (2015, January 7\u201312). Cross-scene crowd counting via deep convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Sam, D.B., Surya, S., and Babu, R.V. (2017, January 21\u201326). Switching convolutional neural network for crowd counting. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.429"},{"key":"ref_12","unstructured":"Abdolrashidi, A., Minaei, M., Azimi, E., and Minaee, S. (2020). Age and Gender Prediction From Face Images Using Attentional Convolutional Network. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Li, L., Tang, S., Deng, L., Zhang, Y., and Tian, Q. (2017, January 4\u20139). Image caption with global-local attention. 
Proceedings of the Thirty-first AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11236"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image quality assessment: From error visibility to structural similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Li, M., Zhang, Z., Huang, K., and Tan, T. (2008, January 8\u201311). Estimating the number of people in crowded scenes by mid based foreground segmentation and head-shoulder detection. Proceedings of the IEEE 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA.","DOI":"10.1109\/ICPR.2008.4761705"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object detection with discriminatively trained part-based models","volume":"32","author":"Felzenszwalb","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, M., and Wang, X. (2011, January 20\u201325). Automatic adaptation of a generic pedestrian detector to a specific traffic scene. Proceedings of the IEEE CVPR, Providence, RI, USA.","DOI":"10.1109\/CVPR.2011.5995698"},{"key":"ref_18","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, IEEE."},{"key":"ref_19","unstructured":"Paragios, N., and Ramesh, V. (2001, January 8\u201314). A MRF-based approach for real-time subway monitoring. 
Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/0165-1684(96)00075-8","article-title":"Distributed data fusion for real-time crowding estimation","volume":"53","author":"Regazzoni","year":"1996","journal-title":"Signal Process."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chan, A.B., Liang, Z.S.J., and Vasconcelos, N. (2008, January 23\u201328). Privacy preserving crowd monitoring: Counting people without people models or tracking. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587569"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2160","DOI":"10.1109\/TIP.2011.2172800","article-title":"Counting people with low-level features and Bayesian regression","volume":"21","author":"Chan","year":"2011","journal-title":"IEEE Trans. Image Process."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, X., Van De Weijer, J., and Bagdanov, A.D. (2018, January 18\u201322). Leveraging unlabeled data for crowd counting by learning to rank. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00799"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liu, L., Wang, H., Li, G., Ouyang, W., and Lin, L. (2018). Crowd counting using deep recurrent spatial-aware network. arXiv.","DOI":"10.24963\/ijcai.2018\/118"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Li, Y., Zhang, X., and Chen, D. (2018, January 18\u201322). Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00120"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Chen, X., Bin, Y., Sang, N., and Gao, C. (2019, January 7\u201311). Scale pyramid network for crowd counting. Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA.","DOI":"10.1109\/WACV.2019.00211"},{"key":"ref_27","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and So Kweon, I. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201322). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_30","unstructured":"Mnih, V., Heess, N., and Graves, A. (2014). Recurrent models of visual attention. Advances in Neural Information Processing Systems, NIPS\u201914."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Luong, M.T., Pham, H., and Manning, C.D. (2015). Effective Approaches to Attention-based Neural Machine Translation. arXiv.","DOI":"10.18653\/v1\/D15-1166"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Hou, Y., Li, C., Yang, F., Ma, C., Zhu, L., Li, Y., Jia, H., and Xie, X. (2020, January 4\u20138). BBA-NET: A Bi-Branch Attention Network For Crowd Counting. 
Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053955"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Guo, D., Li, K., Zha, Z.J., and Wang, M. (2019, January 21\u201325). Dadnet: Dilated-attention-deformable convnet for crowd counting. Proceedings of the 27th ACM International Conference on Multimedia, Nice, France.","DOI":"10.1145\/3343031.3350881"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Von Borstel, M., Kandemir, M., Schmidt, P., Rao, M.K., Rajamani, K., and Hamprecht, F.A. (2016, January 11\u201314). Gaussian process density counting from weak supervision. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_22"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Sindagi, V.A., and Patel, V.M. (September, January 29). Cnn-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy.","DOI":"10.1109\/AVSS.2017.8078491"},{"key":"ref_36","unstructured":"Shi, Z., Mettes, P., and Snoek, C.G. (November, January 27). Counting with focus for free. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Jiang, S., Lu, X., Lei, Y., and Liu, L. (2019). Mask-aware networks for crowd counting. IEEE Trans. Circuits Syst. Video Technol.","DOI":"10.1109\/TCSVT.2019.2934989"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Gao, J., Wang, Q., and Li, X. (2019). PCC Net: Perspective Crowd Counting via Spatial Convolutional Network. IEEE Trans. Circuits Syst. 
Video Technol.","DOI":"10.1109\/TCSVT.2019.2919139"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Shi, Z., Zhang, L., Liu, Y., Cao, X., Ye, Y., Cheng, M.-M., and Zheng, G. (2018, January 18\u201322). Crowd counting with deep negative correlation learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00564"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Idrees, H., Saleemi, I., Seibert, C., and Shah, M. (2013, January 23\u201328). Multi-source multi-scale counting in extremely dense crowd images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.329"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/11\/12\/567\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:41:28Z","timestamp":1760179288000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/11\/12\/567"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,4]]},"references-count":41,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["info11120567"],"URL":"https:\/\/doi.org\/10.3390\/info11120567","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2020,12,4]]}}}