{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T11:28:08Z","timestamp":1775042888535,"version":"3.50.1"},"reference-count":52,"publisher":"MDPI AG","issue":"24","license":[{"start":{"date-parts":[[2022,12,15]],"date-time":"2022-12-15T00:00:00Z","timestamp":1671062400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61801272"],"award-info":[{"award-number":["61801272"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object counting is a fundamental task in remote sensing analysis. Nevertheless, it has been barely studied compared with object counting in natural images due to the challenging factors, e.g., background clutter and scale variation. This paper proposes a triple attention and scale-aware network (TASNet). Specifically, a triple view attention (TVA) module is adopted to remedy the background clutter, which executes three-dimension attention operations on the input tensor. In this case, it can capture the interaction dependencies between three dimensions to distinguish the object region. Meanwhile, a pyramid feature aggregation (PFA) module is employed to relieve the scale variation. The PFA module is built in a four-branch architecture, and each branch has a similar structure composed of dilated convolution layers to enlarge the receptive field. Furthermore, a scale transmit connection is introduced to enable the lower branch to acquire the upper branch\u2019s scale, increasing the output\u2019s scale diversity. Experimental results on remote sensing datasets prove that the proposed model can address the issues of background clutter and scale variation. 
Moreover, it outperforms the state-of-the-art (SOTA) competitors subjectively and objectively.<\/jats:p>","DOI":"10.3390\/rs14246363","type":"journal-article","created":{"date-parts":[[2022,12,16]],"date-time":"2022-12-16T02:54:02Z","timestamp":1671159242000},"page":"6363","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Object Counting in Remote Sensing via Triple Attention and Scale-Aware Network"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9405-3792","authenticated-orcid":false,"given":"Xiangyu","family":"Guo","sequence":"first","affiliation":[{"name":"School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo 255000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5438-9467","authenticated-orcid":false,"given":"Marco","family":"Anisetti","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Universit\u00e0 degli Studi di Milano, 20133 Milano, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7273-7499","authenticated-orcid":false,"given":"Mingliang","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo 255000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0651-4278","authenticated-orcid":false,"given":"Gwanggil","family":"Jeon","sequence":"additional","affiliation":[{"name":"School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo 255000, China"},{"name":"Department of Embedded Systems Engineering, Incheon National University, Incheon 22012, Republic of 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1305","DOI":"10.1109\/TIP.2020.3042084","article-title":"Dense Attention Fluid Network for Salient Object Detection in Optical Remote Sensing Images","volume":"30","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Gadamsetty, S., Ch, R., Ch, A., Iwendi, C., and Gadekallu, T.R. (2022). Hash-Based Deep Learning Approach for Remote Sensing Satellite Imagery Detection. Water, 14.","DOI":"10.3390\/w14050707"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Bazi, Y., Bashmal, L., Rahhal, M.M.A., Dayil, R.A., and Ajlan, N.A. (2021). Vision Transformers for Remote Sensing Image Classification. Remote. Sens., 13.","DOI":"10.3390\/rs13030516"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"4764","DOI":"10.1109\/TGRS.2020.2966805","article-title":"Scene-Adaptive Remote Sensing Image Super-Resolution Using a Multiscale Attention Network","volume":"58","author":"Zhang","year":"2020","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/j.comnet.2015.12.023","article-title":"Urban planning and building smart cities based on the Internet of Things using Big Data analytics","volume":"101","author":"Rathore","year":"2016","journal-title":"Comput. Netw."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1016\/j.isprsjprs.2016.10.010","article-title":"MRF-based segmentation and unsupervised classification for building and road detection in peri-urban areas of high-resolution satellite images","volume":"122","author":"Grinias","year":"2016","journal-title":"Isprs J. Photogramm. Remote. 
Sens."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1109\/TPAMI.2011.94","article-title":"Building Development Monitoring in Multitemporal Remotely Sensed Image Pairs with Stochastic Birth-Death Dynamics","volume":"34","author":"Benedek","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1007\/s13753-017-0143-8","article-title":"Quantifying Disaster Physical Damage Using Remote Sensing Data\u2014A Technical Work Flow and Case Study of the 2014 Ludian Earthquake in China","volume":"8","author":"Fan","year":"2017","journal-title":"Int. J. Disaster Risk Sci."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S.K., Girshick, R.B., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Girshick, R.B. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Pham, V.Q., Kozakaya, T., Yamaguchi, O., and Okada, R. (2015, January 7\u201313). COUNT Forest: CO-Voting Uncertain Number of Targets Using Random Forest for Crowd Density Estimation. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.372"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Dai, F., Liu, H., Ma, Y., Cao, J., Zhao, Q., and Zhang, Y. (2021, January 22\u201324). Dense Scale Network for Crowd Counting. 
Proceedings of the 2021 International Conference on Multimedia Retrieval, Tokyo, Japan.","DOI":"10.1145\/3460426.3463628"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gao, J., Gong, M., and Li, X. (2022). Global Multi-Scale Information Fusion for Multi-Class Object Counting in Remote Sensing Images. Remote. Sens., 14.","DOI":"10.3390\/rs14164026"},{"key":"ref_14","unstructured":"Gao, G., Gao, J., Liu, Q., Wang, Q., and Wang, Y. (2020). CNN-based Density Estimation and Crowd Counting: A Survey. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, D., Chen, S., Gao, S., and Ma, Y. (2016, January 27\u201330). Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.70"},{"key":"ref_16","first-page":"1","article-title":"PSGCNet: A Pyramidal Scale and Global Context Guided Network for Dense Object Counting in Remote-Sensing Images","volume":"60","author":"Gao","year":"2022","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"3642","DOI":"10.1109\/TGRS.2020.3020555","article-title":"Counting From Sky: A Large-Scale Data Set for Remote Sensing Object Counting and a Benchmark Method","volume":"59","author":"Gao","year":"2021","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1016\/j.ins.2020.05.062","article-title":"Global context based automatic road segmentation via dilated convolutional neural network","volume":"535","author":"Lan","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chen, X., Bin, Y., Sang, N., and Gao, C. (2019, January 7\u201311). Scale Pyramid Network for Crowd Counting. 
Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2019.00211"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1089\/big.2022.0039","article-title":"Spatial-Frequency Attention Network for Crowd Counting","volume":"10","author":"Guo","year":"2022","journal-title":"Big Data"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"41214","DOI":"10.1117\/1.JEI.31.4.041214","article-title":"Group-split attention network for crowd counting","volume":"31","author":"Zhai","year":"2022","journal-title":"J. Electron. Imaging"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.neucom.2019.08.018","article-title":"SCAR: Spatial-\/channel-wise attention regression networks for crowd counting","volume":"363","author":"Gao","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_23","unstructured":"Zhu, L., Zhao, Z., Lu, C., Lin, Y., Peng, Y., and Yao, T. (2019). Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Jiang, X., Zhang, L., Xu, M., Zhang, T., Lv, P., Zhou, B., Yang, X., and Pang, Y. (2020, January 13\u201319). Attention Scaling for Crowd Counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00476"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Khan, K., Khan, R., Albattah, W., Nayab, D., Qamar, A.M., Habib, S., and Islam, M. (2021). Crowd Counting Using End-to-End Semantic Image Segmentation. Electronics, 10.","DOI":"10.3390\/electronics10111293"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Meng, Y., Zhang, H., Zhao, Y., Yang, X., Qian, X., Huang, X., and Zheng, Y. (2021, January 10\u201317). Spatial Uncertainty-Aware Semi-Supervised Crowd Counting. 
Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01526"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3486","DOI":"10.1109\/TCSVT.2019.2919139","article-title":"PCC Net: Perspective Crowd Counting via Spatial Convolutional Network","volume":"30","author":"Gao","year":"2020","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Liu, Y., Liu, L., Wang, P., Zhang, P., and Lei, Y. (2020). Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks. arXiv.","DOI":"10.1007\/978-3-030-58555-6_15"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Cao, X., Wang, Z., Zhao, Y., and Su, F. (2018, January 8\u201314). Scale Aggregation Network for Accurate and Efficient Crowd Counting. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_45"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Li, Y., Zhang, X., and Chen, D. (2018, January 18\u201323). CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00120"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, L., Qiu, Z., Li, G., Liu, S., Ouyang, W., and Lin, L. (November, January 27). Crowd Counting With Deep Structured Scale Integration Network. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00186"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Liu, W., Salzmann, M., and Fua, P. (2019, January 15\u201320). Context-Aware Crowd Counting. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00524"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1016\/j.neucom.2020.09.059","article-title":"A multi-scale and multi-level feature aggregation network for crowd counting","volume":"423","author":"Zhu","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_34","first-page":"1","article-title":"Distillation Remote Sensing Object Counting via Multi-Scale Context Feature Aggregation","volume":"60","author":"Duan","year":"2022","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., and Xu, C. (2020, January 13\u201319). GhostNet: More Features From Cheap Operations. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00165"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"6547","DOI":"10.1109\/TII.2022.3160634","article-title":"SSR-HEF: Crowd Counting With Multiscale Semantic Refining and Hard Example Focusing","volume":"18","author":"Chen","year":"2022","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_37","unstructured":"Simonyan, K., and Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Misra, D., Nalamada, T., Arasanipalai, A.U., and Hou, Q. (2021, January 3\u20138). Rotate to Attend: Convolutional Triplet Attention Module. Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00318"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhai, W., Gao, M., Souri, A., Li, Q., Guo, X., Shang, J., and Zou, G. (2022). An attentive hierarchy ConvNet for crowd counting in smart city. Clust. 
Comput.","DOI":"10.1007\/s10586-022-03749-2"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Hsieh, M.R., Lin, Y.L., and Hsu, W.H. (2017, January 22\u201329). Drone-Based Object Counting by Spatially Regularized Regional Proposal Network. Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.446"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1016\/j.neucom.2020.05.056","article-title":"MobileCount: An efficient encoder-decoder framework for real-time crowd counting","volume":"407","author":"Wang","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_42","unstructured":"Sindagi, V., and Patel, V. (September, January 29). CNN-Based cascaded multi-task learning of high-level prior and density estimation for crowd counting. Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wang, Q., Gao, J., Lin, W., and Yuan, Y. (2019, January 15\u201320). Learning From Synthetic Data for Crowd Counting in the Wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00839"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2015","journal-title":"TPAMI"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1035","DOI":"10.1109\/TIP.2018.2875353","article-title":"Divide and Count: Generic Object Counting by Image Divisions","volume":"28","author":"Stahl","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). 
SSD: Single Shot MultiBox Detector. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1109\/TPAMI.2018.2858826","article-title":"Focal Loss for Dense Object Detection","volume":"42","author":"Lin","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Mundhenk, T.N., Konjevod, G., Sakla, W.A., and Boakye, K. (2016, January 11\u201314). A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46487-9_48"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Ma, Z., Wei, X., Hong, X., and Gong, Y. (November, January 27). Bayesian Loss for Crowd Count Estimation With Point Supervision. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00624"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R.B., Doll\u00e1r, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated Residual Transformations for Deep Neural Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Yu, X., Han, Z., Gong, Y., Jan, N., and Zhao, J. (2020, January 23\u201328). 
The 1st Tiny Object Detection Challenge: Methods and Results. Proceedings of the 2020 ECCV Workshops, Glasgow, UK.","DOI":"10.1007\/978-3-030-68238-5_23"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/24\/6363\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:42:21Z","timestamp":1760146941000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/24\/6363"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,15]]},"references-count":52,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["rs14246363"],"URL":"https:\/\/doi.org\/10.3390\/rs14246363","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,15]]}}}