{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T00:54:09Z","timestamp":1769561649553,"version":"3.49.0"},"reference-count":64,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2022,5,10]],"date-time":"2022-05-10T00:00:00Z","timestamp":1652140800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Polish Ministry of Science and Higher Education","award":["0214\/SBAD\/0233"],"award-info":[{"award-number":["0214\/SBAD\/0233"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Recent advances in deep learning-based image processing have enabled significant improvements in multiple computer vision fields, with crowd counting being no exception. Crowd counting is still attracting research interest due to its potential usefulness for traffic and pedestrian stream monitoring and analysis. This study considered a specific case of crowd counting, namely, counting based on low-altitude aerial images collected by an unmanned aerial vehicle. We evaluated a range of neural network architectures to find ones appropriate for on-board image processing using edge computing devices while minimising the loss in performance. Through experiments on a range of neural network architectures, we also showed that the input image resolution significantly impacts the prediction quality and should be considered an important factor before going for a more complex neural network model to improve accuracy. Moreover, by extending a state-of-the-art benchmark with more in-depth testing, we showed that larger models might be prone to overfitting because of the relative scarcity of training data.<\/jats:p>","DOI":"10.3390\/rs14102288","type":"journal-article","created":{"date-parts":[[2022,5,10]],"date-time":"2022-05-10T21:52:11Z","timestamp":1652219531000},"page":"2288","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["On-Board Crowd Counting and Density Estimation Using Low Altitude Unmanned Aerial Vehicles\u2014Looking beyond Beating the Benchmark"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1601-6560","authenticated-orcid":false,"given":"Bartosz","family":"Ptak","sequence":"first","affiliation":[{"name":"Institute of Robotics and Machine Intelligence, Poznan University of Technology, Piotrowo 3A, 60-965 Poznan, Poland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0275-5629","authenticated-orcid":false,"given":"Dominik","family":"Pieczy\u0144ski","sequence":"additional","affiliation":[{"name":"Institute of Robotics and Machine Intelligence, Poznan University of Technology, Piotrowo 3A, 60-965 Poznan, Poland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3479-0237","authenticated-orcid":false,"given":"Mateusz","family":"Piechocki","sequence":"additional","affiliation":[{"name":"Institute of Robotics and Machine Intelligence, Poznan University of Technology, Piotrowo 3A, 60-965 Poznan, Poland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6483-2357","authenticated-orcid":false,"given":"Marek","family":"Kraft","sequence":"additional","affiliation":[{"name":"Institute of Robotics and Machine Intelligence, Poznan University of Technology, Piotrowo 3A, 60-965 Poznan, Poland"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.patrec.2017.07.007","article-title":"A survey of recent advances in cnn-based single image crowd counting and density estimation","volume":"107","author":"Sindagi","year":"2018","journal-title":"Pattern Recognit. Lett."},{"key":"ref_2","unstructured":"Gao, G., Gao, J., Liu, Q., Wang, Q., and Wang, Y. (2020). Cnn-based density estimation and crowd counting: A survey. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ilyas, N., Shahzad, A., and Kim, K. (2020). Convolutional-neural network-based image crowd counting: Review, categorization, analysis, and performance evaluation. Sensors, 20.","DOI":"10.3390\/s20010043"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Perko, R., Klopschitz, M., Almer, A., and Roth, P.M. (2021). Critical Aspects of Person Counting and Density Estimation. J. Imaging, 7.","DOI":"10.3390\/jimaging7020021"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"853","DOI":"10.1007\/s10044-021-00959-z","article-title":"Approaches on crowd counting and density estimation: A review","volume":"24","author":"Li","year":"2021","journal-title":"Pattern Anal. Appl."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Shao, J., Kang, K., Change Loy, C., and Wang, X. (2015, January 7\u201312). Deeply learned attributes for crowded scene understanding. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299097"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yi, S., Li, H., and Wang, X. (2015, January 7\u201312). Understanding pedestrian behaviors from stationary crowd groups. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298971"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Marsden, M., McGuinness, K., Little, S., and O\u2019Connor, N.E. (September, January 29). Resnetcrowd: A residual deep learning architecture for crowd counting, violent behaviour detection and crowd density level classification. Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy.","DOI":"10.1109\/AVSS.2017.8078482"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1525\/ctx.2004.3.3.12","article-title":"Who counts and how: Estimating the size of protests","volume":"3","author":"McPhail","year":"2004","journal-title":"Contexts"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"172231","DOI":"10.1109\/ACCESS.2019.2956508","article-title":"A survey on the new generation of deep learning in image processing","volume":"7","author":"Jiao","year":"2019","journal-title":"IEEE Access"},{"key":"ref_11","unstructured":"Thompson, N.C., Greenewald, K., Lee, K., and Manso, G.F. (2020). The computational limits of deep learning. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Li, M., Zhang, Z., Huang, K., and Tan, T. (2008, January 8\u201311). Estimating the number of people in crowded scenes by mid based foreground segmentation and head-shoulder detection. Proceedings of the 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA.","DOI":"10.1109\/ICPR.2008.4761705"},{"key":"ref_13","unstructured":"Sim, C.H., Rajmadhan, E., and Ranganath, S. (2008, January 12\u201315). Using color bin images for crowd detections. Proceedings of the 2008 15th IEEE International Conference on Image Processing, San Diego, CA, USA."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Subburaman, V.B., Descamps, A., and Carincotte, C. (2012, January 18\u201321). Counting people in the crowd using a generic head detector. Proceedings of the 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance, Beijing, China.","DOI":"10.1109\/AVSS.2012.87"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Topkaya, I.S., Erdogan, H., and Porikli, F. (2014, January 26\u201329). Counting people by clustering person detector outputs. Proceedings of the 2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Seoul, Korea.","DOI":"10.1109\/AVSS.2014.6918687"},{"key":"ref_16","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tan, M., Pang, R., and Le, Q.V. (2020, January 14\u201319). Efficientdet: Scalable and efficient object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2020, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2160","DOI":"10.1109\/TIP.2011.2172800","article-title":"Counting people with low-level features and Bayesian regression","volume":"21","author":"Chan","year":"2011","journal-title":"IEEE Trans. Image Process."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Chen, K., Loy, C.C., Gong, S., and Xiang, T. (2012, January 3\u20137). Feature mining for localised crowd counting. Proceedings of the British Machine Vision Conference, 2012, Surrey, UK.","DOI":"10.5244\/C.26.21"},{"key":"ref_21","first-page":"1324","article-title":"Learning to count objects in images","volume":"23","author":"Lempitsky","year":"2010","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_22","unstructured":"Pham, V.Q., Kozakaya, T., Yamaguchi, O., and Okada, R. (2021, January 11\u201317). Count forest: Co-voting uncertain number of targets using random forest for crowd density estimation. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Montreal, BC, Canada."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"5455","DOI":"10.1007\/s10462-020-09825-6","article-title":"A survey of the recent architectures of deep convolutional neural networks","volume":"53","author":"Khan","year":"2020","journal-title":"Artif. Intell. Rev."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"102705","DOI":"10.1016\/j.jvcir.2019.102705","article-title":"Research on image feature extraction and retrieval algorithms based on convolutional neural network","volume":"69","author":"Peng","year":"2020","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.engappai.2015.04.006","article-title":"Fast crowd density estimation with convolutional neural networks","volume":"43","author":"Fu","year":"2015","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wang, C., Zhang, H., Yang, L., Liu, S., and Cao, X. (2015, January 26\u201330). Deep people counting in extremely dense crowds. Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia.","DOI":"10.1145\/2733373.2806337"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, D., Chen, S., Gao, S., and Ma, Y. (2016, January 27\u201330). Single-image crowd counting via multi-column convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.70"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Boominathan, L., Kruthiventi, S.S., and Babu, R.V. (2016, January 27\u201330). Crowdnet: A deep convolutional network for dense crowd counting. Proceedings of the 24th ACM International Conference on Multimedia, Las Vegas, NV, USA.","DOI":"10.1145\/2964284.2967300"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Onoro-Rubio, D., and L\u00f3pez-Sastre, R.J. (2016, January 11\u201314). Towards perspective-free object counting with deep learning. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46478-7_38"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Babu Sam, D., Surya, S., and Venkatesh Babu, R. (2017, January 21\u201326). Switching convolutional neural network for crowd counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.429"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhang, A., Shen, J., Xiao, Z., Zhu, F., Zhen, X., Cao, X., and Shao, L. (2019, January 27\u201328). Relational attention network for crowd counting. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00689"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Li, Y., Zhang, X., and Chen, D. (2018, January 18\u201322). Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00120"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Liu, N., Long, Y., Zou, C., Niu, Q., Pan, L., and Wu, H. (2019, January 27\u201328). Adcrowdnet: An attention-injective deformable convolutional network for crowd understanding. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/CVPR.2019.00334"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_35","unstructured":"Valloli, V.K., and Mehta, K. (2019). W-Net: Reinforced u-net for density map estimation. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gao, J., Gong, M., and Li, X. (2021). Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer. arXiv.","DOI":"10.1016\/j.neucom.2022.09.113"},{"key":"ref_37","unstructured":"Tian, Y., Chu, X., and Wang, H. (2021). CCTrans: Simplifying and Improving Crowd Counting with Transformer. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Hsieh, M.R., Lin, Y.L., and Hsu, W.H. (2017, January 22\u201329). Drone-based object counting by spatially regularized regional proposal network. Proceedings of the IEEE International Conference on Computer Vision, 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.446"},{"key":"ref_39","unstructured":"Wen, L., Du, D., Zhu, P., Hu, Q., Wang, Q., Bo, L., and Lyu, S. (2019). Drone-based joint density map estimation, localization and tracking with space-time multi-scale attention network. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Du, D., Wen, L., Zhu, P., Fan, H., Hu, Q., Ling, H., Shah, M., Pan, J., Al-Ali, A., and Mohamed, A. (2020, January 23\u201328). Visdrone-cc2020: The vision meets drone crowd counting challenge results. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-66823-5_41"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wen, L., Du, D., Zhu, P., Hu, Q., Wang, Q., Bo, L., and Lyu, S. (2021, January 19\u201325). Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online.","DOI":"10.1109\/CVPR46437.2021.00772"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Tian, Y., Duan, C., Zhang, R., Wei, Z., and Wang, H. (2021, January 6\u201311). Lightweight Dual-Task Networks For Crowd Counting In Aerial Images. Proceedings of the ICASSP 2021\u20142021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9413949"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhao, Z., Han, T., Gao, J., Wang, Q., and Li, X. (2020, January 23\u201328). A flow base bi-path network for cross-scene video crowd understanding in aerial view. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-66823-5_34"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1109\/MSSC.2017.2745818","article-title":"Embedded deep neural network processing: Algorithmic and processor techniques bring deep learning to IoT and edge devices","volume":"9","author":"Verhelst","year":"2017","journal-title":"IEEE-Solid-State Circuits Mag."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"370","DOI":"10.1016\/j.neucom.2021.07.045","article-title":"Pruning and quantization for deep neural network acceleration: A survey","volume":"461","author":"Liang","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"04021092","DOI":"10.1061\/(ASCE)CF.1943-5509.0001652","article-title":"Building and Infrastructure Defect Detection and Visualization Using Drone and Deep Learning Technologies","volume":"35","author":"Jiang","year":"2021","journal-title":"J. Perform. Constr. Facil."},{"key":"ref_47","unstructured":"Franklin, D., Hariharapura, S.S., and Todd, S. (2021, August 11). Bringing Cloud-Native Agility to Edge AI Devices with the NVIDIA Jetson Xavier NX Developer Kit. Available online: https:\/\/developer.nvidia.com\/blog\/bringing-cloud-native-agility-to-edge-ai-with-jetson-xavier-nx\/."},{"key":"ref_48","unstructured":"Gorbachev, Y., Fedorov, M., Slavutin, I., Tugarev, A., Fatekhov, M., and Tarkan, Y. (2019, January 27\u201328). OpenVINO deep learning workbench: Comprehensive analysis and tuning of neural networks inference. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea."},{"key":"ref_49","unstructured":"Libutti, L.A., Igual, F.D., Pinuel, L., De Giusti, L., and Naiouf, M. (2020, January 31). Benchmarking performance and power of USB accelerators for inference with MLPerf. Proceedings of the 2nd Workshop on Accelerated Machine Learning (AccML), Valencia, Spain."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., and Liang, J. (2018). UNet++: A Nested U-Net Architecture for Medical Image Segmentation. arXiv.","DOI":"10.1007\/978-3-030-00889-5_1"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Chen, L., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_53","unstructured":"Tan, M., and Le, Q.V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Tan, M., Chen, B., Pang, R., Vasudevan, V., and Le, Q.V. (2018). MnasNet: Platform-Aware Neural Architecture Search for Mobile. arXiv.","DOI":"10.1109\/CVPR.2019.00293"},{"key":"ref_55","unstructured":"Tan, M., and Le, Q.V. (2019). MixConv: Mixed Depthwise Convolutional Kernels. arXiv."},{"key":"ref_56","unstructured":"Yakubovskiy, P. (2022, March 13). Segmentation Models Pytorch. Available online: https:\/\/github.com\/qubvel\/segmentation_models.pytorch."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Kroeger, T., Timofte, R., Dai, D., and Van Gool, L. (2016, January 11\u201314). Fast optical flow using dense inverse search. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_29"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Hur, J., and Roth, S. (2020). Optical flow estimation in the deep learning age. Modelling Human Motion, Springer.","DOI":"10.1007\/978-3-030-46732-6_7"},{"key":"ref_59","unstructured":"Buslaev, A.V., Parinov, A., Khvedchenya, E., Iglovikov, V.I., and Kalinin, A.A. (2018). Albumentations: Fast and flexible image augmentations. arXiv."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1186\/s40537-019-0197-0","article-title":"A survey on Image Data Augmentation for Deep Learning","volume":"6","author":"Shorten","year":"2019","journal-title":"J. Big Data"},{"key":"ref_61","unstructured":"Bai, J., Lu, F., and Zhang, K. (2022, March 13). ONNX: Open Neural Network Exchange. Available online: https:\/\/github.com\/onnx\/onnx."},{"key":"ref_62","unstructured":"Developers, O.R. (2022, March 13). ONNX Runtime. Version: 1.10.0. Available online: https:\/\/onnxruntime.ai\/."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Wang, Q., Gao, J., Lin, W., and Li, X. (2020). NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting. arXiv.","DOI":"10.1109\/TPAMI.2020.3013269"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Iqbal, S. (2021, January 27\u201330). A Study on UAV Operating System Security and Future Research Challenges. Proceedings of the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), Online.","DOI":"10.1109\/CCWC51732.2021.9376151"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/10\/2288\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:08:31Z","timestamp":1760137711000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/10\/2288"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,10]]},"references-count":64,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["rs14102288"],"URL":"https:\/\/doi.org\/10.3390\/rs14102288","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,10]]}}}