{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T03:11:46Z","timestamp":1780369906457,"version":"3.54.1"},"reference-count":50,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2021,5,6]],"date-time":"2021-05-06T00:00:00Z","timestamp":1620259200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100008678","name":"Universit\u00e4t Leipzig","doi-asserted-by":"publisher","award":["VATDE141510383"],"award-info":[{"award-number":["VATDE141510383"]}],"id":[{"id":"10.13039\/501100008678","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The detection and localization of the ball in sport videos is crucial to better understand events and actions occurring in those sports. Despite recent advances in the field of object detection, the automatic detection of balls remains a challenging task due to the unsteady nature of balls in images. In this paper, we address the detection of small, fast-moving balls in sport video data and introduce a real-time ball detection approach based on the YOLOv3 object detection model. We apply specific adjustments to the network architecture and training process in order to enhance the detection accuracy and speed: We facilitate an efficient integration of motion information, avoiding a complex modification of the network architecture. Furthermore, we present a customized detection approach that is designed to primarily focus on the detection of small objects. We integrate domain-specific knowledge to adapt image pre-processing and a data augmentation strategy that takes advantage of the special features of balls in images in order to improve the generalization ability of the detection network. We demonstrate that the general trade-off between detection speed and accuracy of the YOLOv3 model can be enhanced in consideration of domain-specific prior knowledge.<\/jats:p>","DOI":"10.3390\/s21093214","type":"journal-article","created":{"date-parts":[[2021,5,6]],"date-time":"2021-05-06T11:10:27Z","timestamp":1620299427000},"page":"3214","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Enhancement of Speed and Accuracy Trade-Off for Sports Ball Detection in Videos\u2014Finding Fast Moving, Small Objects in Real Time"],"prefix":"10.3390","volume":"21","author":[{"given":"Alexander","family":"Hiemann","sequence":"first","affiliation":[{"name":"Institute of Computer Science, Leipzig University, 04109 Leipzig, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Thomas","family":"Kautz","sequence":"additional","affiliation":[{"name":"Machine Learning and Data Analytics Lab, Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), 91052 Erlangen, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tino","family":"Zottmann","sequence":"additional","affiliation":[{"name":"Media Seasons, 04105 Leipzig, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mario","family":"Hlawitschka","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Media, Leipzig University of Applied Sciences, 04277 Leipzig, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,5,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1007\/s12662-017-0487-7","article-title":"Sports Analytics","volume":"48","author":"Link","year":"2018","journal-title":"Ger. J. Exerc. Sport Res."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.cviu.2017.04.011","article-title":"Computer Vision for Sports: Current Applications and Research Topics","volume":"159","author":"Thomas","year":"2017","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Buri\u0107, M., Pobar, M., and Iva\u0161i\u0107-Kos, M. (2018, January 21\u201325). Object Detection in Sports Videos. Proceedings of the 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia.","DOI":"10.23919\/MIPRO.2018.8400189"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"103910","DOI":"10.1016\/j.imavis.2020.103910","article-title":"Recent advances in small object detection based on deep learning: A review","volume":"97","author":"Tong","year":"2020","journal-title":"Image Vis. Comput."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1655","DOI":"10.1007\/s10462-017-9582-2","article-title":"Ball Tracking in Sports: A Survey","volume":"52","author":"Kamble","year":"2019","journal-title":"Artif. Intell. Rev."},{"key":"ref_6","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"3212","DOI":"10.1109\/TNNLS.2018.2876865","article-title":"Object Detection with Deep Learning: A Review","volume":"30","author":"Zhao","year":"2019","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive Image Features from Scale-Invariant Keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_9","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of Oriented Gradients for Human Detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_10","unstructured":"Lienhart, R., and Maydt, J. (2002, January 22\u201325). An Extended Set of Haar-like Features for Rapid Object Detection. Proceedings of the International Conference on Image Processing, Rochester, NY, USA."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support Vector Machine","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_12","first-page":"2","article-title":"Fast Multi-view Face Detection","volume":"Volume 3","author":"Jones","year":"2003","journal-title":"Mitsubishi Electric Research Lab TR-20003-96"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). ImageNet: A Large-Scale Hierarchical Image Database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft COCO: Common Objects in Context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Soviany, P., and Ionescu, R.T. (2018, January 20\u201323). Optimizing the Trade-Off between Single-Stage and Two-Stage Deep Object Detectors using Image Difficulty Prediction. Proceedings of the 2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Timisoara, Romania.","DOI":"10.1109\/SYNASC.2018.00041"},{"key":"ref_16","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_18","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (2016). R-FCN: Object Detection via Region-based Fully Convolutional Networks. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). SSD: Single Shot MultiBox Detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_21","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., and Guadarrama, S. (2017, January 21\u201326). Speed\/Accuracy Trade-Offs for Modern Convolutional Object Detectors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.351"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Buric, M., Pobar, M., and Ivasic-Kos, M. (2018, January 12\u201314). Ball Detection using YOLO and Mask R-CNN. Proceedings of the 2018 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA.","DOI":"10.1109\/CSCI46756.2018.00068"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Buri\u0107, M., Pobar, M., and Iva\u0161i\u0107-Kos, M. (2019, January 19\u201321). Adapting YOLO Network for Ball and Player Detection. Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2019), Prague, Czech Republic.","DOI":"10.5220\/0007582008450851"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kisantal, M., Wojna, Z., Murawski, J., Naruniec, J., and Cho, K. (2019). Augmentation for Small Object Detection. arXiv.","DOI":"10.5121\/csit.2019.91713"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"27","DOI":"10.2352\/ISSN.2470-1173.2017.10.IMAWM-163","article-title":"Training Object Detection And Recognition CNN Models Using Data Augmentation","volume":"2017","author":"Montserrat","year":"2017","journal-title":"Electron. Imaging"},{"key":"ref_27","unstructured":"Weng, L. (2021, January 14). Object Detection Part 4: Fast Detection Models. Available online: lilianweng.github.io\/lil-log."},{"key":"ref_28","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis. IJCV"},{"key":"ref_31","unstructured":"Han, W., Khorrami, P., Paine, T.L., Ramachandran, P., Babaeizadeh, M., Shi, H., Li, J., Yan, S., and Huang, T.S. (2016). Seq-NMS for Video Object Detection. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Hou, R., Chen, C., and Shah, M. (2017). An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos. arXiv.","DOI":"10.1109\/ICCV.2017.620"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hara, K., Kataoka, H., and Satoh, Y. (2017, January 22\u201329). Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCVW.2017.373"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","article-title":"3D Convolutional Neural Networks for Human Action Recognition","volume":"35","author":"Ji","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Xiao, F., and Jae Lee, Y. (2018, January 8\u201314). Video Object Detection with an Aligned Spatial-Temporal Memory. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01237-3_30"},{"key":"ref_36","unstructured":"Liu, M., and Zhu, M. (2018, January 18\u201322). Mobile Video Object Detection with Temporally-Aware Feature Maps. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1023\/B:VISI.0000011205.11775.fd","article-title":"Lucas-Kanade 20 Years On: A Unifying Framework","volume":"56","author":"Baker","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., and Brox, T. (2015, January 13\u201316). FlowNet: Learning Optical Flow with Convolutional Networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.316"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., and Brox, T. (2017, January 21\u201326). FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.179"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Ranjan, A., and Black, M.J. (2017, January 21\u201326). Optical Flow Estimation Using a Spatial Pyramid Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.291"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1408","DOI":"10.1109\/TPAMI.2019.2894353","article-title":"Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation","volume":"42","author":"Sun","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhu, X., Xiong, Y., Dai, J., Yuan, L., and Wei, Y. (2017, January 21\u201326). Deep Feature Flow for Video Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.441"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhu, X., Wang, Y., Dai, J., Yuan, L., and Wei, Y. (2017, January 22\u201329). Flow-Guided Feature Aggregation for Video Object Detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.52"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zhu, X., Dai, J., Yuan, L., and Wei, Y. (2018, January 18\u201322). Towards High Performance Video Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00753"},{"key":"ref_45","unstructured":"(2021, January 23). NVIDIA TensorRT Developer Guide. Available online: https:\/\/docs.nvidia.com\/deeplearning\/tensorrt\/developer-guide\/index.html."},{"key":"ref_46","unstructured":"(2021, January 23). TensorFlow Lite Guide. Available online: https:\/\/www.tensorflow.org\/lite\/guide."},{"key":"ref_47","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_48","unstructured":"Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., and Zisserman, A. (2021, January 23). The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. Available online: http:\/\/host.robots.ox.ac.uk\/pascal\/VOC\/voc2012\/."},{"key":"ref_49","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_50","unstructured":"Long, X., Deng, K., Wang, G., Zhang, Y., Dang, Q., Gao, Y., Shen, H., Ren, J., Han, S., and Ding, E. (2020). PP-YOLO: An Effective and Efficient Implementation of Object Detector. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/9\/3214\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:57:30Z","timestamp":1760162250000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/9\/3214"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,6]]},"references-count":50,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["s21093214"],"URL":"https:\/\/doi.org\/10.3390\/s21093214","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,6]]}}}