{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T13:50:59Z","timestamp":1765547459988,"version":"build-2065373602"},"reference-count":41,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T00:00:00Z","timestamp":1756252800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>The automated analysis of pool game videos presents significant challenges due to complex object interactions, precise rule requirements, and event-driven game dynamics that traditional computer vision approaches struggle to address effectively. This research introduces TCGA-Pool, a novel video analytics framework specifically designed for comprehensive 9-ball pool game understanding through advanced object attention mechanisms and temporal context modeling. Our approach addresses the critical gap in automated cue sports analysis by focusing on three essential classification tasks: clear shot detection (successful ball potting without fouls), win condition identification (game-ending scenarios), and potted balls counting (accurate enumeration of successfully pocketed balls). The proposed framework leverages a Temporal Context Gated Attention (TCGA) mechanism that dynamically focuses on salient game elements while incorporating sequential dependencies inherent in pool game sequences. Through comprehensive evaluation on a dataset comprising 58,078 annotated video frames from diverse 9-ball pool scenarios, our TCGA-Pool framework demonstrates substantial improvements over existing video analysis methods, achieving accuracy gains of 4.7%, 3.2%, and 6.2% for clear shot detection, win condition identification, and potted ball counting tasks, respectively. 
The framework maintains computational efficiency with only 27.3 M parameters and 13.9 G FLOPs, making it suitable for real-time applications. Our contributions include the introduction of domain-specific object attention mechanisms, the development of adaptive temporal modeling strategies for cue sports, and the implementation of a practical real-time system for automated pool game monitoring. This work establishes a foundation for intelligent sports analytics in precision-based games and demonstrates the effectiveness of specialized deep learning approaches for complex temporal video understanding tasks.<\/jats:p>","DOI":"10.3390\/computers14090352","type":"journal-article","created":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T09:58:32Z","timestamp":1756288712000},"page":"352","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Attention-Pool: 9-Ball Game Video Analytics with Object Attention and Temporal Context Gated Attention"],"prefix":"10.3390","volume":"14","author":[{"given":"Anni","family":"Zheng","sequence":"first","affiliation":[{"name":"Department of Computer and Information Sciences, Auckland University of Technology, Auckland 1142, New Zealand"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7443-3285","authenticated-orcid":false,"given":"Wei Qi","family":"Yan","sequence":"additional","affiliation":[{"name":"Department of Computer and Information Sciences, Auckland University of Technology, Auckland 1142, New Zealand"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201313). Learning spatiotemporal features with 3D convolutional networks. 
Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_2","first-page":"6275","article-title":"Prediction and analysis of sphere motion trajectory based on deep learning algorithm optimization","volume":"37","author":"Liang","year":"2019","journal-title":"J. Intell. Fuzzy Syst."},{"key":"ref_3","unstructured":"Simonyan, K., and Zisserman, A. (2014, January 8\u201313). Two-stream convolutional networks for action recognition in videos. Proceedings of the Advances in Neural Information Processing Systems 27, Montreal, QC, Canada."},{"key":"ref_4","first-page":"23","article-title":"Pool game analysis using computer vision techniques","volume":"115","author":"Huang","year":"2018","journal-title":"Pattern Recognit. Lett."},{"key":"ref_5","unstructured":"Li, K., He, Y., Wang, Y., Li, Y., Wang, W., Luo, P., Wang, Y., Wang, L., and Qiao, Y. (2023, January 17\u201324). VideoChat: Chat-centric video understanding. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada."},{"key":"ref_6","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster R-CNN: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems 28, Montreal, QC, Canada."},{"key":"ref_7","unstructured":"Siddiqui, M.H., and Ahmad, I. (2019, January 22\u201325). Automated billiard ball tracking and event detection. Proceedings of the International Conference on Image Processing, Taipei, Taiwan."},{"key":"ref_8","unstructured":"Lin, J., Gan, C., and Han, S. (November, January 27). TSM: Temporal shift module for efficient video understanding. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"3844770","DOI":"10.1155\/2022\/3844770","article-title":"Video analysis in sports by lightweight object detection network under the background of sports industry development","volume":"2022","author":"Zheng","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Van Gool, L. (2016). Temporal segment networks: Towards good practices for deep action recognition. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018). CBAM: Convolutional Block Attention Module. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Naik, B.T., Hashmi, M.F., and Bokde, N.D. (2022). A comprehensive review of computer vision in sports: Open issues, future trends and research directions. Appl. Sci., 12.","DOI":"10.3390\/app12094429"},{"key":"ref_13","first-page":"987","article-title":"A survey of video based action recognition in sports","volume":"11","author":"Rahmad","year":"2018","journal-title":"Indones. J. Electr. Eng. Comput. Sci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"100494","DOI":"10.1016\/j.iot.2021.100494","article-title":"A real-time tennis level evaluation and strokes classification system based on the Internet of Things","volume":"17","author":"Wu","year":"2022","journal-title":"Internet Things"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"796","DOI":"10.1109\/TIP.2003.812758","article-title":"Automatic soccer video analysis and summarization","volume":"12","author":"Ekin","year":"2003","journal-title":"IEEE Trans. 
Image Process."},{"key":"ref_16","unstructured":"Yoon, S., Rameau, F., Kim, J., Lee, S., Kang, S., and Kweon, I.S. (2019, January 15\u201320). Online detection of action start in untrimmed, streaming videos. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1016\/j.proenv.2011.12.074","article-title":"A billiards track and score recording system by RFID trigger","volume":"11","author":"Tang","year":"2011","journal-title":"Procedia Environ. Sci."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Cioppa, A., Deli\u00e8ge, A., Giancola, S., Ghanem, B., Van Droogenbroeck, M., Gade, R., and Moeslund, T.B. (2021, January 19\u201325). Camera calibration and player localization in soccernet-v2 and investigation of their representations for action spotting. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Nashville, TN, USA.","DOI":"10.1109\/CVPRW53098.2021.00511"},{"key":"ref_19","first-page":"1","article-title":"Artificial intelligence: A survey on evolution, models, applications and future trends","volume":"6","author":"Lu","year":"2019","journal-title":"J. Manag. Anal."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Nie, B.X., Wei, P., and Zhu, S.C. (2017, January 22\u201329). Monocular 3d human pose estimation by predicting depth on joints. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.373"},{"key":"ref_21","first-page":"1","article-title":"Billiards sports analytics: Datasets and tasks","volume":"18","author":"Zhang","year":"2025","journal-title":"ACM Trans. Knowl. Discov. Data"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Teachabarikiti, K., Chalidabhongse, T.H., and Thammano, A. (2010, January 7\u201310). Players tracking and ball detection for an automatic tennis video annotation. 
Proceedings of the 2010 11th International Conference on Control Automation Robotics & Vision, Singapore.","DOI":"10.1109\/ICARCV.2010.5707906"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Song, H., Wang, W., Zhao, S., Shen, J., and Lam, K.M. (2018, January 2\u20137). Exploring temporal preservation networks for precise temporal action localization. Proceedings of the AAAI Conference on Artificial Intelligence 32, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12234"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Carreira, J., and Zisserman, A. (2017, January 21\u201326). Quo vadis, action recognition? A new model and the kinetics dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.502"},{"key":"ref_26","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Advances in Neural Information Processing Systems 30, Long Beach, CA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Herzig, R., Ben-Avraham, E., Mangalam, K., Bar, A., Chechik, G., Rohrbach, A., Darrell, T., and Globerson, A. (2022, January 18\u201324). Object-region video transformers. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00315"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.cviu.2017.04.011","article-title":"Computer vision in sports: A survey","volume":"159","author":"Thomas","year":"2017","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_30","unstructured":"Xu, M., Orwell, J., Lowey, L., and Thirde, D. (2001, January 22\u201325). Algorithms and system for segmentation and structure analysis in soccer video. Proceedings of the IEEE International Conference on Multimedia and Expo, Tokyo, Japan."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Li, F. (2014, January 3\u201328). Large-scale video classification with convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.223"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., and Lu, H. (2021, January 20\u201325). Transformer tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00803"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020). End-to-End Object Detection with Transformers. 
European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_34","first-page":"19","article-title":"3D reconstruction system and multiobject local tracking algorithm designed for billiards","volume":"53","author":"Palomares","year":"2023","journal-title":"Appl. Intell."},{"key":"ref_35","unstructured":"Faizan, A., and Mansoor, A.B. (2008, January 23\u201324). Computer vision based automatic scoring of shooting targets. Proceedings of the 2008 IEEE International Multitopic Conference, Karachi, Pakistan."},{"key":"ref_36","unstructured":"Feichtenhofer, C., Fan, H., Malik, J., and He, K. (November, January 27). SlowFast networks for video recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., and Darrell, T. (2015, January 7\u201312). Long-term recurrent convolutional networks for visual recognition and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.21236\/ADA623249"},{"key":"ref_38","first-page":"283","article-title":"Attention mechanisms in computer vision: A survey","volume":"7","author":"Liu","year":"2021","journal-title":"Comput. Vis. Media"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lu\u010di\u0107, M., and Schmid, C. (2021, January 10\u201317). ViViT: A video vision transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00676"},{"key":"ref_40","first-page":"1","article-title":"Deep learning for sports analytics: A survey","volume":"55","author":"Wang","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_41","unstructured":"Zhang, Y., Yao, L., Xu, M., Qiao, Y., and Liu, Q. (2023). 
Video understanding with large language models: A survey. arXiv."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/9\/352\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:33:35Z","timestamp":1760034815000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/9\/352"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,27]]},"references-count":41,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["computers14090352"],"URL":"https:\/\/doi.org\/10.3390\/computers14090352","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2025,8,27]]}}}