{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T00:10:27Z","timestamp":1778112627528,"version":"3.51.4"},"reference-count":102,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T00:00:00Z","timestamp":1773619200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T00:00:00Z","timestamp":1775606400000},"content-version":"vor","delay-in-days":23,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001215","name":"La Trobe University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001215","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>This review provides an in-depth exploration of the field of animal action recognition, focusing on coarse-grained (CG) and fine-grained (FG) techniques. The primary aim is to examine the current state of research in animal behaviour recognition and to elucidate the unique challenges associated with recognising subtle animal actions in outdoor environments. These challenges differ significantly from those encountered in human action recognition due to factors such as non-rigid body structures, frequent occlusions, and the lack of large-scale, annotated datasets. This review underscores the critical differences between human and animal action recognition. While inspired by progress in the human domain, animal action recognition presents unique challenges due to high intra-species variability, complex environmental interactions, and unstructured datasets that human-centric models cannot fully address. Recent multimodal frameworks such as ARTEMIS and MSQNet exemplify state-of-the-art progress by integrating textual cues derived from video with visual and audio modalities. When considered alongside established spatio-temporal architectures like SlowFast, these developments signal a shift toward richer multimodal paradigms in behaviour analysis. By assessing the strengths and weaknesses of current methodologies and introducing a recently published dataset, the review outlines future directions for advancing fine-grained action recognition, aiming to improve accuracy and generalisability in behaviour analysis across species. This review extends beyond earlier reviews by offering the first systematic treatment of coarse-grained (CG) and fine-grained (FG) action recognition in animals.<\/jats:p>","DOI":"10.1007\/s10462-026-11526-5","type":"journal-article","created":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T21:54:13Z","timestamp":1773698053000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A review on vision-centric coarse to fine-grained animal action recognition"],"prefix":"10.1007","volume":"59","author":[{"given":"Ali","family":"Zia","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Renuka","family":"Sharma","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abdelwahed","family":"Khamis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Usman","family":"Ali","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuesong","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad","family":"Husnain","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Numan","family":"Shafi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Saeed","family":"Anwar","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Imran","family":"Raza","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad Hasan","family":"Jamal","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sabine","family":"Schmoelzl","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eric","family":"Stone","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lars","family":"Petersson","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vivien","family":"Rolland","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,3,16]]},"reference":[{"key":"11526_CR151","unstructured":"Aharon N, Orfaig R, Bobrovsky BZ. BoT-SORT: Robust associations multi-pedestrian tracking. arXiv"},{"key":"11526_CR1","doi-asserted-by":"crossref","unstructured":"Alfasly S, Lu J, Xu C, Li Y, Zou Y (2024) Auxiliary audio\u2013textual modalities for better action recognition on vision-specific annotated videos. Pattern Recogn 156:110808","DOI":"10.1016\/j.patcog.2024.110808"},{"key":"11526_CR3","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107353","volume":"104","author":"AM Atto","year":"2020","unstructured":"Atto AM, Benoit A, Lambert P (2020) Timed-image based deep learning for action recognition in video sequences. Pattern Recogn 104:107353","journal-title":"Pattern Recogn"},{"issue":"46","key":"11526_CR4","doi-asserted-by":"publisher","first-page":"eabi4883","DOI":"10.1126\/sciadv.abi4883","volume":"7","author":"M Bain","year":"2021","unstructured":"Bain M, Nagrani A et al (2021) Automated audiovisual behaviour recognition in wild primates. Sci Adv 7(46):eabi4883","journal-title":"Sci Adv"},{"key":"11526_CR5","doi-asserted-by":"crossref","unstructured":"Beery S, Van Horn G, Perona P (2018) Recognition in Terra Incognita. In: Proceedings of the European Conference on Computer Vision (ECCV); p. 456\u2013473","DOI":"10.1007\/978-3-030-01270-0_28"},{"key":"11526_CR6","doi-asserted-by":"crossref","unstructured":"Behera A, Wharton Z, P G Hewage PR, Bera A (2021) Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification. AAAI Conference on Artificial Intelligence","DOI":"10.1609\/aaai.v35i2.16176"},{"key":"11526_CR7","doi-asserted-by":"crossref","unstructured":"Bendale A, Boult TE (2016) Towards open set deep networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; p. 1563\u20131572","DOI":"10.1109\/CVPR.2016.173"},{"key":"11526_CR8","doi-asserted-by":"publisher","first-page":"6017","DOI":"10.1109\/TIP.2022.3205215","volume":"31","author":"A Bera","year":"2022","unstructured":"Bera A, Wharton Z, Liu Y, Bessis N, Behera A (2022) SR-GNN: spatial relation-aware graph neural network for fine-grained image categorization. IEEE Trans Image Process 31:6017\u20136031","journal-title":"IEEE Trans Image Process"},{"issue":"2","key":"11526_CR9","doi-asserted-by":"publisher","first-page":"2873","DOI":"10.1007\/s11227-021-03957-4","volume":"78","author":"M Bilal","year":"2021","unstructured":"Bilal M, Maqsood M, Yasmin S, Hasan N, Rho S (2021) A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes. J Supercomput 78(2):2873\u20132908","journal-title":"J Supercomput"},{"issue":"2","key":"11526_CR10","doi-asserted-by":"publisher","first-page":"572","DOI":"10.1007\/s11263-022-01716-3","volume":"131","author":"S Broom\u00e9","year":"2023","unstructured":"Broom\u00e9 S, Feighelstein M, Zamansky A, Carreira Lencioni G, Haubro Andersen P, Pessanha F et al (2023) Going Deeper than Tracking: A Survey of Computer-Vision Based Recognition of Animal Pain and Emotions. Int J Comput Vis 131(2):572\u2013590","journal-title":"Int J Comput Vis"},{"key":"11526_CR150","doi-asserted-by":"crossref","unstructured":"Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A (2020) End-to-End Object Detection with Transformers. Proc European Conference on Computer Vision (ECCV) 213\u2013229","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"11526_CR11","doi-asserted-by":"crossref","unstructured":"Carreira J, Zisserman A (2017) Quo vadis, action recognition? a new model and the kinetics dataset. In: CVPR; p. 6299\u20136308","DOI":"10.1109\/CVPR.2017.502"},{"key":"11526_CR12","doi-asserted-by":"crossref","unstructured":"Chen J, Hu M, Coker DJ, Berumen ML, Costelloe B, Beery S, et al (2023) MammalNet: a large-scale video benchmark for mammal recognition and behavior understanding. In: CVPR; p. 13052\u201313061","DOI":"10.1109\/CVPR52729.2023.01254"},{"key":"11526_CR13","doi-asserted-by":"publisher","first-page":"RP86873","DOI":"10.7554\/eLife.86873","volume":"12","author":"N Desai","year":"2023","unstructured":"Desai N, Bala P, Richardson R, Raper J, Zimmermann J, Hayden B (2023) OpenApePose, a database of annotated ape photographs for pose estimation. Elife 12:RP86873","journal-title":"Elife"},{"key":"11526_CR14","doi-asserted-by":"publisher","DOI":"10.32604\/cmc.2024.049512","author":"A Dey","year":"2024","unstructured":"Dey A, Biswas S et al (2024) Workout action recognition in video streams using an attention driven residual DC-GRU network. Comput Mater Continua. https:\/\/doi.org\/10.32604\/cmc.2024.049512","journal-title":"Comput Mater Continua"},{"key":"11526_CR15","doi-asserted-by":"publisher","first-page":"6578","DOI":"10.1007\/s11263-025-02532-1","volume":"133","author":"I Duporge","year":"2025","unstructured":"Duporge I, Kholiavchenko M, Harel R, Wolf S, Rubenstein DI, Crofoot MC et al (2025) BaboonLand dataset: tracking primates in the wild and automating behaviour recognition from drone videos. Int J Comput Vis 133:6578\u20136589. https:\/\/doi.org\/10.1007\/s11263-025-02532-1","journal-title":"Int J Comput Vis"},{"key":"11526_CR16","doi-asserted-by":"crossref","unstructured":"Farha YA, Gall J (2019) MS-TCN: Multi-stage temporal convolutional network for action segmentation. In: CVPR; p. 3575\u20133584","DOI":"10.1109\/CVPR.2019.00369"},{"issue":"5","key":"11526_CR17","doi-asserted-by":"publisher","first-page":"1985","DOI":"10.1007\/s13042-023-02009-y","volume":"15","author":"E Fazzari","year":"2024","unstructured":"Fazzari E, Carrara F, Falchi F, Stefanini C, Romano D (2024) Using AI to decode the behavioral responses of an insect to chemical stimuli: towards machine-animal computational technologies. Int J Mach Learn Cybern 15(5):1985\u20131994","journal-title":"Int J Mach Learn Cybern"},{"key":"11526_CR18","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2025.128330","volume":"289","author":"E Fazzari","year":"2025","unstructured":"Fazzari E, Romano D, Falchi F, Stefanini C (2025) Animal behavior analysis methods using deep learning: A survey. Expert Syst Appl 289:128330. https:\/\/doi.org\/10.1016\/j.eswa.2025.128330","journal-title":"Expert Syst Appl"},{"key":"11526_CR19","doi-asserted-by":"publisher","first-page":"5877","DOI":"10.1007\/s13042-025-02602-3","volume":"21","author":"E Fazzari","year":"2025","unstructured":"Fazzari E, Romano D, Falchi F, Stefanini C (2025) ARTEMIS: animal recognition through enhanced multimodal integration system. Int J Mach Learn Cybern 21:5877\u20135892. https:\/\/doi.org\/10.1007\/s13042-025-02602-3","journal-title":"Int J Mach Learn Cybern"},{"key":"11526_CR20","doi-asserted-by":"publisher","DOI":"10.1016\/j.ecoinf.2024.102955","volume":"85","author":"E Fazzari","year":"2025","unstructured":"Fazzari E, Romano D, Falchi F, Stefanini C (2025) Selective state models are what you need for animal action recognition. Eco Inform 85:102955. https:\/\/doi.org\/10.1016\/j.ecoinf.2024.102955","journal-title":"Eco Inform"},{"key":"11526_CR21","doi-asserted-by":"crossref","unstructured":"Feichtenhofer C (2020) X3d: Expanding architectures for efficient video recognition. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition; p. 203\u2013213","DOI":"10.1109\/CVPR42600.2020.00028"},{"key":"11526_CR22","doi-asserted-by":"crossref","unstructured":"Feichtenhofer C, Fan H, Malik J, He K (2019) Slowfast networks for video recognition. In: CVPR; p. 6202\u20136211","DOI":"10.1109\/ICCV.2019.00630"},{"issue":"2","key":"11526_CR23","doi-asserted-by":"publisher","first-page":"485","DOI":"10.3390\/ani11020485","volume":"11","author":"L Feng","year":"2021","unstructured":"Feng L, Zhao Y, Sun Y, Zhao W, Tang J (2021) Action recognition using a spatial-temporal network for wild felines. Animals 11(2):485","journal-title":"Animals"},{"key":"11526_CR24","doi-asserted-by":"publisher","DOI":"10.1016\/j.applanim.2023.106099","volume":"269","author":"J Feng","year":"2023","unstructured":"Feng J, Luo H, Fang D (2023) A progressive deep learning framework for fine-grained primate behaviour recognition. Appl Anim Behav Sci 269:106099","journal-title":"Appl Anim Behav Sci"},{"key":"11526_CR25","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1016\/j.compag.2018.04.017","volume":"150","author":"ES Fogarty","year":"2018","unstructured":"Fogarty ES, Swain DL, Cronin G, Trotter M (2018) Autonomous on-animal sensors in sheep research: A systematic review. Comput Electron Agric 150:245\u2013256","journal-title":"Comput Electron Agric"},{"issue":"10","key":"11526_CR26","doi-asserted-by":"publisher","first-page":"6668","DOI":"10.1007\/s11263-025-02484-6","volume":"133","author":"M Fuchs","year":"2025","unstructured":"Fuchs M, Genty E, Bangerter A, Zuberb\u00fchler K, Odobez JM, Cotofrei P (2025) From forest to zoo: great ape behavior recognition with ChimpBehave. Int J Comput Vis 133(10):6668\u20136688","journal-title":"Int J Comput Vis"},{"issue":"12","key":"11526_CR27","doi-asserted-by":"publisher","first-page":"2020","DOI":"10.3390\/ani13122020","volume":"13","author":"A Fuentes","year":"2023","unstructured":"Fuentes A, Han S, Nasir MF, Park J, Yoon S, Park DS (2023) Multiview monitoring of individual cattle behavior based on action recognition in closed barns using deep learning. Animals 13(12):2020","journal-title":"Animals"},{"key":"11526_CR28","unstructured":"Gagne C, Kini J, Smith D, Shah M (2021) Florida wildlife camera trap dataset. arXiv preprint arXiv:2106.12628"},{"key":"11526_CR29","unstructured":"Gupta N, Shesh, Brown BN (2022) Adjusting for bias with procedural data. arXiv e-prints"},{"key":"11526_CR30","unstructured":"Hacker L, Bartels F, Martin PE (2023) Fine-grained action detection with RGB and pose information using two stream convolutional networks. In: MediaEval 2022 Workshop; p. 1\u20136"},{"issue":"1","key":"11526_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s42256-023-00776-5","volume":"6","author":"Y Han","year":"2024","unstructured":"Han Y, Chen K et al (2024) Multi-animal 3D social pose estimation, identification and behaviour embedding with a few-shot learning framework. Nat Mach Intell 6(1):1\u201314","journal-title":"Nat Mach Intell"},{"key":"11526_CR32","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107267","volume":"103","author":"T Han","year":"2020","unstructured":"Han T, Yao H, Xie W, Sun X, Zhao S, Yu J (2020) TVENet: Temporal variance embedding network for fine-grained action representation. Pattern Recogn 103:107267","journal-title":"Pattern Recogn"},{"key":"11526_CR33","doi-asserted-by":"publisher","first-page":"5188","DOI":"10.1038\/s41467-021-25420-x","volume":"12","author":"AI Hsu","year":"2021","unstructured":"Hsu AI, Yttri EA (2021) B-SOiD, an open-source unsupervised algorithm for identification and fast prediction of behaviors. Nat Commun 12:5188. https:\/\/doi.org\/10.1038\/s41467-021-25420-x","journal-title":"Nat Commun"},{"key":"11526_CR34","doi-asserted-by":"publisher","DOI":"10.1080\/21642583.2022.2062479","author":"J Hu","year":"2022","unstructured":"Hu J, Wang Y, Cheng S, Liu J, Kang J, Yang W (2022) SFGNet detecting objects via spatial fine-grained feature and enhanced RPN with spatial context. SIViP. https:\/\/doi.org\/10.1080\/21642583.2022.2062479","journal-title":"SIViP"},{"issue":"1","key":"11526_CR35","doi-asserted-by":"publisher","first-page":"2784","DOI":"10.1038\/s41467-021-22970-y","volume":"12","author":"K Huang","year":"2021","unstructured":"Huang K, Han Y, Chen K, Pan H, Zhao G, Yi W et al (2021) A hierarchical 3D-motion learning framework for animal spontaneous behavior mapping. Nat Commun 12(1):2784","journal-title":"Nat Commun"},{"key":"11526_CR36","unstructured":"Jiang Y, Chen H, Ko H (2023) Spatial-temporal transformer-guided diffusion based data augmentation for efficient skeleton-based action recognition. arXiv preprint arXiv:2302.13434"},{"key":"11526_CR37","doi-asserted-by":"crossref","unstructured":"Joska D, Clark L, Muramatsu N, Jericevich R, Nicolls F, Mathis A (2021) Acinoset: a 3d pose estimation dataset and baseline models for cheetahs in the wild. In: IEEE international conference on robotics and automation (ICRA). IEEE 2021:13901\u201313908","DOI":"10.1109\/ICRA48506.2021.9561338"},{"issue":"1","key":"11526_CR38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3041960","volume":"50","author":"A Jukan","year":"2017","unstructured":"Jukan A, Masip-Bruin X, Amla N (2017) Smart computing and sensing technologies for animal welfare: A systematic review. ACM Comput Surv 50(1):1\u201327","journal-title":"ACM Comput Surv"},{"key":"11526_CR39","doi-asserted-by":"crossref","unstructured":"Kamminga JW, Meratnia N, Havinga PJ (2019) Dataset: Horse movement data and analysis of its potential for activity recognition. In: Proceedings of the 2nd workshop on data acquisition to analysis; p. 22\u201325","DOI":"10.1145\/3359427.3361908"},{"key":"11526_CR40","doi-asserted-by":"publisher","first-page":"442","DOI":"10.1016\/j.neucom.2021.10.126","volume":"491","author":"N Kleanthous","year":"2022","unstructured":"Kleanthous N, Hussain AJ, Khan W, Sneddon J, Al-Shamma\u2019a A, Liatsis P (2022) A survey of machine learning approaches in animal behaviour. Neurocomputing 491:442\u2013463","journal-title":"Neurocomputing"},{"key":"11526_CR41","unstructured":"Koh PW, Sagawa S, Marklund H, Xie SM, Zhang M, Balsubramani A, et al (2021) WILDS: A benchmark of in-the-wild distribution shifts. In: Proceedings of the International Conference on Machine Learning (ICML); p. 5637\u20135664"},{"issue":"1","key":"11526_CR42","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1007\/s42761-021-00099-x","volume":"3","author":"ME Kret","year":"2022","unstructured":"Kret ME, Massen JJM, de Waal FBM (2022) My fear is not, and never will be, your fear: on emotions and feelings in animals. Affect Sci 3(1):182\u2013189","journal-title":"Affect Sci"},{"issue":"4","key":"11526_CR43","doi-asserted-by":"publisher","first-page":"496","DOI":"10.1038\/s41592-022-01443-0","volume":"19","author":"J Lauer","year":"2022","unstructured":"Lauer J, Zhou M et al (2022) Multi-animal pose estimation, identification and tracking with DeepLabCut. Nat Methods 19(4):496\u2013504","journal-title":"Nat Methods"},{"key":"11526_CR44","doi-asserted-by":"crossref","unstructured":"Lea C, Flynn MD, Vidal R, Reiter A, Hager GD (2017) Temporal convolutional networks for action segmentation and detection. In: CVPR; p. 156\u2013165","DOI":"10.1109\/CVPR.2017.113"},{"key":"11526_CR45","doi-asserted-by":"crossref","unstructured":"Li T, Foo LG, Ke Q, Rahmani H, Wang A, Wang J, et al (2022) Dynamic spatio-temporal specialization learning for fine-grained action recognition. In: ECCV; p. 386\u2013403","DOI":"10.1007\/978-3-031-19772-7_23"},{"key":"11526_CR46","unstructured":"Li W, Swetha S, Shah M. Wildlife Action Recognition Using Deep Learning. Center for Research in Computer Vision (CRCV), University of Central Florida; 2020. Available from: https:\/\/www.crcv.ucf.edu\/wp-content\/uploads\/2018\/11\/Weining_L_Report.pdf"},{"key":"11526_CR47","doi-asserted-by":"crossref","unstructured":"Li Y, Wu CY, Fan H, Mangalam K, Xiong B, Malik J, et al (2022) Mvitv 2: Improved multiscale vision transformers for classification and detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition; p. 4804\u20134814","DOI":"10.1109\/CVPR52688.2022.00476"},{"issue":"10","key":"11526_CR48","doi-asserted-by":"publisher","first-page":"4983","DOI":"10.1109\/JBHI.2023.3299321","volume":"27","author":"Y Li","year":"2023","unstructured":"Li Y, Xia T, Luo H, He B, Jia F (2023) MT-FiST: a multi-task fine-grained spatial-temporal framework for surgical action triplet recognition. IEEE J Biomed Health Inform 27(10):4983\u20134994","journal-title":"IEEE J Biomed Health Inform"},{"key":"11526_CR49","doi-asserted-by":"crossref","unstructured":"Liang Y, Xue F, Chen X, Wu Z, Chen X (2018) A benchmark for action recognition of large animals. In: International Conference on Digital Home; p. 64\u201371","DOI":"10.1109\/ICDH.2018.00020"},{"key":"11526_CR50","doi-asserted-by":"crossref","unstructured":"Lin W, Mirza MJ, Kozinski M, Possegger H, Kuehne H, Bischof H (2023) Video test-time adaptation for action recognition. In: CVPR; p. 22952\u201322961","DOI":"10.1109\/CVPR52729.2023.02198"},{"key":"11526_CR51","doi-asserted-by":"crossref","unstructured":"Liu D, Li Q, Dinh AD, Jiang T, Shah M, Xu C (2023) Diffusion action segmentation. In: CVPR; p. 10139\u201310149","DOI":"10.1109\/ICCV51070.2023.00930"},{"issue":"21","key":"11526_CR52","doi-asserted-by":"publisher","first-page":"4506","DOI":"10.3390\/electronics12214506","volume":"12","author":"W Lu","year":"2023","unstructured":"Lu W, Zhao Y, Wang J, Zheng Z, Feng L, Tang J (2023) Mammalclub: an annotated wild mammal dataset for species recognition, individual identification, and behavior recognition. Electronics 12(21):4506","journal-title":"Electronics"},{"issue":"1","key":"11526_CR53","doi-asserted-by":"publisher","first-page":"1267","DOI":"10.1038\/s42003-022-04080-7","volume":"5","author":"K Luxem","year":"2022","unstructured":"Luxem K, Mocellin P, Fuhrmann F, K\u00fcrsch J, Remy S, Bauer P (2022) Identifying behavioral structure from deep variational embeddings of animal motion. Commun Biol 5(1):1267. https:\/\/doi.org\/10.1038\/s42003-022-04080-7","journal-title":"Commun Biol"},{"key":"11526_CR54","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-020-19105-0","author":"T Maekawa","year":"2020","unstructured":"Maekawa T, Ohara K et al (2020) Deep learning-assisted comparative analysis of animal trajectories with DeepHL. Nat Commun. https:\/\/doi.org\/10.1038\/s41467-020-19105-0","journal-title":"Nat Commun"},{"key":"11526_CR55","doi-asserted-by":"publisher","DOI":"10.1016\/j.compag.2023.108043","volume":"211","author":"A Mao","year":"2023","unstructured":"Mao A, Huang E, Wang X, Liu K (2023) Deep learning-based animal activity recognition with wearable sensors: overview, challenges, and future directions. Comput Electron Agric 211:108043","journal-title":"Comput Electron Agric"},{"issue":"21","key":"11526_CR56","doi-asserted-by":"publisher","first-page":"6074","DOI":"10.3390\/s20216074","volume":"20","author":"F Marin","year":"2020","unstructured":"Marin F (2020) Human and animal motion tracking using inertial sensors. Sensors 20(21):6074","journal-title":"Sensors"},{"key":"11526_CR57","doi-asserted-by":"crossref","unstructured":"Mondal A, Nag S, Prada JM, Zhu X, Dutta A (2023) Actor-agnostic multi-label action recognition with multi-modal query. In: Proceedings of the IEEE\/CVF international conference on computer vision; p. 784\u2013794","DOI":"10.1109\/ICCVW60793.2023.00086"},{"issue":"1","key":"11526_CR58","doi-asserted-by":"publisher","DOI":"10.1093\/conphys\/coab044","volume":"9","author":"RN Moraes","year":"2021","unstructured":"Moraes RN, Laske TG, Leimgruber P, Stabach JA, Marinari PE, Horning MM et al (2021) Inside out: heart rate monitoring to advance the welfare and conservation of maned wolves (Chrysocyon brachyurus). Conserv Physiol 9(1):coab044","journal-title":"Conserv Physiol"},{"key":"11526_CR59","doi-asserted-by":"crossref","unstructured":"Nag S, Zhu X, Deng J, Song YZ, Xiang T (2023) Difftad: Temporal action detection with proposal denoising diffusion. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision; p. 10362\u201310374","DOI":"10.1109\/ICCV51070.2023.00951"},{"key":"11526_CR60","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1016\/j.livsci.2017.05.014","volume":"202","author":"A Nasirahmadi","year":"2017","unstructured":"Nasirahmadi A, Edwards SA, Sturm B (2017) Implementation of machine vision for detecting behaviour of cattle and pigs. Livest Sci 202:25\u201338","journal-title":"Livest Sci"},{"key":"11526_CR61","doi-asserted-by":"crossref","unstructured":"Ng XL, Ong KE, Zheng Q, Ni Y, Yeo SY, Liu J (2022) Animal kingdom: a large and diverse dataset for animal behavior understanding. In: CVPR; p. 19023\u201319034","DOI":"10.1109\/CVPR52688.2022.01844"},{"key":"11526_CR62","doi-asserted-by":"crossref","unstructured":"Nguyen C, Wang D, Von Richter K, Valencia P, Alvarenga FA, Bishop-Hurley G (2021) Video-based cattle identification and action recognition. In: Digital Image Computing: Techniques and Applications; p. 01\u201305","DOI":"10.1109\/DICTA52665.2021.9647417"},{"key":"11526_CR63","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.patcog.2018.02.001","volume":"79","author":"C Panagiotakis","year":"2018","unstructured":"Panagiotakis C, Papoutsakis K, Argyros A (2018) A graph-based approach for detecting common actions in motion capture data and videos. Pattern Recogn 79:1\u201311","journal-title":"Pattern Recogn"},{"key":"11526_CR64","doi-asserted-by":"crossref","unstructured":"Pandurangan S, Papandrea M, Gelsomini M (2023) Fine-grained human activity recognition - a new paradigm. In: Proceedings of iWOAR; p. 1\u20138","DOI":"10.1145\/3558884.3558893"},{"key":"11526_CR65","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2023.110188","volume":"148","author":"H Qiu","year":"2024","unstructured":"Qiu H, Hou B (2024) Multi-grained clip focus for skeleton-based action recognition. Pattern Recogn 148:110188","journal-title":"Pattern Recogn"},{"key":"11526_CR66","doi-asserted-by":"publisher","DOI":"10.1016\/j.atech.2025.101539","volume":"12","author":"RA Rajagukguk","year":"2025","unstructured":"Rajagukguk RA, Sy L, Jy P, Daniel KF, Cr L, Chen Z et al (2025) Deep learning for visual animal monitoring (detection, tracking, pose estimation, and behavior classification): a comprehensive review. Smart Agric Technol 12:101539","journal-title":"Smart Agric Technol"},{"key":"11526_CR67","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2020.100289","volume":"38","author":"PC Ravoor","year":"2020","unstructured":"Ravoor PC, Sudarshan T (2020) Deep learning methods for multi-species animal re-identification and tracking-a survey. Comput Sci Rev 38:100289","journal-title":"Comput Sci Rev"},{"key":"11526_CR68","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1016\/j.applanim.2014.11.018","volume":"163","author":"CA Richardson","year":"2015","unstructured":"Richardson CA (2015) The power of automated behavioural homecage technologies in characterizing disease progression in laboratory mice: A review. Appl Anim Behav Sci 163:19\u201327","journal-title":"Appl Anim Behav Sci"},{"key":"11526_CR69","doi-asserted-by":"publisher","DOI":"10.1016\/j.ecoinf.2024.102893","volume":"84","author":"LS Saoud","year":"2024","unstructured":"Saoud LS, Sultan A, Elmezain M, Heshmat M, Seneviratne L, Hussain I (2024) Beyond observation: deep learning for animal behavior and ecological conservation. Ecol Informatics 84:102893","journal-title":"Ecol Informatics"},{"issue":"2","key":"11526_CR70","doi-asserted-by":"publisher","first-page":"514","DOI":"10.3390\/app14020514","volume":"14","author":"F Schindler","year":"2024","unstructured":"Schindler F, Steinhage V, van Beeck CS, Heurich M (2024) Action detection for wildlife monitoring with camera traps based on segmentation with filtering of tracklets (SWIFT) and mask-guided action recognition (MAROON). Appl Sci 14(2):514","journal-title":"Appl Sci"},{"issue":"9","key":"11526_CR71","doi-asserted-by":"publisher","first-page":"2660","DOI":"10.3390\/ani11092660","volume":"11","author":"L Schmeling","year":"2021","unstructured":"Schmeling L, Elmamooz G, Hoang PT, Kozar A, Nicklas D, S\u00fcnkel M et al (2021) Training and validating a machine learning model for the sensor-based monitoring of lying behavior in dairy cows on pasture and in the barn. Animals 11(9):2660. https:\/\/doi.org\/10.3390\/ani11092660","journal-title":"Animals"},{"key":"11526_CR72","doi-asserted-by":"publisher","first-page":"328","DOI":"10.1016\/j.eswa.2018.09.022","volume":"116","author":"Y Seo","year":"2019","unstructured":"Seo Y, Shin KS (2019) Hierarchical convolutional neural networks for fashion image classification. Expert Syst Appl 116:328\u2013339","journal-title":"Expert Syst Appl"},{"issue":"10","key":"11526_CR73","doi-asserted-by":"publisher","first-page":"5499","DOI":"10.1007\/s00521-023-09186-5","volume":"36","author":"MB Shaikh","year":"2024","unstructured":"Shaikh MB, Chai D, Islam SMS, Akhtar N (2024) Multimodal fusion for audio-image and video action recognition. Neural Comput Appl 36(10):5499\u20135513","journal-title":"Neural Comput Appl"},{"key":"11526_CR74","doi-asserted-by":"crossref","unstructured":"Spampinato C, Chen-Burger YH, Nadarajan G, Fisher RB (2008) Detecting, tracking and counting fish in low quality unconstrained underwater videos. In: International Conference on Computer Vision Theory and Applications. vol. 2; p. 514\u2013519","DOI":"10.5220\/0001077705140519"},{"issue":"1","key":"11526_CR75","doi-asserted-by":"publisher","first-page":"14351","DOI":"10.1038\/srep14351","volume":"5","author":"U Stern","year":"2015","unstructured":"Stern U, He R, Yang CH (2015) Analyzing animal behavior via classifying each video frame using convolutional neural networks. Sci Rep 5(1):14351","journal-title":"Sci Rep"},{"issue":"3","key":"11526_CR76","doi-asserted-by":"publisher","first-page":"368","DOI":"10.1111\/2041-210X.13103","volume":"10","author":"D Stowell","year":"2019","unstructured":"Stowell D, Wood M, Stylianou Y, Glotin H (2019) Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge. Methods Ecol Evol 10(3):368\u2013380","journal-title":"Methods Ecol Evol"},{"key":"11526_CR77","unstructured":"Sun JJ, Karigo T, Chakraborty D, Mohanty SP, Wild B, Sun Q, et al (2021) The multi-agent behavior dataset: mouse dyadic social interactions. arXiv preprint arXiv:2104.02710"},{"key":"11526_CR78","doi-asserted-by":"crossref","unstructured":"Sun JJ, Zhou H, Zhao L, Yuan L, Seybold B, Hendon D, et al (2024) Video foundation models for animal behavior analysis. bioRxiv. p. 2024-07","DOI":"10.1101\/2024.07.30.605655"},{"key":"11526_CR79","doi-asserted-by":"crossref","unstructured":"Sun B, Ye X, Yan T, Wang Z, Li H, Wang Z (2022) Fine-grained action recognition with robust motion representation decoupling and concentration. In: ACM International Conference Proceeding Series","DOI":"10.1145\/3503161.3548046"},{"issue":"3","key":"11526_CR80","doi-asserted-by":"publisher","first-page":"3147","DOI":"10.1007\/s40747-022-00914-3","volume":"9","author":"J Tang","year":"2022","unstructured":"Tang J, Liu B, Guo W, Wang Y (2022) Two-stream temporal enhanced Fisher vector encoding for skeleton-based action recognition. Complex Intell Syst 9(3):3147\u20133159","journal-title":"Complex Intell Syst"},{"key":"11526_CR81","doi-asserted-by":"crossref","unstructured":"Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3d convolutional networks. In: Proceedings of the IEEE international conference on computer vision; p. 4489\u20134497","DOI":"10.1109\/ICCV.2015.510"},{"key":"11526_CR82","doi-asserted-by":"crossref","unstructured":"Van H G, Mac Aodha O, Song Y, Cui Y, Sun C, Shepard A, et al (2018) The inaturalist species classification and detection dataset. In: CVPR; p. 8769\u20138778","DOI":"10.1109\/CVPR.2018.00914"},{"key":"11526_CR83","doi-asserted-by":"crossref","unstructured":"Vu NT, Huynh VT, Nguyen TN, Kim SH (2023) Ensemble spatial and temporal vision transformer for action units detection. In: Proceedings of the IEEE\/CVF CVPR; p. 5770\u20135776","DOI":"10.1109\/CVPRW59228.2023.00612"},{"key":"11526_CR84","unstructured":"Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The caltech-ucsd birds-200-2011 dataset. California Institute of Technology"},{"issue":"7","key":"11526_CR85","doi-asserted-by":"publisher","first-page":"1329","DOI":"10.1038\/s41592-024-02318-2","volume":"21","author":"C Weinreb","year":"2024","unstructured":"Weinreb C, Pearl JE, Lin S, Osman MAM, Zhang L, Annapragada S et al (2024) Keypoint-MoSeq: parsing behavior by linking point tracking to pose dynamics. Nat Methods 21(7):1329\u20131339. https:\/\/doi.org\/10.1038\/s41592-024-02318-2","journal-title":"Nat Methods"},{"issue":"12","key":"11526_CR86","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0226669","volume":"14","author":"K Wurtz","year":"2019","unstructured":"Wurtz K, Camerlink I, D\u2019Eath RB, Fern\u00e1ndez AP, Norton T, Steibel J et al (2019) Recording behaviour of indoor-housed farm animals automatically using machine vision technology: a systematic review. PLoS ONE 14(12):e0226669","journal-title":"PLoS ONE"},{"issue":"9","key":"11526_CR87","doi-asserted-by":"publisher","first-page":"2251","DOI":"10.1109\/TPAMI.2018.2857768","volume":"41","author":"Y Xian","year":"2018","unstructured":"Xian Y, Lampert CH, Schiele B, Akata Z (2018) Zero-shot learning\u2014a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans Pattern Anal Mach Intell 41(9):2251\u20132265","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11526_CR88","doi-asserted-by":"crossref","unstructured":"Xiao J, Jing L, Zhang L, He J, She Q, Zhou Z, et al (2021) Learning from temporal gradient for semi-supervised action recognition. In: CVPR; p. 3252\u20133262","DOI":"10.1109\/CVPR52688.2022.00325"},{"issue":"7","key":"11526_CR89","doi-asserted-by":"publisher","first-page":"2698","DOI":"10.1007\/s11263-024-02008-8","volume":"132","author":"T Xu","year":"2024","unstructured":"Xu T, Kang Z, Zhu X, Wu XJ (2024) Learning adaptive spatio-temporal inference transformer for coarse-to-fine animal visual tracking: algorithm and benchmark. Int J Comput Vis 132(7):2698\u20132712","journal-title":"Int J Comput Vis"},{"key":"11526_CR90","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.patcog.2018.07.028","volume":"85","author":"H Yang","year":"2019","unstructured":"Yang H, Yuan C, Li B, Du Y, Xing J, Hu W et al (2019) Asymmetric 3D convolutional neural networks for action recognition. Pattern Recogn 85:1\u201312","journal-title":"Pattern Recogn"},{"key":"11526_CR91","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1109\/TIP.2021.3129117","volume":"31","author":"H Yang","year":"2021","unstructured":"Yang H, Yan D, Zhang L, Sun Y, Li D, Maybank SJ (2021) Feedback graph convolutional network for skeleton-based action recognition. IEEE Trans Image Process 31:164\u2013175","journal-title":"IEEE Trans Image Process"},{"key":"11526_CR92","unstructured":"Yang T, Zhu Y, Xie Y, Zhang A, Chen C, Li M (2023) AIM: adapting image models for efficient video action recognition. In: The Eleventh International Conference on Learning Representations; p. 1\u201318"},{"key":"11526_CR93","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2021.108282","volume":"122","author":"S Yenduri","year":"2022","unstructured":"Yenduri S, Perveen N, Chalavadi V (2022) Fine-grained action recognition using dynamic kernels. Pattern Recogn 122:108282","journal-title":"Pattern Recogn"},{"key":"11526_CR94","unstructured":"Yin Y, Huang Y, Furuta R, Sato Y (2023) Proposal-based temporal action localization with point-level supervision. In: British Machine Vision Conference"},{"issue":"1","key":"11526_CR95","doi-asserted-by":"publisher","DOI":"10.1097\/DM-2024-00011","volume":"11","author":"X Zeng","year":"2025","unstructured":"Zeng X, Gong M, Yin Y, Zhao Y, Zhu S (2025) Research progress of animal behavior recognition algorithms based on deep learning. Digital Med 11(1):e24-00011","journal-title":"Digital Med"},{"issue":"9","key":"11526_CR96","doi-asserted-by":"publisher","first-page":"968","DOI":"10.3390\/agriculture15090968","volume":"15","author":"Z Zeng","year":"2025","unstructured":"Zeng Z, Wu Z, Xie R, Lin K, Tan S, He X et al (2025) MACA-Net: Mamba-driven adaptive cross-layer attention network for multi-behavior recognition in group-housed pigs. Agriculture 15(9):968. https:\/\/doi.org\/10.3390\/agriculture15090968","journal-title":"Agriculture"},{"key":"11526_CR97","doi-asserted-by":"crossref","unstructured":"Zhang C, Gupta A, Zisserman A (2021) Temporal query networks for fine-grained video understanding. In: CVPR; p. 4486\u20134496","DOI":"10.1109\/CVPR46437.2021.00446"},{"key":"11526_CR98","doi-asserted-by":"crossref","unstructured":"Zhang H, Liu D, Zheng Q, Su B (2023) Modeling video as stochastic processes for fine-grained video representation learning. In: CVPR; p. 2225\u20132234","DOI":"10.1109\/CVPR52729.2023.00221"},{"key":"11526_CR99","doi-asserted-by":"crossref","unstructured":"Zhuang D, Jiang M, Arioui H, Tabia H (2023) Action text diffusion prior network for action segmentation. In: Int. Conf. on Content-based Multimedia Indexing; p. 79\u201385","DOI":"10.1145\/3617233.3617244"},{"issue":"8","key":"11526_CR100","doi-asserted-by":"publisher","first-page":"2329","DOI":"10.1016\/j.patcog.2015.03.006","volume":"48","author":"M Ziaeefard","year":"2015","unstructured":"Ziaeefard M, Bergevin R (2015) Semantic human activity recognition: a literature review. Pattern Recogn 48(8):2329\u20132345","journal-title":"Pattern Recogn"},{"key":"11526_CR101","unstructured":"Zia A, Sharma R, Arablouei R, Bishop-Hurley G, McNally J, Bagnall N, et al (2023) CVB: a video dataset of cattle visual behaviors. arXiv preprint arXiv:2305.16555"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-026-11526-5","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-026-11526-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-026-11526-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T23:13:42Z","timestamp":1778109222000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-026-11526-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,16]]},"references-count":102,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2026,5]]}},"alternative-id":["11526"],"URL":"https:\/\/doi.org\/10.1007\/s10462-026-11526-5","relation":{},"ISSN":["1573-7462"],"issn-type":[{"value":"1573-7462","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,16]]},"assertion":[{"value":"15 August 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 February 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 March 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"133"}}