{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T15:29:22Z","timestamp":1768836562974,"version":"3.49.0"},"reference-count":36,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2024,9,26]],"date-time":"2024-09-26T00:00:00Z","timestamp":1727308800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Science Foundation Program of China (NSFC)","award":["61976241"],"award-info":[{"award-number":["61976241"]}]},{"name":"National Science Foundation Program of China (NSFC)","award":["KYCX24_4129"],"award-info":[{"award-number":["KYCX24_4129"]}]},{"name":"National Science Foundation Program of China (NSFC)","award":["GJ2021008"],"award-info":[{"award-number":["GJ2021008"]}]},{"name":"Postgraduate Research and Practice Innovation Program of Jiangsu Province","award":["61976241"],"award-info":[{"award-number":["61976241"]}]},{"name":"Postgraduate Research and Practice Innovation Program of Jiangsu Province","award":["KYCX24_4129"],"award-info":[{"award-number":["KYCX24_4129"]}]},{"name":"Postgraduate Research and Practice Innovation Program of Jiangsu Province","award":["GJ2021008"],"award-info":[{"award-number":["GJ2021008"]}]},{"name":"International Science and technology cooperationplan project of Zhenjiang","award":["61976241"],"award-info":[{"award-number":["61976241"]}]},{"name":"International Science and technology cooperationplan project of Zhenjiang","award":["KYCX24_4129"],"award-info":[{"award-number":["KYCX24_4129"]}]},{"name":"International Science and technology cooperationplan project of Zhenjiang","award":["GJ2021008"],"award-info":[{"award-number":["GJ2021008"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Two-dimensional human pose estimation aims to equip computers with the ability to accurately 
recognize human keypoints and comprehend their spatial contexts within media content. However, the accuracy of real-time human pose estimation diminishes when processing images with occluded body parts or overlapping individuals. To address these issues, we propose a method based on the YOLO framework. We integrate the convolutional concepts of Kolmogorov\u2013Arnold Networks (KANs) by introducing non-linear activation functions to enhance the feature extraction capabilities of the convolutional kernels. Moreover, to improve the detection of small target keypoints, we integrate the cross-stage partial (CSP) approach and utilize the small object enhance pyramid (SOEP) module for feature integration. We also incorporate a layered shared convolution with batch normalization detection head (LSCB), consisting of multiple shared convolutional layers and batch normalization layers, to enable cross-stage feature fusion and address the low utilization of model parameters. Given the structure and purpose of the proposed model, we name it KSL-POSE. Compared to the baseline model YOLOv8l-POSE, KSL-POSE achieves significant improvements, increasing the average detection accuracy by 1.5% on the public MS COCO 2017 data set. 
Furthermore, the model also demonstrates competitive performance on the CrowdPOSE data set, thus validating its generalization ability.<\/jats:p>","DOI":"10.3390\/s24196249","type":"journal-article","created":{"date-parts":[[2024,9,27]],"date-time":"2024-09-27T03:44:13Z","timestamp":1727408653000},"page":"6249","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["KSL-POSE: A Real-Time 2D Human Pose Estimation Method Based on Modified YOLOv8-Pose Framework"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-8571-325X","authenticated-orcid":false,"given":"Tianyi","family":"Lu","sequence":"first","affiliation":[{"name":"School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China"}]},{"given":"Ke","family":"Cheng","sequence":"additional","affiliation":[{"name":"School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8077-0015","authenticated-orcid":false,"given":"Xuecheng","family":"Hua","sequence":"additional","affiliation":[{"name":"School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-1491-6904","authenticated-orcid":false,"given":"Suning","family":"Qin","sequence":"additional","affiliation":[{"name":"School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,9,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1016\/j.neunet.2023.01.036","article-title":"BalanceHRNet: An effective network for bottom-up human pose estimation","volume":"161","author":"Li","year":"2023","journal-title":"Neural Netw."},{"key":"ref_2","first-page":"11","article-title":"Deep learning-based human pose estimation: A 
survey","volume":"56","author":"Zheng","year":"2023","journal-title":"ACM Comput. Surv."},{"key":"ref_3","unstructured":"Zhang, J., Zheng, Y., Qi, D., Li, R., and Yi, X. (November, January 31). DNN-based prediction model for spatio-temporal data. Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Burlingame, CA, USA."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., and Darrell, T. (2015, January 7\u201312). Long-term recurrent convolutional networks for visual recognition and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"ref_6","unstructured":"Park, J., Woo, S., Lee, J.Y., and Kweon, I.S. (2018). Bam: Bottleneck attention module. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ren, Z., Zhou, Y., Chen, Y., Zhou, R., and Gao, Y. (2021, January 15\u201318). Efficient human pose estimation by maximizing fusion and high-level spatial attention. Proceedings of the 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India.","DOI":"10.1109\/FG52635.2021.9666981"},{"key":"ref_8","unstructured":"Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Solja\u010di\u0107, M., Hou, T.Y., and Tegmark, M. (2024). Kan: Kolmogorov-arnold networks. arXiv."},{"key":"ref_9","unstructured":"Han, D., Yun, S., Heo, B., and Yoo, Y. (2020). Rexnet: Diminishing representational bottleneck on convolutional neural network. 
arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Qian, S., Ning, C., and Hu, Y. (2021, January 26\u201328). MobileNetV3 for image classification. Proceedings of the 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), Nanchang, China.","DOI":"10.1109\/ICBAIE52039.2021.9389905"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/j.ins.2020.02.067","article-title":"DC-SPP-YOLO: Dense connection and spatial pyramid pooling based YOLO for object detection","volume":"522","author":"Huang","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_12","first-page":"100275","article-title":"HARadNet: Anchor-free target detection for radar point clouds using hierarchical attention and multi-task learning","volume":"8","author":"Dubey","year":"2022","journal-title":"Mach. Learn. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Lin, T.Y., and Le, Q.V. (2019, January 15\u201320). Nas-fpn: Learning scalable feature pyramid architecture for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00720"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Maji, D., Nagori, S., Mathew, M., and Poddar, D. (2022, January 18\u201324). Yolo-pose: Enhancing yolo for multi person pose estimation using object keypoint similarity loss. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00297"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Hu, M., Feng, J., Hua, J., Lai, B., Huang, J., Gong, X., and Hua, X.S. (2022, January 18\u201324). Online convolutional re-parameterization. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00065"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"7699","DOI":"10.1109\/TCSVT.2024.3377365","article-title":"HF-HRNet: A simple hardware friendly high-resolution network","volume":"34","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yang, S., Quan, Z., Nie, M., and Yang, W. (2021, January 11\u201317). Transpose: Keypoint localization via transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01159"},{"key":"ref_18","unstructured":"Qiu, Z., Yang, Q., Wang, J., Wang, X., Xu, C., Fu, D., Yao, K., Han, J., Ding, E., and Wang, J. (2023). Learning structure-guided diffusion model for 2d human pose estimation. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"8068","DOI":"10.1109\/TII.2023.3266366","article-title":"LDCNet: Limb direction cues-aware network for flexible human pose estimation in industrial behavioral biometrics systems","volume":"20","author":"Liu","year":"2023","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"121352","DOI":"10.1016\/j.eswa.2023.121352","article-title":"Large separable kernel attention: Rethinking the large kernel attention design in cnn","volume":"236","author":"Lau","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Xia, Z., Pan, X., Song, S., Li, L.E., and Huang, G. (2022, January 18\u201324). Vision transformer with deformable attention. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00475"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Hou, Q., Zhou, D., and Feng, J. (2021, January 20\u201325). Coordinate attention for efficient mobile network design. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhang, Q.L., and Yang, Y.B. (2021, January 6\u201311). Sa-net: Shuffle attention for deep convolutional neural networks. Proceedings of the ICASSP 2021\u20132021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9414568"},{"key":"ref_24","unstructured":"Cui, Y., Ren, W., and Knoll, A. (2024, January 20\u201327). Omni-Kernel Network for Image Restoration. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Li, J., Wang, C., Zhu, H., Mao, Y., Fang, H.S., and Lu, C. (2019, January 15\u201320). Crowdpose: Efficient crowded scenes pose estimation and a new benchmark. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01112"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Newell, A., Yang, K., and Deng, J. (2016, January 11\u201314). Stacked Hourglass Networks for Human Pose Estimation. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Hua, G., Li, L., and Liu, S. (2020). Multipath affinage stacked\u2014Hourglass networks for human pose estimation. Front. Comput. 
Sci., 14.","DOI":"10.1007\/s11704-019-8266-2"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Liu, D., Zhao, Z., Wang, X., Hu, Y., Zhang, L., and Huang, T. (2019, January 7\u201311). Improving 3D human pose estimation via 3D part affinity fields. Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA.","DOI":"10.1109\/WACV.2019.00112"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Cheng, B., Xiao, B., Wang, J., Shi, H., Huang, T.S., and Zhang, L. (2020, January 13\u201319). Higherhrnet: Scale-aware representation learning for bottom-up human pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00543"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Jiang, Y., Yang, K., Zhu, J., and Qin, L. (2024). YOLO-Rlepose: Improved YOLO Based on Swin Transformer and Rle-Oks Loss for Multi-Person Pose Estimation. Electronics, 13.","DOI":"10.3390\/electronics13030563"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wang, F., Wang, G., and Lu, B. (2024). YOLOv8-PoseBoost: Advancements in Multimodal Robot Pose Keypoint Detection. Electronics, 13.","DOI":"10.3390\/electronics13061046"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Chen, S., Zhang, Y., Huang, S., Yi, R., Fan, K., Zhang, R., Chen, P., Wang, J., Ding, S., and Ma, L. (2024, January 16\u201322). SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.00109"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Tan, D., Chen, H., Tian, W., and Xiong, L. (2024, January 16\u201322). DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.00217"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"13498","DOI":"10.1109\/TITS.2021.3124981","article-title":"Openpifpaf: Composite fields for semantic keypoint detection and spatio-temporal association","volume":"23","author":"Kreiss","year":"2021","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_36","first-page":"38571","article-title":"Vitpose: Simple vision transformer baselines for human pose estimation","volume":"35","author":"Xu","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/19\/6249\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:04:21Z","timestamp":1760112261000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/19\/6249"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,26]]},"references-count":36,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2024,10]]}},"alternative-id":["s24196249"],"URL":"https:\/\/doi.org\/10.3390\/s24196249","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,26]]}}}