{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T16:43:56Z","timestamp":1764089036520,"version":"3.45.0"},"reference-count":49,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T00:00:00Z","timestamp":1764028800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62372494"],"award-info":[{"award-number":["62372494"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Guangdong Specialized Talent Training Program","award":["2024001"],"award-info":[{"award-number":["2024001"]}]},{"name":"Guangdong Engineering Centre","award":["2024GCZX001"],"award-info":[{"award-number":["2024GCZX001"]}]},{"name":"Characteristic Innovation Project (Natural Sciences) of Guangdong Universities\u2019 Scientific Research Platform and Projects","award":["2024KTSCX016"],"award-info":[{"award-number":["2024KTSCX016"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Sitting posture recognition, defined as automatically localizing and categorizing seated human postures, has become essential for large-scale ergonomics assessment and longitudinal health-risk monitoring in classrooms and offices. However, in real-world multi-person scenes, pervasive occlusions and overlaps induce keypoint misalignment, causing global-attention backbones to fail to localize critical local structures. Moreover, annotation scarcity makes small-sample training commonplace, leaving models insufficiently robust to misalignment perturbations and thereby limiting cross-domain generalization. 
To address these challenges, we propose LAViTSPose, a lightweight cascaded framework for sitting posture recognition. Concretely, a YOLOR-based detector trained with a Range-aware IoU (RaIoU) loss yields tight person crops under partial visibility; ESBody suppresses cross-person leakage and estimates occlusion\/head-orientation cues; a compact ViT head (MLiT) with Spatial Displacement Contact (SDC) and a learnable temperature (LT) mechanism performs skeleton-only classification with a local structural-consistency regularizer. From an information-theoretic perspective, our design enhances discriminative feature compactness and reduces structural entropy under occlusion and annotation scarcity. We conducted a systematic evaluation on the USSP dataset, and the results show that LAViTSPose outperforms existing methods on both sitting posture classification and face-orientation recognition while meeting real-time inference requirements.<\/jats:p>","DOI":"10.3390\/e27121196","type":"journal-article","created":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T16:31:54Z","timestamp":1764088314000},"page":"1196","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["LAViTSPose: A Lightweight Cascaded Framework for Robust Sitting Posture Recognition via Detection\u2013Segmentation\u2013Classification"],"prefix":"10.3390","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-4006-3828","authenticated-orcid":false,"given":"Shu","family":"Wang","sequence":"first","affiliation":[{"name":"School of Computer Science, Zhuhai College of Science and Technology, Zhuhai 519041, China"},{"name":"Department of Industrial Electronics, University of Minho, 4800-058 Guimaraes, Portugal"}]},{"given":"Adriano","family":"Tavares","sequence":"additional","affiliation":[{"name":"Department of Industrial Electronics, University of Minho, 4800-058 Guimaraes, 
Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8523-5287","authenticated-orcid":false,"given":"Carlos","family":"Lima","sequence":"additional","affiliation":[{"name":"Department of Industrial Electronics, University of Minho, 4800-058 Guimaraes, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4071-9015","authenticated-orcid":false,"given":"Tiago","family":"Gomes","sequence":"additional","affiliation":[{"name":"Department of Industrial Electronics, University of Minho, 4800-058 Guimaraes, Portugal"}]},{"given":"Yicong","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Zhuhai College of Science and Technology, Zhuhai 519041, China"},{"name":"Department of Industrial Electronics, University of Minho, 4800-058 Guimaraes, Portugal"}]},{"given":"Jiyu","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science, Zhuhai College of Science and Technology, Zhuhai 519041, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1147-3968","authenticated-orcid":false,"given":"Yanchun","family":"Liang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Zhuhai College of Science and Technology, Zhuhai 519041, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"e13127","DOI":"10.7717\/peerj.13127","article-title":"The association between sedentary behavior and low back pain in adults: A systematic review and meta-analysis of longitudinal studies","volume":"10","author":"Alzahrani","year":"2022","journal-title":"PeerJ"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1177\/2165079917737558","article-title":"Health issues and injury risks associated with prolonged sitting and sedentary lifestyles","volume":"66","author":"Lurati","year":"2018","journal-title":"Workplace Health 
Saf."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"126622","DOI":"10.1016\/j.eswa.2025.126622","article-title":"HKRG: Hierarchical knowledge integration for radiology report generation","volume":"271","author":"Wang","year":"2025","journal-title":"Expert Syst. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1123\/jpah.2020-0525","article-title":"Global public health guidelines on physical activity and sedentary behavior for people living with chronic conditions: A call to action","volume":"18","author":"Dempsey","year":"2020","journal-title":"J. Phys. Act. Health"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Nadeem, M., Elbasi, E., Zreikat, A.I., and Sharsheer, M. (2024). Sitting posture recognition systems: Comprehensive literature review and analysis. Appl. Sci., 14.","DOI":"10.3390\/app14188557"},{"key":"ref_6","first-page":"209","article-title":"A vision-based human posture detection approach for smart home applications","volume":"14","author":"Shu","year":"2023","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Cao, Z., Simon, T., Wei, S.E., and Sheikh, Y. (2017, January 21\u201326). Realtime multi-person 2d pose estimation using part affinity fields. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.143"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Belal, M., Hassan, T., Hassan, A., Velayudhan, D., Elhendawi, N., Aljarah, A., and Hussain, I. (2025). 
FSID: A novel approach to human activity recognition using few-shot weight imprinting. Sci. Rep., 15.","DOI":"10.1038\/s41598-025-04323-7"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"e442","DOI":"10.7717\/peerj-cs.442","article-title":"Detection of sitting posture using hierarchical image composition and deep learning","volume":"7","author":"Kulikajevas","year":"2021","journal-title":"PeerJ Comput. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"12444","DOI":"10.1109\/JSEN.2025.3541821","article-title":"SitPose: Real-Time Detection of Sitting Posture and Sedentary Behavior Using Ensemble Learning with Depth Sensor","volume":"25","author":"Jin","year":"2025","journal-title":"IEEE Sens. J."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"106374","DOI":"10.1016\/j.engappai.2023.106374","article-title":"Abnormal sitting posture recognition based on multi-scale spatiotemporal features of skeleton graph","volume":"123","author":"Li","year":"2023","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Jiao, S., Xiao, Y., Wu, X., Liang, Y., Liang, Y., and Zhou, Y. (2023, January 26\u201328). LMSPNet: Improved lightweight network for multi-person sitting posture recognition. Proceedings of the 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI), Taiyuan, China.","DOI":"10.1109\/CCAI57533.2023.10201258"},{"key":"ref_14","unstructured":"Wang, L., Liu, J., and Koniusz, P. (2021). 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Na\u00efve. arXiv."},{"key":"ref_15","unstructured":"Liang, G., Cao, J., and Liu, X. (2017, January 13\u201317). Smart cushion: A practical system for fine-grained sitting posture recognition. 
Proceedings of the 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Kona, HI, USA."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"112900","DOI":"10.1016\/j.sna.2021.112900","article-title":"A portable sitting posture monitoring system based on a pressure sensor array and machine learning","volume":"331","author":"Ran","year":"2021","journal-title":"Sens. Actuators A Phys."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Aminosharieh Najafi, T., Abramo, A., Kyamakya, K., and Affanni, A. (2022). Development of a smart chair sensors system and classification of sitting postures with deep learning algorithms. Sensors, 22.","DOI":"10.3390\/s22155585"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201323). Cascade r-cnn: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_20","first-page":"379","article-title":"R-fcn: Object detection via region-based fully convolutional networks","volume":"29","author":"Dai","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1066","DOI":"10.1016\/j.procs.2022.01.135","article-title":"A Review of Yolo algorithm developments","volume":"199","author":"Jiang","year":"2022","journal-title":"Procedia Comput. Sci."},{"key":"ref_22","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (November, January 27). Centernet: Keypoint triplets for object detection. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., and Savarese, S. (2019, January 15\u201320). Generalized intersection over union: A metric and a loss for bounding box regression. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00075"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, January 7\u201312). Distance-IoU loss: Faster and better learning for bounding box regression. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"ref_25","unstructured":"Wang, C.Y., Yeh, I.H., and Liao, H.Y.M. (2021). You only learn one representation: Unified network for multiple tasks. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, January 15\u201320). Deep high-resolution representation learning for human pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_30","unstructured":"Poudel, R.P., Liwicki, S., and Cipolla, R. (2019). Fast-scnn: Fast semantic segmentation network. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N. (2018, January 8\u201314). Bisenet: Bilateral segmentation network for real-time semantic segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wang, X., Kong, T., Shen, C., Jiang, Y., and Li, L. (2020, January 23\u201328). Solo: Segmenting objects by locations. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58523-5_38"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4664","DOI":"10.1109\/TIP.2023.3295929","article-title":"Uncertainty-aware source-free domain adaptive semantic segmentation","volume":"32","author":"Lu","year":"2023","journal-title":"IEEE Trans. Image Process."},{"key":"ref_34","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). 
Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_36","unstructured":"Demidov, D., Shtanchaev, A., Mihaylov, M., and Almansoori, M. (2024). Extract More from Less: Efficient Fine-Grained Visual Recognition in Low-Data Regimes. arXiv."},{"key":"ref_37","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021). Training data-efficient image transformers & distillation through attention. Proceedings of the International Conference on Machine Learning, PMLR."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Yuan, L., Chen, Y., Wang, T., Yu, W., Shi, Y., Jiang, Z.H., Tay, F.E., Feng, J., and Yan, S. (2021, January 11\u201317). Tokens-to-token vit: Training vision transformers from scratch on imagenet. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00060"},{"key":"ref_39","unstructured":"Li, Y., Zhang, K., Cao, J., Timofte, R., and Van Gool, L. (2021). Localvit: Bringing locality to vision transformers. arXiv."},{"key":"ref_40","unstructured":"Mehta, S., and Rastegari, M. (2021). Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv."},{"key":"ref_41","unstructured":"Oved, D., and Zhu, T. (2019, November 18). [Updated] BodyPix: Real-Time Person Segmentation in the Browser with TensorFlow.js. TensorFlow Blog. BodyPix 2.0 Release with Multi-Person Support, Improved Accuracy (ResNet50), New API, Quantization. Available online: https:\/\/blog.tensorflow.org\/2019\/11\/updated-bodypix-2.html."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Wang, S., Tavares, A., Lima, C., Gomes, T., Zhang, Y., and Liang, Y. (2025). MSBN-SPose: A Multi-Scale Bayesian Neuro-Symbolic Approach for Sitting Posture Recognition. 
Electronics, 14.","DOI":"10.3390\/electronics14193889"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1109\/CENTCON52345.2021.9687944","article-title":"A comparison between VGG16, VGG19 and ResNet50 architecture frameworks for Image Classification","volume":"Volume 1","author":"Mascarenhas","year":"2021","journal-title":"Proceedings of the 2021 International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications (CENTCON)"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Maaz, M., Shaker, A., Cholakkal, H., Khan, S., Zamir, S.W., Anwer, R.M., and Shahbaz Khan, F. (2022, January 23\u201327). Edgenext: Efficiently amalgamated cnn-transformer architecture for mobile vision applications. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25082-8_1"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Shafiq, M., and Gu, Z. (2022). Deep residual learning for image recognition: A survey. Appl. Sci., 12.","DOI":"10.3390\/app12188972"},{"key":"ref_46","first-page":"11960","article-title":"Not all images are worth 16x16 words: Dynamic transformers for efficient image recognition","volume":"34","author":"Wang","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Heo, B., Yun, S., Han, D., Chun, S., Choe, J., and Oh, S.J. (2021, January 11\u201317). Rethinking spatial dimensions of vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01172"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Touvron, H., Cord, M., Sablayrolles, A., Synnaeve, G., and J\u00e9gou, H. (2021, January 11\u201317). Going deeper with image transformers. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00010"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Lina, W., and Ding, J. (2020, January 3\u20135). Behavior detection method of OpenPose combined with Yolo network. Proceedings of the 2020 International Conference on Communications, Information System and Computer Engineering (CISCE), Kuala Lumpur, Malaysia.","DOI":"10.1109\/CISCE50729.2020.00072"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/12\/1196\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T16:34:15Z","timestamp":1764088455000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/12\/1196"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,25]]},"references-count":49,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["e27121196"],"URL":"https:\/\/doi.org\/10.3390\/e27121196","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,25]]}}}