{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:23:53Z","timestamp":1760059433294,"version":"build-2065373602"},"reference-count":50,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2025,6,13]],"date-time":"2025-06-13T00:00:00Z","timestamp":1749772800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Heating, Ventilation and Air Conditioning (HVAC) systems are significant carbon emitters in buildings, and precise regulation is crucial for achieving carbon neutrality. Computer vision-based occupant behavior prediction provides vital data for demand-driven control strategies. Real-time multi-person pose estimation faces challenges in balancing speed and accuracy, especially in complex environments. Traditional top-down methods become computationally expensive as the number of people increases, while bottom-up methods struggle with key point mismatches in dense crowds. This paper introduces the Efficient-RTMO model, which leverages the Parameter Inverted Image Pyramid (PIIP) with hierarchical multi-scale symmetry for lightweight processing of high-resolution images and a deeper network for low-resolution images. This approach reduces computational complexity, particularly in dense crowd scenarios, and incorporates a dynamic sparse connectivity mechanism via the star-shaped dynamic feed-forward network (StarFFN). By optimizing the symmetry structure, it improves inference efficiency and ensures effective feature fusion. Experimental results on the COCO dataset show that Efficient-RTMO outperforms the baseline RTMO model, achieving more than 2\u00d7 speed improvement and a 0.3 AP increase. Ablation studies confirm that PIIP and StarFFN enhance robustness against occlusions and scale variations, demonstrating their synergistic effectiveness.<\/jats:p>","DOI":"10.3390\/sym17060941","type":"journal-article","created":{"date-parts":[[2025,6,13]],"date-time":"2025-06-13T06:19:28Z","timestamp":1749795568000},"page":"941","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Research on Person Pose Estimation Based on Parameter Inverted Pyramid and High-Dimensional Feature Enhancement"],"prefix":"10.3390","volume":"17","author":[{"given":"Guofeng","family":"Ma","sequence":"first","affiliation":[{"name":"College of Economics and Management, Tongji University, 1500 Siping Road, Yangpu District, Shanghai 200092, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qianyi","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Economics and Management, Tongji University, 1500 Siping Road, Yangpu District, Shanghai 200092, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"394","DOI":"10.1016\/j.enbuild.2007.03.007","article-title":"A Review on Buildings Energy Consumption Information","volume":"40","author":"Ortiz","year":"2008","journal-title":"Energy Build."},{"key":"ref_2","unstructured":"(2025, April 13). Standard 55\u2014Thermal Environmental Conditions for Human Occupancy. Available online: https:\/\/www.ashrae.org\/technical-resources\/bookstore\/standard-55-thermal-environmental-conditions-for-human-occupancy."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Cheng, B., Xiao, B., Wang, J., Shi, H., Huang, T.S., and Zhang, L. (2020, January 14\u201319). HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00543"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wang, Y., Sun, F., Li, D., and Yao, A. (2020, January 23\u201328). Resolution Switchable Networks for Runtime Efficient Image Recognition. Proceedings of the Computer Vision\u2014ECCV 2020: 16th European Conference, Glasgow, UK. Proceedings, Part XV.","DOI":"10.1007\/978-3-030-58555-6_32"},{"key":"ref_5","first-page":"21","article-title":"Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision","volume":"24","author":"Luo","year":"2024","journal-title":"ACM Trans. Embed. Comput. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Fang, H.-S., Xie, S., Tai, Y.-W., and Lu, C. (2017, January 22\u201329). RMPE: Regional Multi-Person Pose Estimation. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.256"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1109\/TPAMI.2019.2929257","article-title":"OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields","volume":"43","author":"Cao","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lu, P., Jiang, T., Li, Y., Li, X., Chen, K., and Yang, W. (2024, January 16\u201322). RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation. Proceedings of the 2024 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.00148"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Pang, B., Li, Y., Li, J., Li, M., Cao, H., and Lu, C. (2020, January 14). TDAF: Top-Down Attention Framework for Vision Tasks. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual.","DOI":"10.1609\/aaai.v35i3.16339"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Vedaldi, A., Bischof, H., Brox, T., and Frahm, J.-M. (2020, January 23\u201328). Learning Delicate Local Representations for Multi-Person Pose Estimation. Proceedings of the Computer Vision\u2014ECCV 2020, Glasgow, UK.","DOI":"10.1007\/978-3-030-58555-6"},{"key":"ref_11","unstructured":"Su, Z., Ye, M., Zhang, G., Dai, L., and Sheng, J. (2019, January 15\u201321). Cascade Feature Aggregation for Human Pose Estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Toshev, A., and Szegedy, C. (2014, January 23\u201328). DeepPose: Human Pose Estimation via Deep Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.214"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, January 15\u201320). Deep High-Resolution Representation Learning for Human Pose Estimation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Vedaldi, A., Bischof, H., Brox, T., and Frahm, J.-M. (2020, January 23\u201328). Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation. Proceedings of the Computer Vision\u2014ECCV 2020, Glasgow, UK.","DOI":"10.1007\/978-3-030-58555-6"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Lin, G., Milan, A., Shen, C., and Reid, I. (2016, January 21\u201326). RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2017.549"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Leibe, B., Matas, J., Sebe, N., and Welling, M. (2016, January 8\u201316). Stacked Hourglass Networks for Human Pose Estimation. Proceedings of the Computer Vision\u2014ECCV 2016, Amsterdam, Netherlands.","DOI":"10.1007\/978-3-319-46484-8"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Jung, T.-W., Jeong, C.-S., Kim, I.-S., Yu, M.-S., Kwon, S.-C., and Jung, K.-D. (2022). Graph Convolutional Network for 3D Object Pose Estimation in a Point Cloud. Sensors, 22.","DOI":"10.3390\/s22218166"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1109\/TPAMI.2018.2844175","article-title":"Mask R-CNN","volume":"42","author":"He","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tan, D., Chen, H., Tian, W., and Xiong, L. (2024, January 16\u201322). DiffusionRegPose: Enhancing Multi-Person Pose Estimation Using a Diffusion-Based End-to-End Regression Approach. Proceedings of the 2024 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.00217"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Vedaldi, A., Bischof, H., Brox, T., and Frahm, J.-M. (2020, January 23\u201328). End-to-End Object Detection with Transformers. Proceedings of the Computer Vision\u2014ECCV 2020, Glasgow, UK.","DOI":"10.1007\/978-3-030-58555-6"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zheng, C., Zhu, S., Mendieta, M., Yang, T., Chen, C., and Ding, Z. (2021, January 10\u201317). 3D Human Pose Estimation with Spatial and Temporal Transformers. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01145"},{"key":"ref_22","unstructured":"Yu, N., Ma, T., Zhang, J., Zhang, Y., Bao, Q., Wei, X., and Yang, X. (November, January 18). Adaptive Vision Transformer for Event-Based Human Pose Estimation. Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Shi, D., Wei, X., Li, L., Ren, Y., and Tan, W. (2022, January 18\u201324). End-to-End Multi-Person Pose Estimation with Transformers. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01079"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liu, H., and Zheng, Q. (2024, January 19\u201321). Vitcc: A Vision Transformer Coordinate Classification Perspective for Human Pose Estimation. Proceedings of the 2024 International Conference on Intelligent Perception and Pattern Recognition, Qingdao, China.","DOI":"10.1145\/3700035.3700046"},{"key":"ref_25","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021, January 19\u201325). YOLOX: Exceeding YOLO Series in 2021. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Ma, X., Dai, X., Bai, Y., Wang, Y., and Fu, Y. (2024, January 16\u201322). Rewrite the Stars. Proceedings of the 2024 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.00544"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Tan, M., Pang, R., and Le, Q.V. (2020, January 13\u201319). EfficientDet: Scalable and Efficient Object Detection. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid Scene Parsing Network. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_30","unstructured":"Singh, B., Najibi, M., and Davis, L.S. (2018, January 18\u201322). SNIPER: Efficient Multi-Scale Training. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA."},{"key":"ref_31","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017, January 21\u201326). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"129866","DOI":"10.1016\/j.neucom.2025.129866","article-title":"SCSA: Exploring the Synergistic Effects between Spatial and Channel Attention","volume":"634","author":"Si","year":"2025","journal-title":"Neurocomputing"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhang, H., Wang, Y., Dayoub, F., and S\u00fcnderhauf, N. (2021, January 20\u201325). VarifocalNet: An IoU-aware Dense Object Detector. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00841"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Fleet, D., Pajdla, T., Schiele, B., and Tuytelaars, T. (2014, January 6\u201312). Microsoft COCO: Common Objects in Context. Proceedings of the Computer Vision\u2014ECCV 2014, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10605-2"},{"key":"ref_35","unstructured":"Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (May, January 30). Mixup: Beyond Empirical Risk Minimization. Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada."},{"key":"ref_36","unstructured":"Loshchilov, I., and Hutter, F. (2019, January 6\u20139). Decoupled Weight Decay Regularization. Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA."},{"key":"ref_37","unstructured":"(2025, April 13). OpenMMLab Pose Estimation Toolbox and Benchmark. Available online: https:\/\/github.com\/open-mmlab\/mmpose."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Avidan, S., Brostow, G., Ciss\u00e9, M., Farinella, G.M., and Hassner, T. (2022, January 23\u201327). Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation. Proceedings of the Computer Vision\u2014ECCV 2022, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19772-7"},{"key":"ref_39","unstructured":"(2025, April 13). Ultralytics YOLO11. Available online: https:\/\/github.com\/ultralytics\/ultralytics."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Maji, D., Nagori, S., Mathew, M., and Poddar, D. (2022, January 19\u201320). YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00297"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Li, J., Bian, S., Zeng, A., Wang, C., Pang, B., Liu, W., and Lu, C. (2021, January 10\u201317). Human Pose Regression with Residual Log-likelihood Estimation. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01084"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Avidan, S., Brostow, G., Ciss\u00e9, M., Farinella, G.M., and Hassner, T. (2022, January 23\u201327). SimCC: A Simple Coordinate Classification Perspective for Human Pose Estimation. Proceedings of the Computer Vision\u2014ECCV 2022, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20056-4"},{"key":"ref_43","unstructured":"Jiang, T., Lu, P., Zhang, L., Ma, N., Han, R., Lyu, C., Li, Y., and Chen, K. (2023, January 18\u201322). RTMPose: Real-Time Multi-Person Pose Estimation Based on MMPose. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada."},{"key":"ref_44","unstructured":"Lyu, C., Zhang, W., Huang, H., Zhou, Y., Wang, Y., Liu, Y., Zhang, S., and Chen, K. (2022, January 19\u201324). RTMDet: An Empirical Study of Designing Real-Time Object Detectors. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA."},{"key":"ref_45","unstructured":"Tian, Z., Chen, H., and Shen, C. (2019, January 15\u201321). DirectPose: Direct End-to-End Multi-Person Pose Estimation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Mao, W., Tian, Z., Wang, X., and Shen, C. (2021, January 20\u201325). FCPose: Fully Convolutional Multi-Person Pose Estimation with Dynamic Instance-Aware Convolutions. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00892"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Shi, D., Wei, X., Yu, X., Tan, W., Ren, Y., and Pu, S. (2022, January 18\u201324). InsPose: Instance-Aware Networks for Single-Stage Multi-Person Pose Estimation. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1145\/3474085.3475447"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (November, January 27). CenterNet: Keypoint Triplets for Object Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00667"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Wang, D., and Zhang, S. (2022, January 18\u201324). Contextual Instance Decoupling for Robust Multi-Person Pose Estimation. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA.","DOI":"10.1109\/CVPR52688.2022.01078"},{"key":"ref_50","unstructured":"Yang, J., Zeng, A., Liu, S., Li, F., Zhang, R., and Zhang, L. (2023, January 18\u201322). Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation 2023. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/6\/941\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:51:13Z","timestamp":1760032273000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/6\/941"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,13]]},"references-count":50,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["sym17060941"],"URL":"https:\/\/doi.org\/10.3390\/sym17060941","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2025,6,13]]}}}