{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T22:31:19Z","timestamp":1779316279299,"version":"3.51.4"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,8,31]]},"abstract":"<jats:p>\n            Action quality assessment (AQA) has become crucial in video analysis, finding wide applications in domains such as healthcare and sports. A significant challenge for AQA is background bias, which arises because the background dominates most video frames. In particular, background bias tends to overshadow subtle foreground differences, which are crucial for precise action evaluation. To address this issue, we propose a novel data augmentation method named Scaled Background Swap. First, background regions are swapped between different video samples to guide models to focus on the dynamic foreground regions and to reduce their sensitivity to the background during training. Second, the foreground region of each video is upscaled to further direct models\u2019 attention to the foreground action information that is critical for AQA tasks. By prioritizing foreground motion and swapping backgrounds, Scaled Background Swap effectively improves models\u2019 accuracy and generalization, and it can be flexibly applied to various video analysis models. Extensive experiments on AQA benchmarks demonstrate that the Scaled Background Swap method outperforms the baselines. Specifically, Spearman\u2019s rank correlation on the AQA-7 and MTL-AQA datasets reaches 0.8870 and 0.9526, respectively. 
The code is available at:\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/Emy-cv\/Scaled-Background\">https:\/\/github.com\/Emy-cv\/Scaled-Background<\/jats:ext-link>\n            Swap.\n          <\/jats:p>","DOI":"10.1145\/3737461","type":"journal-article","created":{"date-parts":[[2025,5,29]],"date-time":"2025-05-29T11:16:25Z","timestamp":1748517385000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Scaled Background Swap: Video Augmentation for Action Quality Assessment with Background Debiasing"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2428-9553","authenticated-orcid":false,"given":"Xin","family":"Zhang","sequence":"first","affiliation":[{"name":"Hangzhou Dianzi University, Hangzhou, China, Key Laboratory of Complex Systems Modeling and Simulation Ministry of Education, China, Zhoushan Tongbo Marine Electronic Information Research Institute, Hangzhou Dianzi University, China, and Yunnan Key Laboratory of Service Computing, Yunnan University of Finance and Economics, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-0847-1854","authenticated-orcid":false,"given":"Hongzhi","family":"Feng","sequence":"additional","affiliation":[{"name":"Hangzhou Dianzi University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5906-9422","authenticated-orcid":false,"given":"M. 
Shamim","family":"Hossain","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2558-2682","authenticated-orcid":false,"given":"Yinzhuo","family":"Chen","sequence":"additional","affiliation":[{"name":"Hangzhou Dianzi University, Hangzhou, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5003-3124","authenticated-orcid":false,"given":"Hongbo","family":"Wang","sequence":"additional","affiliation":[{"name":"Hangzhou Dianzi University, Hangzhou, China."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7565-4111","authenticated-orcid":false,"given":"Yuyu","family":"Yin","sequence":"additional","affiliation":[{"name":"Hangzhou Dianzi University, Hangzhou, China, Key Laboratory of Complex Systems Modeling and Simulation Ministry of Education, China, and Zhoushan Tongbo Marine Electronic Information Research Institute, Hangzhou Dianzi University, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,8,13]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19772-7_25"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/2505323.2505330"},{"key":"e_1_3_2_4_2","article-title":"Why can\u2019t I dance in the mall? Learning to mitigate scene bias in action recognition","volume":"32","author":"Choi Jinwoo","year":"2019","unstructured":"Jinwoo Choi, Chen Gao, Joseph C. E. Messou, and Jia-Bin Huang. 2019. Why can\u2019t I dance in the mall? Learning to mitigate scene bias in action recognition. In Advances in Neural Information Processing Systems, Vol. 
32.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV57701.2024.00012"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00949"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3681084"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00805"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.146"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00077"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3633781"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCE.2024.3482560"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2023.3331212"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00294"},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the AI-ED","author":"Gordon Andrew S.","year":"1995","unstructured":"Andrew S. Gordon. 1995. Automated video assessment of human performance. In Proceedings of the AI-ED, Vol. 2."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3152247"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-49409-8_2"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-024-05349-6"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45243-0_67"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3592615"},{"key":"e_1_3_2_21_2","unstructured":"Seong Tae Kim and Yong Man Ro. 2017. Evaluationnet: Can human skill be evaluated by deep networks?. arXiv:1705.11077. 
Retrieved from https:\/\/arxiv.org\/abs\/1705.11077"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40747-022-00892-6"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3698399"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-00767-6_12"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_32"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00539"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3681598"},{"key":"e_1_3_2_29_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3617596","article-title":"Relation with free objects for action recognition","volume":"20","author":"Liang Shuang","year":"2024","unstructured":"Shuang Liang, Wentao Ma, and Chi Xie. 2024. Relation with free objects for action recognition. ACM Transactions on Multimedia Computing, Communications and Applications 20 (2024), 1\u201319.","journal-title":"ACM Transactions on Multimedia Computing, Communications and 
Applications"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2024.109560"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00643"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ROMAN.2016.7745093"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2019.00161"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00039"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.16"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10599-4_36"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.590"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00986"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.158"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475438"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00736"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2024.104213"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2018.8451364"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00323"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2927118"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00782"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00612"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-023-09068-w"},{"key":"e_1_3_2_50_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Zhang Hongyi","year":"2018","unstructured":"Hongyi Zhang, Moustapha Cisse, Yann N. 
Dauphin, and David Lopez-Paz. 2018. Mixup: Beyond empirical risk minimization. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=r1Ddp1-Rb"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP48485.2024.10446438"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2022.3143549"},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"4061","DOI":"10.1109\/TMM.2023.3294800","article-title":"Adaptive stage-aware assessment skill transfer for skill determination","volume":"26","author":"Zhang Shao-Jie","year":"2023","unstructured":"Shao-Jie Zhang, Jia-Hui Pan, Jibin Gao, and Wei-Shi Zheng. 2023. Adaptive stage-aware assessment skill transfer for skill determination. IEEE Transactions on Multimedia 26 (2024), 4061\u20134072.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2024\/196"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3281413"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3674979"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3737461","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,13]],"date-time":"2025-08-13T12:05:27Z","timestamp":1755086727000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3737461"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,13]]},"references-count":55,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,8,31]]}},"alternative-id":["10.1145\/3737461"],"URL":"https:\/\/doi.org\/10.1145\/3737461","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,13]]},"assertion":[{"value":"2024-12-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-10","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}