{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T21:24:14Z","timestamp":1768339454776,"version":"3.49.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"name":"Basic Public Welfare Research Project of Zhejiang Province","award":["LGG22F020014"],"award-info":[{"award-number":["LGG22F020014"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62072410"],"award-info":[{"award-number":["62072410"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Zhejiang University of Technology school-enterprise cooperation project","award":["KYY-HX-20230493, KYY-HX-20230114"],"award-info":[{"award-number":["KYY-HX-20230493, KYY-HX-20230114"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>Although significant progress has been made in automatic diagnosis systems for depression, most of the work focuses on combining features from multiple modalities to improve classification accuracy, which generates a lot of space-time overhead and feature synchronization problems. This research work proposes a unimodal depression detection framework based on facial expression and facial motion features. Firstly, we propose a robust feature extraction method based on the ratio of facial landmark and theoretically prove that this feature has up-down, left-right translation, depth translation, rotation, and flip invariance. The features extracted based on this method maintain the topological structure relationship of facial landmarks in space and maintain the temporal correlation of frames before and after facial landmarks. Then, we provide a novel idea to solve the classification task of large-unit depression videos. The final depression classification result is obtained by decomposing the depression classification task of large-unit videos into the scoring task of multiple short-sequence units and then through the defined score aggregation function. Our key innovations include: (1) theoretically proven invariant facial landmark ratio features, (2) novel video decomposition into short-sequence units with pseudo-labeling, and (3) efficient SRTSNet architecture. On DAIC-WOZ dataset, our framework achieves F1 = 0.85, outperforming all unimodal methods and matching state-of-the-art multimodal approaches while using only facial features.<\/jats:p>","DOI":"10.1145\/3777463","type":"journal-article","created":{"date-parts":[[2025,11,20]],"date-time":"2025-11-20T13:05:19Z","timestamp":1763643919000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["FaceDepth: A Robust Unimodal Depression Detection Framework Using Invariant Facial Landmark Features"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-9270-3780","authenticated-orcid":false,"given":"Ruiji","family":"Xu","sequence":"first","affiliation":[{"name":"Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-1712-7469","authenticated-orcid":false,"given":"Junhao","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-0449-3731","authenticated-orcid":false,"given":"Runzhe","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3953-0370","authenticated-orcid":false,"given":"Guanglin","family":"Dai","sequence":"additional","affiliation":[{"name":"Zhejiang Polytechnic University of Mechanical and Electrical Engineering, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5021-378X","authenticated-orcid":false,"given":"Keji","family":"Mao","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,1,13]]},"reference":[{"issue":"2","key":"e_1_3_1_2_2","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1109\/TCSVT.2022.3204753","article-title":"SNIS: A signal noise separation-based network for post-processed image forgery detection","volume":"33","author":"Chen Jiaxin","year":"2022","unstructured":"Jiaxin Chen, Xin Liao, Wei Wang, Zhenxing Qian, Zheng Qin, and Yaonan Wang. 2022. SNIS: A signal noise separation-based network for post-processed image forgery detection. IEEE Transactions on Circuits and Systems for Video Technology 33, 2 (2022), 935\u2013951.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","unstructured":"Min Chen Wenjing Xiao Miao Li Yixue Hao Long Hu and Guangming Tao. 2022. A multi-feature and time-aware-based stress evaluation mechanism for mental status adjustment. ACM Transactions on Multimedia Computing Communications and Applications 18 1s Article 39 (Jan. 2022) 18 pages. DOI: 10.1145\/3462763","DOI":"10.1145\/3462763"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/s42761-023-00191-4"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACII.2009.5349358"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jad.2021.09.001"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2021.3114784"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.102161"},{"issue":"4","key":"e_1_3_1_9_2","first-page":"3603","article-title":"WaveRecovery: Screen-shooting watermarking based on wavelet and recovery","volume":"35","author":"Fu Linbo","year":"2024","unstructured":"Linbo Fu, Xin Liao, Jinlin Guo, Li Dong, and Zheng Qin. 2024. WaveRecovery: Screen-shooting watermarking based on wavelet and recovery. IEEE Transactions on Circuits and Systems for Video Technology 35, 4 (2024), 3603\u20133618.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2661806.2661810"},{"key":"e_1_3_1_11_2","first-page":"1716","volume-title":"Proceedings of the Interspeech","author":"Al Hanai Tuka","year":"2018","unstructured":"Tuka Al Hanai, Mohammad Mahdi Ghassemi, and James R. Glass. 2018. Detecting depression with audio\/text sequence modeling of interviews. In Proceedings of the Interspeech. ISCA, Stockholm, 1716\u20131720. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:52191746"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.2307\/2346830"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2021.10.012"},{"issue":"580","key":"e_1_3_1_14_2","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1192\/bjp.124.3.221","article-title":"Non-verbal behaviour in mental illness","volume":"124","author":"Hill Denis","year":"1974","unstructured":"Denis Hill. 1974. Non-verbal behaviour in mental illness. The British Journal of Psychiatry: The Journal of Mental Science 124, 580 (1974), 221\u2013230.","journal-title":"The British Journal of Psychiatry: The Journal of Mental Science"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","unstructured":"Cong Hu Xiao-Zhong Wei and Xiao-Jun Wu. 2024. DIRformer: A novel image restoration approach based on U-shaped transformer and diffusion models. ACM Transactions on Multimedia Computing Communications and Applications 21 2 Article 57 (Dec. 2024) 23 pages. DOI: 10.1145\/3703632","DOI":"10.1145\/3703632"},{"issue":"1","key":"e_1_3_1_16_2","first-page":"210","article-title":"Beck depression inventory-II","volume":"1","author":"Beck Depression Inventory-II","year":"2010","unstructured":"Beck Depression Inventory-II. 2010. Beck depression inventory-II. The Corsini Encyclopedia of Psychology 1, 1 (2010), 210.","journal-title":"The Corsini Encyclopedia of Psychology"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1046\/j.1525-1497.2001.016009606.x"},{"issue":"9820","key":"e_1_3_1_18_2","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1016\/S0140-6736(11)60602-8","article-title":"Major depressive disorder: New clinical, neurobiological, and treatment perspectives","volume":"379","author":"Kupfer David J.","year":"2012","unstructured":"David J. Kupfer, Ellen Frank, and Mary L. Phillips. 2012. Major depressive disorder: New clinical, neurobiological, and treatment perspectives. The Lancet 379, 9820 (2012), 1045\u20131055.","journal-title":"The Lancet"},{"key":"e_1_3_1_19_2","first-page":"3946","volume-title":"Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP \u201919)","author":"Lam Genevieve","year":"2019","unstructured":"Genevieve Lam, Huang Dongyan, and Weisi Lin. 2019. Context-aware deep learning for multi-modal depression detection. In Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP \u201919). IEEE, 3946\u20133950."},{"issue":"3","key":"e_1_3_1_20_2","first-page":"2782","article-title":"CAS(ME)3: A third generation facial spontaneous micro-expression database with depth information and high ecological validity","volume":"45","author":"Li Jingting","year":"2022","unstructured":"Jingting Li, Zizhao Dong, Shaoyuan Lu, Su-Jing Wang, Wen-Jing Yan, Yinhuan Ma, Ye Liu, Changbing Huang, and Xiaolan Fu. 2022. CAS(ME)3: A third generation facial spontaneous micro-expression database with depth information and high ecological validity. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 3 (2022), 2782\u20132800.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2020.2981446"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2024.3415415"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3278310"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2019.8756567"},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","first-page":"1017064","DOI":"10.3389\/fpsyt.2022.1017064","article-title":"Measuring depression severity based on facial expression and body movement using deep convolutional neural network","volume":"13","author":"Liu Dongdong","year":"2022","unstructured":"Dongdong Liu, Bowen Liu, Tao Lin, Guangya Liu, Guoyu Yang, Dezhen Qi, Ye Qiu, Yuer Lu, Qinmei Yuan, Stella C. Shuai, et al. 2022. Measuring depression severity based on facial expression and body movement using deep convolutional neural network. Frontiers in Psychiatry 13 (2022), 1017064.","journal-title":"Frontiers in Psychiatry"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","unstructured":"Hao Liu Zhaoyu Yan Bing Liu Jiaqi Zhao Yong Zhou and Abdulmotaleb El Saddik. 2023. Distilled meta-learning for multi-class incremental learning. ACM Transactions on Multimedia Computing Communications and Applications 19 4 Article 149 (Mar. 2023) 16 pages. DOI: 10.1145\/3576045","DOI":"10.1145\/3576045"},{"key":"e_1_3_1_27_2","volume-title":"Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NEURIPS \u201924)","volume":"41","author":"Longpre Shayne","year":"2024","unstructured":"Shayne Longpre, Robert Mahari, Ariel Lee, Campbell Lund, Hamidah Oderinwale, William Brannon, Nayan Saxena, Naana Obeng-Marnu, Tobin South, Cole Hunter, et al. 2024. Consent in crisis: The rapid decline of the AI data commons. In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NEURIPS \u201924), 41 pages (13 main), 5 figures, 9 tables. Retrieved from https:\/\/hal.science\/hal-04824161"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2988257.2988267"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.swevo.2023.101248"},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","unstructured":"Wolfgang Marx Brenda W. J. H. Penninx Marco Solmi Toshi A. Furukawa Joseph Firth Andre F. Carvalho and Michael Berk. 2023. Major depressive disorder. Nature Reviews Disease Primers 9 1 (2023) 44.","DOI":"10.1038\/s41572-023-00454-1"},{"key":"e_1_3_1_31_2","volume-title":"An Approach to Environmental Psychology","author":"Mehrabian Albert","year":"1974","unstructured":"Albert Mehrabian and James A. Russell. 1974. An Approach to Environmental Psychology. MIT Press, Cambridge, MA."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3512128"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/2988257.2988261"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/EMBC.2017.8037103"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-022-12420-2"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/IC_ASET49463.2020.9318302"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.116076"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472622"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jad.2020.12.160"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133944.3133951"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3462218"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACII.2013.58"},{"issue":"10100","key":"e_1_3_1_44_2","doi-asserted-by":"crossref","first-page":"1211","DOI":"10.1016\/S0140-6736(17)32154-2","article-title":"Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990\u20132016: A systematic analysis for the global burden of disease study 2016","volume":"390","author":"Vos Theo","year":"2017","unstructured":"Theo Vos, Amanuel Alemu Abajobir, Kalkidan Hassen Abate, Cristiana Abbafati, Kaja M. Abbas, Foad Abd-Allah, Rizwan Suliankatchi Abdulkader, Abdishakur M. Abdulle, Teshome Abuka Abebo, Semaw Ferede Abera, et al. 2017. Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries, 1990\u20132016: A systematic analysis for the global burden of disease study 2016. The Lancet 390, 10100 (2017), 1211\u20131259.","journal-title":"The Lancet"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9747232"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1001\/archpsyc.1988.01800320058007"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/2988257.2988263"},{"issue":"3","key":"e_1_3_1_48_2","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1049\/cit2.12113","article-title":"Automatic depression recognition by intelligent speech signal processing: A systematic survey","volume":"8","author":"Wu Pingping","year":"2023","unstructured":"Pingping Wu, Ruihao Wang, Han Lin, Fanlong Zhang, Juan Tu, and Miao Sun. 2023. Automatic depression recognition by intelligent speech signal processing: A systematic survey. CAAI Transactions on Intelligence Technology 8, 3 (2023), 701\u2013711.","journal-title":"CAAI Transactions on Intelligence Technology"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/2988257.2988263"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-016-0395-y"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","unstructured":"Jeewoo Yoon Chaewon Kang Seungbae Kim and Jinyoung Han. 2022. D-vlog: Multimodal vlog dataset for depression detection. Proceedings of the AAAI Conference on Artificial Intelligence 36 (2022) 12226\u201312234. DOI: 10.1609\/aaai.v36i11.21483","DOI":"10.1609\/aaai.v36i11.21483"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3777463","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T14:19:59Z","timestamp":1768313999000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3777463"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,13]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3777463"],"URL":"https:\/\/doi.org\/10.1145\/3777463","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,13]]},"assertion":[{"value":"2025-04-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-31","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}