{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T13:36:04Z","timestamp":1767792964143,"version":"3.49.0"},"reference-count":79,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T00:00:00Z","timestamp":1767657600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Nantong Basic Science Research Program","award":["JC2023021"],"award-info":[{"award-number":["JC2023021"]}]},{"name":"Doctoral Research Startup Fund of Nantong University","award":["25B03"],"award-info":[{"award-number":["25B03"]}]},{"name":"Qing Lan Project of Jiangsu Province"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Depression is a prevalent mental disorder that imposes a significant public health burden worldwide. Although multimodal detection methods have shown potential, existing techniques still face two critical bottlenecks: (i) insufficient integration of global patterns and local fluctuations in long-sequence modeling and (ii) static fusion strategies that fail to dynamically adapt to the complementarity and redundancy among modalities. To address these challenges, this paper proposes a dynamic multimodal depression detection framework, DynMultiDep, which combines multi-scale temporal modeling with an adaptive fusion mechanism. The core innovations of DynMultiDep lie in its Multi-scale Temporal Experts Module (MTEM) and Dynamic Multimodal Fusion module (DynMM). On one hand, MTEM employs Mamba experts to extract long-term trend features and utilizes local-window Transformers to capture short-term dynamic fluctuations, achieving adaptive fusion through a long-short routing mechanism. 
On the other hand, DynMM introduces modality-level and fusion-level dynamic decision-making, selecting critical modality paths and optimizing cross-modal interaction strategies based on input characteristics. The experimental results demonstrate that DynMultiDep outperforms existing state-of-the-art methods in detection performance on two widely used large-scale depression datasets.<\/jats:p>","DOI":"10.3390\/jimaging12010029","type":"journal-article","created":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T13:49:46Z","timestamp":1767707386000},"page":"29","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["DynMultiDep: A Dynamic Multimodal Fusion and Multi-Scale Time Series Modeling Approach for Depression Detection"],"prefix":"10.3390","volume":"12","author":[{"given":"Jincheng","family":"Li","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence and Computer Science, Nantong University, Nantong 226019, China"}]},{"given":"Menglin","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence and Computer Science, Nantong University, Nantong 226019, China"}]},{"given":"Jiongyi","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence and Computer Science, Nantong University, Nantong 226019, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4766-1461","authenticated-orcid":false,"given":"Yihui","family":"Zhan","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence and Computer Science, Nantong University, Nantong 226019, China"}]},{"given":"Xing","family":"Xie","sequence":"additional","affiliation":[{"name":"Engineering Training Center, Nantong University, Nantong 226019, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2026,1,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1016\/S0022-3999(02)00304-5","article-title":"Depression and public health: An overview","volume":"53","author":"Cassano","year":"2002","journal-title":"J. Psychosom. Res."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Remes, O., Mendes, J.F., and Templeton, P. (2021). Biological, psychological, and social determinants of depression: A review of recent literature. Brain Sci., 11.","DOI":"10.3390\/brainsci11121633"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1016\/0165-1781(91)90025-K","article-title":"Sleep, depression, and suicide","volume":"36","author":"Sabo","year":"1991","journal-title":"Psychiatry Res."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Steiger, A., and Pawlowski, M. (2019). Depression and sleep. Int. J. Mol. Sci., 20.","DOI":"10.3390\/ijms20030607"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/j.jad.2013.01.004","article-title":"Risk factors for suicide in individuals with depression: A systematic review","volume":"147","author":"Hawton","year":"2013","journal-title":"J. Affect. Disord."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1037\/0735-7028.32.1.97","article-title":"Suicide and depression among college students: A decade later","volume":"32","author":"Furr","year":"2001","journal-title":"Prof. Psychol. Res. Pract."},{"key":"ref_7","unstructured":"World Health Organization (2023). Depressive Disorder (Depression), World Health Organization."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhao, Y.-J., Jin, Y., Rao, W.-W., Zhang, Q.-E., Zhang, L., Jackson, T., Su, Z.-H., Xiang, M., Yuan, Z., and Xiang, Y.-T. (2021). Prevalence of major depressive disorder among adults in China: A systematic review and meta-analysis. Front. 
Psychiatry, 12.","DOI":"10.3389\/fpsyt.2021.659470"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1093\/heapol\/czx137","article-title":"Integrated mental health services in China: Challenges and planning for the future","volume":"33","author":"Liang","year":"2018","journal-title":"Health Policy Plan."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"102975","DOI":"10.1016\/j.ajp.2021.102975","article-title":"The state of mental health care in China","volume":"69","author":"Xu","year":"2022","journal-title":"Asian J. Psychiatry"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"981","DOI":"10.1016\/S2215-0366(21)00251-0","article-title":"Prevalence of depressive disorders and treatment in China: A cross-sectional epidemiological study","volume":"8","author":"Lu","year":"2021","journal-title":"Lancet Psychiatry"},{"key":"ref_12","first-page":"369","article-title":"Treatment-resistant depression: Therapeutic trends, challenges, and future directions","volume":"6","year":"2012","journal-title":"Patient Prefer. Adherence"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1146\/annurev.clinpsy.121208.131305","article-title":"Cognition and depression: Current status and future directions","volume":"6","author":"Gotlib","year":"2010","journal-title":"Annu. Rev. Clin. Psychol."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"569","DOI":"10.1046\/j.1525-1497.1999.03478.x","article-title":"Awareness, diagnosis, and treatment of depression","volume":"14","author":"Goldman","year":"1999","journal-title":"J. Gen. Intern. Med."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"478","DOI":"10.1109\/TAFFC.2016.2634527","article-title":"Multimodal depression detection: Fusion analysis of paralinguistic, head pose and eye gaze behaviors","volume":"9","author":"Alghowinem","year":"2016","journal-title":"IEEE Trans. Affect. 
Comput."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"4709","DOI":"10.1007\/s11227-021-04040-8","article-title":"Automatic detection of depression symptoms in twitter using multimodal analysis","volume":"78","author":"Safa","year":"2022","journal-title":"J. Supercomput."},{"key":"ref_17","unstructured":"Gui, T., Zhu, L., Zhang, Q., Peng, M., Zhou, X., Ding, K., and Chen, Z. (February, January 27). Cooperative multimodal approach to depression detection in twitter. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_18","first-page":"3838","article-title":"Depression detection via harvesting social media: A multimodal dictionary learning solution","volume":"2017","author":"Shen","year":"2017","journal-title":"IJCAI"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1016\/j.inffus.2020.01.008","article-title":"Feature-level fusion approaches based on multimodal EEG data for depression recognition","volume":"59","author":"Cai","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"116076","DOI":"10.1016\/j.eswa.2021.116076","article-title":"Audio based depression detection using Convolutional Autoencoder","volume":"189","author":"Sardari","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ma, X., Yang, H., Chen, Q., Huang, D., and Wang, Y. (2016, January 16). Depaudionet: An efficient deep model for audio based depression classification. 
Proceedings of the 6th International Workshop on Audio\/Visual Emotion Challenge, Amsterdam, The Netherlands.","DOI":"10.1145\/2988257.2988267"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"101181","DOI":"10.1109\/ACCESS.2020.2998532","article-title":"Recognition of audio depression based on convolutional neural network and generative antagonism network model","volume":"8","author":"Wang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"624","DOI":"10.1038\/s41586-024-07805-2","article-title":"Frontostriatal salience network expansion in individuals in depression","volume":"633","author":"Lynch","year":"2024","journal-title":"Nature"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"6208","DOI":"10.1038\/s41467-020-20053-y","article-title":"Longitudinal symptom dynamics of COVID-19 infection","volume":"11","author":"Mizrahi","year":"2020","journal-title":"Nat. Commun."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.copsyc.2017.06.004","article-title":"Emotion dynamics","volume":"17","author":"Kuppens","year":"2017","journal-title":"Curr. Opin. Psychol."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.inffus.2017.02.003","article-title":"A review of affective computing: From unimodal analysis to multimodal fusion","volume":"37","author":"Poria","year":"2017","journal-title":"Inf. Fusion"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zadeh, A., Chen, M., Poria, S., Cambria, E., and Morency, L.-P. (2017). Tensor fusion network for multimodal sentiment analysis. 
arXiv.","DOI":"10.18653\/v1\/D17-1115"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1016\/j.knosys.2018.07.041","article-title":"Multimodal sentiment analysis using hierarchical fusion with context modeling","volume":"161","author":"Majumder","year":"2018","journal-title":"Knowl.-Based Syst."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"207","DOI":"10.31887\/DCNS.2006.8.2\/pbech","article-title":"Rating scales in depression: Limitations and pitfalls","volume":"8","author":"Bech","year":"2006","journal-title":"Dialogues Clin. Neurosci."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Sepehry, A.A. (2024). Self-rating depression scale (SDS). Encyclopedia of Quality of Life and Well-Being Research, Springer.","DOI":"10.1007\/978-3-031-17299-1_2641"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/0165-0327(86)90076-5","article-title":"Assessment of depression: A comparison of rating scales","volume":"11","author":"Faravelli","year":"1986","journal-title":"J. Affect. Disord."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1111\/j.2044-8260.1985.tb01312.x","article-title":"Self-report measures of depression: Some psychometric considerations","volume":"24","author":"Boyle","year":"1985","journal-title":"Br. J. Clin. Psychol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1136\/jnnp.23.1.56","article-title":"A rating scale for depression","volume":"23","author":"Hamilton","year":"1960","journal-title":"J. Neurol. Neurosurg. Psychiatry"},{"key":"ref_34","first-page":"138","article-title":"Depression rating scales-benefits and limitations. A literature review","volume":"31","author":"Platona","year":"2023","journal-title":"J. Psychol. Educ. 
Res."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/B978-0-444-52002-9.00013-9","article-title":"Psychiatric rating scales","volume":"106","author":"Maust","year":"2012","journal-title":"Handb. Clin. Neurol."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1159\/000395072","article-title":"General problems of psychiatric rating scales (especially for depression)","volume":"7","author":"Hamilton","year":"1974","journal-title":"Mod. Probl. Pharmacopsychiatry"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"e36417","DOI":"10.2196\/36417","article-title":"Multimodal assessment of schizophrenia and depression utilizing video, acoustic, locomotor, electroencephalographic, and heart rate technology: Protocol for an observational study","volume":"11","author":"Cotes","year":"2022","journal-title":"JMIR Res. Protoc."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1109\/TNB.2020.2990690","article-title":"An improved classification model for depression detection using EEG and eye tracking data","volume":"19","author":"Zhu","year":"2020","journal-title":"IEEE Trans. Nanobiosci."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"de Melo, W.C., Granger, E., and Hadid, A. (2019, January 14\u201318). Combining global and local convolutional 3d networks for detecting depression from facial expressions. Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France.","DOI":"10.1109\/FG.2019.8756568"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"102128","DOI":"10.1016\/j.compmedimag.2022.102128","article-title":"Multimodal fusion diagnosis of depression and anxiety based on CNN-LSTM model","volume":"102","author":"Xie","year":"2022","journal-title":"Comput. Med. Imaging Graph."},{"key":"ref_41","unstructured":"Zhang, Q., Wu, H., Zhang, C., Hu, Q., Fu, H., Zhou, J.T., and Peng, X. 
(2023, January 23\u201329). Provable dynamic fusion for low-quality multimodal data. Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J., and Sun, L. (2022). Transformers in time series: A survey. arXiv.","DOI":"10.24963\/ijcai.2023\/759"},{"key":"ref_43","first-page":"1178","article-title":"Super-resolution Reconstruction of Remote Sensing Image Based on Transformer of Multi-scale Feature Fusion","volume":"45","author":"Wang","year":"2024","journal-title":"J. Northeast. Univ. (Nat. Sci.)"},{"key":"ref_44","first-page":"240237-1","article-title":"Multi-scale feature enhanced Transformer network for efficient semantic segmentation","volume":"51","author":"Yan","year":"2024","journal-title":"Opt.-Electron. Eng."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"110987","DOI":"10.1016\/j.knosys.2023.110987","article-title":"MAXFormer: Enhanced transformer for medical image segmentation with multi-attention and multi-scale features fusion","volume":"280","author":"Liang","year":"2023","journal-title":"Knowl.-Based Syst."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2557","DOI":"10.1007\/s40747-023-01279-x","article-title":"Enhanced multi-scale networks for semantic segmentation","volume":"10","author":"Li","year":"2024","journal-title":"Complex Intell. Syst."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Gu, J., Kwon, H., Wang, D., Ye, W., Li, M., Chen, Y.-H., Lai, L., Chandra, V., and Pan, D.Z. (2022, January 18\u201324). Multi-scale high-resolution vision transformer for semantic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01178"},{"key":"ref_48","unstructured":"Yan, H., Zhang, C., and Wu, M. (2022). 
Lawin transformer: Improving semantic segmentation transformer with multi-scale representations via large window attention. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"108057","DOI":"10.1016\/j.compbiomed.2024.108057","article-title":"MS-TCNet: An effective Transformer\u2013CNN combined network using multi-scale feature learning for 3D medical image segmentation","volume":"170","author":"Ao","year":"2024","journal-title":"Comput. Biol. Med."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"2967","DOI":"10.1109\/JBHI.2024.3366662","article-title":"Spectral graph neural network-based multi-atlas brain network fusion for major depressive disorder diagnosis","volume":"28","author":"Lee","year":"2024","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_51","unstructured":"Chen, P., Zhang, Y., Cheng, Y., Shu, Y., Wang, Y., Wen, Q., Yang, B., and Guo, C. (2024). Pathformer: Multi-scale transformers with adaptive pathways for time series forecasting. arXiv."},{"key":"ref_52","unstructured":"Shabani, A., Abdi, A., Meng, L., and Sylvain, T. (2022). Scaleformer: Iterative multi-scale refining transformers for time series forecasting. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Du, D., Su, B., and Wei, Z. (2023, January 4\u201310). Preformer: Predictive transformer with multi-scale segment-wise correlations for long-term time series forecasting. Proceedings of the ICASSP 2023\u20142023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece.","DOI":"10.1109\/ICASSP49357.2023.10096881"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1016\/j.comcom.2021.10.036","article-title":"LSTM-MFCN: A time series classifier based on multi-scale spatial\u2013temporal features","volume":"182","author":"Zhao","year":"2022","journal-title":"Comput. 
Commun."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"108459","DOI":"10.1016\/j.knosys.2022.108459","article-title":"Adaptive feature fusion for time series classification","volume":"243","author":"Wang","year":"2022","journal-title":"Knowl.-Based Syst."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Sun, H., Liu, J., Chai, S., Qiu, Z., Lin, L., Huang, X., and Chen, Y. (2021). Multi-modal adaptive fusion transformer network for the estimation of depression level. Sensors, 21.","DOI":"10.3390\/s21144764"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Ma, L., Yao, Y., Liang, T., and Liu, T. (2024). Multi-scale cooperative multimodal transformers for multimodal sentiment analysis in videos. Australasian Joint Conference on Artificial Intelligence, Springer.","DOI":"10.1007\/978-981-96-0351-0_21"},{"key":"ref_58","unstructured":"Luo, H., Ji, L., Huang, Y., Wang, B., Ji, S., and Li, T. (2021). Scalevlad: Improving multimodal sentiment analysis via multi-scale fusion of locally descriptors. arXiv."},{"key":"ref_59","first-page":"8","article-title":"A machine learning approach to multi-scale sentiment analysis of amharic online posts","volume":"2","author":"Philemon","year":"2014","journal-title":"HiLCoE J. Comput. Sci. Technol."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"1905","DOI":"10.1007\/s11280-021-00994-0","article-title":"A distributed learning based sentiment analysis methods with Web applications","volume":"25","author":"Xiong","year":"2022","journal-title":"World Wide Web"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Zhang, M., Liu, Z., Feng, J., Liu, L., and Jiao, L. (2023). Remote sensing image change detection based on deep multi-scale multi-attention Siamese transformer network. Remote Sens., 15.","DOI":"10.3390\/rs15030842"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Zhang, P., Dai, X., Yang, J., Xiao, B., Yuan, L., Zhang, L., and Gao, J. 
(2021, January 11\u201317). Multi-scale vision longformer: A new vision transformer for high-resolution image encoding. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00299"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Zhu, F., Zhao, S., Wang, P., Wang, H., Yan, H., and Liu, S. (2022, January 18\u201324). Semi-supervised wide-angle portraits correction by multi-scale transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01907"},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"8766","DOI":"10.3934\/mbe.2023385","article-title":"A novel sentiment analysis method based on multi-scale deep learning","volume":"20","author":"Xiang","year":"2023","journal-title":"Math. Biosci. Eng."},{"key":"ref_65","unstructured":"Yoon, J., Kang, C., Kim, S., and Han, J. (March, January 22). D-vlog: Multimodal vlog dataset for depression detection. Proceedings of the AAAI Conference on Artificial Intelligence, Online."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"He, L., Chen, K., Zhao, J., Wang, Y., Pei, E., Chen, H., Jiang, J., Zhang, S., Zhang, J., and Wang, Z. (2024). Lmvd: A large-scale multimodal vlog dataset for depression detection in the wild. arXiv.","DOI":"10.36227\/techrxiv.171591570.08868181\/v1"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"2189","DOI":"10.1109\/TAFFC.2025.3557153","article-title":"MCRVT: Multi-Hierarchical Cross-Reconstruction Networks With Versatile Transformer for Speech Emotion Recognition","volume":"16","author":"Li","year":"2025","journal-title":"IEEE Trans. Affect. 
Comput."},{"key":"ref_68","first-page":"44025","article-title":"Spike Memory Transformer: An Energy-Efficient Model in Distributed Learning Framework for Autonomous Depression Detection","volume":"12","author":"Yang","year":"2025","journal-title":"IEEE Internet Things J."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Zhou, L., Liu, Z., Shangguan, Z., Yuan, X., Li, Y., and Hu, B. (2023, January 20\u201324). JAMFN: Joint Attention Multi-Scale Fusion Network for Depression Detection. Proceedings of the Interspeech 2023, Dublin, Ireland.","DOI":"10.21437\/Interspeech.2023-183"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Ye, J., Zhang, J., and Shan, H. (2025, January 6\u201311). Depmamba: Progressive fusion mamba for multimodal depression detection. Proceedings of the ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India.","DOI":"10.1109\/ICASSP49660.2025.10889975"},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1007\/s10489-024-05908-x","article-title":"LMTformer: Facial depression recognition with lightweight multi-scale transformer from videos","volume":"55","author":"He","year":"2025","journal-title":"Appl. Intell."},{"key":"ref_72","first-page":"15908","article-title":"Transformer in transformer","volume":"34","author":"Han","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"1235","DOI":"10.1162\/neco_a_01199","article-title":"A review of recurrent neural networks: LSTM cells and network architectures","volume":"31","author":"Yu","year":"2019","journal-title":"Neural Comput."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Siami-Namini, S., Tavakoli, N., and Namin, A.S. (2019, January 9\u201312). The performance of LSTM and BiLSTM in forecasting time series. 
Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA.","DOI":"10.1109\/BigData47090.2019.9005997"},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Dey, R., and Salem, F.M. (2017, January 6\u20139). Gate-variants of gated recurrent unit (GRU) neural networks. Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA.","DOI":"10.1109\/MWSCAS.2017.8053243"},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"13109","DOI":"10.1007\/s00521-021-05958-z","article-title":"Parallel spatio-temporal attention-based TCN for multivariate time series prediction","volume":"35","author":"Fan","year":"2023","journal-title":"Neural Comput. Appl."},{"key":"ref_77","first-page":"8291","article-title":"Vision gnn: An image is worth graph of nodes","volume":"35","author":"Han","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Wang, Q., Zhan, L., Thompson, P., and Zhou, J. (2020, January 6\u201310). Multimodal learning with incomplete modalities by knowledge distillation. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual.","DOI":"10.1145\/3394486.3403234"},{"key":"ref_79","unstructured":"(2024, March 16). Evidently AI. Accuracy, Precision, and Recall in Multi-Class Classification. 
Available online: https:\/\/www.evidentlyai.com\/classification-metrics\/multi-class-metrics."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/12\/1\/29\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T09:55:06Z","timestamp":1767779706000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/12\/1\/29"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,6]]},"references-count":79,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,1]]}},"alternative-id":["jimaging12010029"],"URL":"https:\/\/doi.org\/10.3390\/jimaging12010029","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,6]]}}}