{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:00:53Z","timestamp":1775228453228,"version":"3.50.1"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,10,3]],"date-time":"2024-10-03T00:00:00Z","timestamp":1727913600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,10,3]],"date-time":"2024-10-03T00:00:00Z","timestamp":1727913600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100012352","name":"Universit\u00e1 degli Studi di Milano","doi-asserted-by":"publisher","award":["Advanced methods for sound and music computing"],"award-info":[{"award-number":["Advanced methods for sound and music computing"]}],"id":[{"id":"10.13039\/100012352","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Healthc Inform Res"],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The evaluation of an individual\u2019s mental health and behavioral functioning, known as psychological assessment, is generally conducted by a mental health professional. This process aids in diagnosing mental health conditions, identifying suitable treatment options, and assessing progress during treatment. Currently, national health systems are unable to cope with the constantly growing demand for such services. To address and expedite the diagnosis process, this study suggests an AI-powered tool capable of delivering understandable predictions through the automated processing of the captured speech signals. To this end, we employed a Siamese neural network (SNN) elaborating on standardized speech representations free of domain expert knowledge. Such an SNN-based framework is able to address multiple downstream tasks using the same latent representation. Interestingly, it has been applied both for classifying speech depression as well as assessing its severity. After extensive experiments on a publicly available dataset following a standardized protocol, it is shown to significantly outperform the state of the art with respect to both tasks. Last but not least, the present solution offers interpretable predictions, while being able to meaningfully interact with the medical experts.<\/jats:p>","DOI":"10.1007\/s41666-024-00175-4","type":"journal-article","created":{"date-parts":[[2024,10,3]],"date-time":"2024-10-03T15:01:49Z","timestamp":1727967709000},"page":"577-593","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Siamese Neural Network for Speech-Based Depression Classification and Severity Assessment"],"prefix":"10.1007","volume":"8","author":[{"given":"Stavros","family":"Ntalampiras","sequence":"first","affiliation":[]},{"given":"Wen","family":"Qi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,10,3]]},"reference":[{"key":"175_CR1","doi-asserted-by":"publisher","unstructured":"Zhang X, Shen J, Din Z.u, Liu J, Wang G, Hu B, (2019) Multimodal depression detection: fusion of electroencephalography and paralinguistic behaviors using a novel strategy for classifier ensemble. IEEE J Biomed Health Inf 23(6):2265\u20132275. https:\/\/doi.org\/10.1109\/JBHI.2019.2938247","DOI":"10.1109\/JBHI.2019.2938247"},{"key":"175_CR2","doi-asserted-by":"publisher","unstructured":"Trautmann S, Rehm J, Wittchen H (2016) The economic costs of mental disorders: Do our societies react appropriately to the burden of mental disorders? EMBO Rep 17(9):1245\u20131249. https:\/\/doi.org\/10.15252\/embr.201642951","DOI":"10.15252\/embr.201642951"},{"issue":"1","key":"175_CR3","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1002\/lio2.354","volume":"5","author":"DM Low","year":"2020","unstructured":"Low DM, Bentley KH, Ghosh SS (2020) Automated assessment of psychiatric disorders using speech: a systematic review. Laryngoscope Invest Otolaryngol 5(1):96\u2013116. https:\/\/doi.org\/10.1002\/lio2.354","journal-title":"Laryngoscope Invest Otolaryngol"},{"issue":"4","key":"175_CR4","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1049\/iet-spr.2019.0487","volume":"14","author":"S Ntalampiras","year":"2020","unstructured":"Ntalampiras S (2020) Collaborative framework for automatic classification of respiratory sounds. IET Signal Process 14(4):223\u2013228. https:\/\/doi.org\/10.1049\/iet-spr.2019.0487","journal-title":"IET Signal Process"},{"key":"175_CR5","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1007\/978-3-031-15937-4_4","volume-title":"Artificial neural networks and machine learning - ICANN 2022","author":"AM Poir\u00e8","year":"2022","unstructured":"Poir\u00e8 AM, Simonetta F, Ntalampiras S (2022) Deep feature learning for medical acoustics. In: Pimenidis E, Angelov P, Jayne C, Papaleonidas A, Aydin M (eds) Artificial neural networks and machine learning - ICANN 2022. Springer, Cham, pp 39\u201350"},{"key":"175_CR6","doi-asserted-by":"publisher","unstructured":"Conversano V, Ntalampiras S (2023) Ensemble learning for cough-based subject-independent COVID-19 detection. In: Marsico MD, Baja GS, Fred ALN (eds) Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2023, Lisbon, Portugal, 22-24 February 2023, pp. 798\u2013805. SCITEPRESS, ???. https:\/\/doi.org\/10.5220\/0011651700003411","DOI":"10.5220\/0011651700003411"},{"issue":"3","key":"175_CR7","doi-asserted-by":"publisher","first-page":"701","DOI":"10.1049\/cit2.12113","volume":"8","author":"P Wu","year":"2022","unstructured":"Wu P, Wang R, Lin H, Zhang F, Tu J, Sun M (2022) Automatic depression recognition by intelligent speech signal processing: a systematic survey. CAAI Trans Intell Technol 8(3):701\u2013711. https:\/\/doi.org\/10.1049\/cit2.12113","journal-title":"CAAI Trans Intell Technol"},{"key":"175_CR8","doi-asserted-by":"publisher","unstructured":"Shen Y, Yang H, Lin L (2022) Automatic depression detection: an emotional audio-textual corpus and a gru\/bilstm-based model. In: ICASSP, pp 6247\u20136251. https:\/\/doi.org\/10.1109\/ICASSP43922.2022.9746569","DOI":"10.1109\/ICASSP43922.2022.9746569"},{"issue":"2","key":"175_CR9","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1109\/TAFFC.2017.2766145","volume":"11","author":"N Cummins","year":"2020","unstructured":"Cummins N, Sethu V, Epps J, Williamson JR, Quatieri TF, Krajewski J (2020) Generalized two-stage rank regression framework for depression score prediction from speech. IEEE Trans Affect Comput 11(2):272\u2013283. https:\/\/doi.org\/10.1109\/TAFFC.2017.2766145","journal-title":"IEEE Trans Affect Comput"},{"key":"175_CR10","doi-asserted-by":"crossref","unstructured":"Huang Z, Epps J, Joachim D, Chen M (2018) Depression detection from short utterances via diverse smartphones in natural environmental conditions. In: Interspeech. https:\/\/api.semanticscholar.org\/CorpusID:52191650","DOI":"10.21437\/Interspeech.2018-1743"},{"key":"175_CR11","doi-asserted-by":"publisher","unstructured":"Li Y, Lin Y, Ding H, Li C (2019) Speech databases for mental disorders: a systematic review. Gen Psychiatry 32(3). https:\/\/doi.org\/10.1136\/gpsych-2018-100022https:\/\/gpsych.bmj.com\/content\/32\/3\/e100022.full.pdf","DOI":"10.1136\/gpsych-2018-100022"},{"issue":"6","key":"175_CR12","doi-asserted-by":"publisher","first-page":"2294","DOI":"10.1109\/JBHI.2019.2913590","volume":"23","author":"EW McGinnis","year":"2019","unstructured":"McGinnis EW, Anderau SP, Hruschak J, Gurchiek RD, Lopez-Duran NL, Fitzgerald K, Rosenblum KL, Muzik M, McGinnis RS (2019) Giving voice to vulnerable children: machine learning analysis of speech detects anxiety and depression in early childhood. IEEE J Biomed Health Inf 23(6):2294\u20132301. https:\/\/doi.org\/10.1109\/JBHI.2019.2913590","journal-title":"IEEE J Biomed Health Inf"},{"issue":"11","key":"175_CR13","doi-asserted-by":"publisher","first-page":"4793","DOI":"10.1109\/tnnls.2020.3027314","volume":"32","author":"E Tjoa","year":"2021","unstructured":"Tjoa E, Guan C (2021) A survey on explainable artificial intelligence (XAI): toward medical XAI. IEEE Trans Neural Netw Learn Syst 32(11):4793\u20134813. https:\/\/doi.org\/10.1109\/tnnls.2020.3027314","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"5","key":"175_CR14","doi-asserted-by":"publisher","first-page":"726","DOI":"10.1109\/tetci.2021.3100641","volume":"5","author":"Y Zhang","year":"2021","unstructured":"Zhang Y, Tino P, Leonardis A, Tang K (2021) A survey on neural network interpretability. IEEE Trans Emerg Top Comput Intell 5(5):726\u2013742. https:\/\/doi.org\/10.1109\/tetci.2021.3100641","journal-title":"IEEE Trans Emerg Top Comput Intell"},{"key":"175_CR15","doi-asserted-by":"publisher","unstructured":"Hajduska-D\u00e9r B, Kiss G, Sztah\u00f3 D, Vicsi K, Simon L (2022) The applicability of the beck depression inventory and Hamilton depression scale in the automatic recognition of depression based on speech signal processing. Frontiers in Psychiatry 13. https:\/\/doi.org\/10.3389\/fpsyt.2022.879896","DOI":"10.3389\/fpsyt.2022.879896"},{"issue":"4","key":"175_CR16","doi-asserted-by":"publisher","first-page":"340","DOI":"10.1093\/occmed\/kqv043","volume":"65","author":"R Sharp","year":"2015","unstructured":"Sharp R (2015) The Hamilton rating scale for depression. Occup Med 65(4):340\u2013340. https:\/\/doi.org\/10.1093\/occmed\/kqv043","journal-title":"Occup Med"},{"issue":"1\/2","key":"175_CR17","doi-asserted-by":"publisher","first-page":"7","DOI":"10.17743\/jaes.2019.0045","volume":"68","author":"S Ntalampiras","year":"2020","unstructured":"Ntalampiras S (2020) Toward language-agnostic speech emotion recognition. J Audio Eng Soc 68(1\/2):7\u201313","journal-title":"J Audio Eng Soc"},{"key":"175_CR18","doi-asserted-by":"publisher","unstructured":"Ntalampiras S (2017) A transfer learning framework for predicting the emotional content of generalized sound events. J Acoust Soc Am 141(3):1694\u20131701. https:\/\/doi.org\/10.1121\/1.4977749https:\/\/pubs.aip.org\/asa\/jasa\/article-pdf\/141\/3\/1694\/15322960\/1694_1_online.pdf","DOI":"10.1121\/1.4977749"},{"key":"175_CR19","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1016\/j.patrec.2021.01.018","volume":"144","author":"S Ntalampiras","year":"2021","unstructured":"Ntalampiras S (2021) Speech emotion recognition via learning analogies. Pattern Recogn Lett 144:21\u201326. https:\/\/doi.org\/10.1016\/j.patrec.2021.01.018","journal-title":"Pattern Recogn Lett"},{"key":"175_CR20","doi-asserted-by":"publisher","unstructured":"Ntalampiras S (2020) Deep learning of attitude in children\u2019s emotional speech. In: IEEE CIVEMSA, pp 1\u20135. https:\/\/doi.org\/10.1109\/CIVEMSA48639.2020.9132743","DOI":"10.1109\/CIVEMSA48639.2020.9132743"},{"key":"175_CR21","doi-asserted-by":"publisher","unstructured":"Ntalampiras S (2021) One-shot learning for acoustic diagnosis of industrial machines. Expert Syst Appl 178. https:\/\/doi.org\/10.1016\/j.eswa.2021.114984","DOI":"10.1016\/j.eswa.2021.114984"},{"issue":"10","key":"175_CR22","doi-asserted-by":"publisher","first-page":"4728","DOI":"10.1109\/JBHI.2023.3299341","volume":"27","author":"S Ntalampiras","year":"2023","unstructured":"Ntalampiras S (2023) Explainable Siamese neural network for classifying pediatric respiratory sounds. IEEE J Biomed Health Inf 27(10):4728\u20134735. https:\/\/doi.org\/10.1109\/JBHI.2023.3299341","journal-title":"IEEE J Biomed Health Inf"},{"key":"175_CR23","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1109\/TMM.2020.2978636","volume":"23","author":"S Tian","year":"2021","unstructured":"Tian S, Liu X, Liu M, Li S, Yin B (2021) Siamese tracking network with informative enhanced loss. IEEE Trans Multimed 23:120\u2013132. https:\/\/doi.org\/10.1109\/TMM.2020.2978636","journal-title":"IEEE Trans Multimed"},{"issue":"2","key":"175_CR24","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1109\/JSTSP.2019.2908700","volume":"13","author":"H Purwins","year":"2019","unstructured":"Purwins H, Li B, Virtanen T, Schl\u00fcter J, Chang S, Sainath T (2019) Deep learning for audio signal processing. IEEE J Sel Top Signal Process 13(2):206\u2013219. https:\/\/doi.org\/10.1109\/JSTSP.2019.2908700","journal-title":"IEEE J Sel Top Signal Process"},{"key":"175_CR25","doi-asserted-by":"crossref","unstructured":"Srivastava S, Wu HH, Rulff J, Fuentes M, Cartwright M, Silva C, Arora A, Bello JP (2022) A study on robustness to perturbations for representations of environmental sound. In: EUSIPCO, pp 125\u2013129","DOI":"10.23919\/EUSIPCO55093.2022.9909557"},{"issue":"41\u201342","key":"175_CR26","doi-asserted-by":"publisher","first-page":"30387","DOI":"10.1007\/s11042-020-09430-3","volume":"79","author":"S Ntalampiras","year":"2020","unstructured":"Ntalampiras S (2020) Emotional quantification of soundscapes by learning between samples. Multimed Tool Appl 79(41\u201342):30387\u201330395. https:\/\/doi.org\/10.1007\/s11042-020-09430-3","journal-title":"Multimed Tool Appl"},{"key":"175_CR27","doi-asserted-by":"publisher","unstructured":"Fedele A, Guidotti R, Pedreschi D (2022) Explaining Siamese networks in few-shot learning for audio data. In: Discovery Science, Springer, ???, pp 509\u2013524. https:\/\/doi.org\/10.1007\/978-3-031-18840-4_36","DOI":"10.1007\/978-3-031-18840-4_36"},{"key":"175_CR28","doi-asserted-by":"publisher","unstructured":"Heggan C, Budgett S, Hospedales T, Yaghoobi M (2022) MetaAudio: a few-shot audio classification benchmark. In: LNCS, Springer, ???, pp 219\u2013230. https:\/\/doi.org\/10.1007\/978-3-031-15919-0_19","DOI":"10.1007\/978-3-031-15919-0_19"},{"key":"175_CR29","unstructured":"Kaufman L, Rousseeuw PJ (1987) Clustering by means of medoids. In: Dodge Y (ed) Statistical Data Analysis Based on the L1-Norm and Related Methods, North-Holland, ??? pp 405\u2013416"},{"key":"175_CR30","unstructured":"Theodoridis S, Koutroumbas K pattern recognition, Third Edition. Academic Press, Inc., Orlando, FL, USA"},{"key":"175_CR31","doi-asserted-by":"publisher","unstructured":"Muzammel M, Salam H, Othmani A (2021) End-to-end multimodal clinical depression recognition using deep neural networks: a comparative analysis. Comput Method Programs Biomed 211. https:\/\/doi.org\/10.1016\/j.cmpb.2021.106433","DOI":"10.1016\/j.cmpb.2021.106433"},{"key":"175_CR32","doi-asserted-by":"publisher","unstructured":"Rehr R, Gerkmann T (2015) Cepstral noise subtraction for robust automatic speech recognition. In: ICASSP, pp 375\u2013378. https:\/\/doi.org\/10.1109\/ICASSP.2015.7177994","DOI":"10.1109\/ICASSP.2015.7177994"},{"key":"175_CR33","doi-asserted-by":"publisher","unstructured":"Sztah\u00f3 D, G\u00e1bor K, G\u00e1briel T (2021) Deep learning solution for pathological voice detection using LSTM-based autoencoder hybrid with multi-task learning. In: Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies. SCITEPRESS - Science and Technology Publications, ???. https:\/\/doi.org\/10.5220\/0010193101350141","DOI":"10.5220\/0010193101350141"},{"key":"175_CR34","doi-asserted-by":"publisher","unstructured":"Davey CG, Harrison BJ (2022) The self on its axis: a framework for understanding depression. Transl Psychiatry 12(1). https:\/\/doi.org\/10.1038\/s41398-022-01790-8","DOI":"10.1038\/s41398-022-01790-8"},{"key":"175_CR35","doi-asserted-by":"publisher","unstructured":"Egas-L\u00f3pez JV, Kiss G, Sztah\u00f3 D, Gosztolya G (2022) Automatic assessment of the degree of clinical depression from speech using x-vectors. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 8502\u20138506. https:\/\/doi.org\/10.1109\/ICASSP43922.2022.9746068","DOI":"10.1109\/ICASSP43922.2022.9746068"},{"issue":"14","key":"175_CR36","doi-asserted-by":"publisher","first-page":"3046","DOI":"10.3390\/jcm10143046","volume":"10","author":"D Shin","year":"2021","unstructured":"Shin D, Cho W, Park C, Rhee S, Kim M, Lee H, Kim N, Ahn Y (2021) Detection of minor and major depression through voice as a biomarker using machine learning. J Clin Med 10(14):3046. https:\/\/doi.org\/10.3390\/jcm10143046","journal-title":"J Clin Med"},{"key":"175_CR37","doi-asserted-by":"publisher","unstructured":"Helfer BS, Quatieri TF, Williamson JR, Mehta DD, Horwitz R, Yu B (2013) Classification of depression state based on articulatory precision. In: Interspeech, pp 2172\u20132176. https:\/\/doi.org\/10.21437\/Interspeech.2013-513","DOI":"10.21437\/Interspeech.2013-513"},{"issue":"9","key":"175_CR38","doi-asserted-by":"publisher","first-page":"0238726","DOI":"10.1371\/journal.pone.0238726","volume":"15","author":"M Yamamoto","year":"2020","unstructured":"Yamamoto M, Takamiya A, Sawada K, Yoshimura M, Kitazawa M, Kc Liang, Fujita T, Mimura M, Kishimoto T (2020) Using speech recognition technology to investigate the association between timing-related speech features and depression severity. PLoS ONE 15(9):0238726. https:\/\/doi.org\/10.1371\/journal.pone.0238726","journal-title":"PLoS ONE"}],"container-title":["Journal of Healthcare Informatics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41666-024-00175-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41666-024-00175-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41666-024-00175-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T14:03:39Z","timestamp":1729692219000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41666-024-00175-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,3]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["175"],"URL":"https:\/\/doi.org\/10.1007\/s41666-024-00175-4","relation":{},"ISSN":["2509-4971","2509-498X"],"issn-type":[{"value":"2509-4971","type":"print"},{"value":"2509-498X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,3]]},"assertion":[{"value":"17 May 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 September 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 September 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 October 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics Approval and Consent to Participate"}},{"value":"The authors declare no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}}]}}