{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:13:58Z","timestamp":1760058838319,"version":"build-2065373602"},"reference-count":39,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,5,6]],"date-time":"2025-05-06T00:00:00Z","timestamp":1746489600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Pedestrian Detection via Robust Object Appearance Modeling","award":["62276118","61772244"],"award-info":[{"award-number":["62276118","61772244"]}]},{"name":"Visual Tracking via Robust Object Appearance Modeling","award":["62276118","61772244"],"award-info":[{"award-number":["62276118","61772244"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Speech lie detection is a technique that analyzes speech signals in detail to determine whether a speaker is lying. It has significant application value and has attracted attention from various fields. However, existing speech lie detection algorithms still have certain limitations. These algorithms fail to fully explore manually extracted features based on prior knowledge and also neglect the dynamic characteristics of speech as well as the impact of temporal context, resulting in reduced detection accuracy and generalization. To address these issues, this paper proposes a multi-feature speech lie detection algorithm based on the dual-stream deep architecture (DDA-MSLD). This algorithm employs a dual-stream structure to learn different types of features simultaneously. Firstly, it combines a gated recurrent unit (GRU) network with the attention mechanism. This combination enables the network to more comprehensively capture the context of speech signals and focus on the parts that are more critical for lie detection. 
It can perform in-depth sequence pattern analysis on manually extracted static prosodic features and nonlinear dynamic features, obtaining high-order dynamic features related to lies. Secondly, the encoder part of the transformer is used to simultaneously capture the macroscopic structure and microscopic details of speech signals, specifically for high-precision feature extraction of Mel spectrogram features of speech signals, obtaining deep features related to lies. This dual-stream structure processes various features of speech simultaneously, describing the subjective state of speech signals from different perspectives and thereby improving detection accuracy and generalization. Experiments were conducted on the multi-person scenario lie detection dataset CSC, and the results show that this algorithm outperformed existing state-of-the-art algorithms in detection performance. Considering the significant differences in lie speech in different lying scenarios, and to further evaluate the algorithm\u2019s generalization performance, a single-person scenario Chinese lie speech dataset Local was constructed, and experiments were conducted on it. 
The results indicate that the algorithm has a strong generalization ability in different scenarios.<\/jats:p>","DOI":"10.3390\/info16050386","type":"journal-article","created":{"date-parts":[[2025,5,6]],"date-time":"2025-05-06T09:08:56Z","timestamp":1746522536000},"page":"386","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["DDA-MSLD: A Multi-Feature Speech Lie Detection Algorithm Based on a Dual-Stream Deep Architecture"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-0716-6302","authenticated-orcid":false,"given":"Pengfei","family":"Guo","sequence":"first","affiliation":[{"name":"School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5435-5961","authenticated-orcid":false,"given":"Shucheng","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-7417-5265","authenticated-orcid":false,"given":"Mingxing","family":"Li","sequence":"additional","affiliation":[{"name":"School of Electrical and Information Engineering, Jingjiang College, Jiangsu University, Zhenjiang 212013, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,5,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1042","DOI":"10.1016\/j.neucom.2014.04.083","article-title":"Deception detecting from speech signal using relevance vector machine and non-linear dynamics features","volume":"151","author":"Zhou","year":"2015","journal-title":"Neurocomputing"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1037\/h0069988","article-title":"Changes of 
blood pressure and respiration during deception","volume":"6","author":"Landis","year":"1926","journal-title":"J. Comp. Psychol."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1177\/1529100610390861","article-title":"Pitfalls and opportunities in nonverbal and verbal lie detection","volume":"11","author":"Vrij","year":"2010","journal-title":"Psychol. Sci. Public Interest"},{"key":"ref_4","unstructured":"Graciarena, M., Shriberg, E., Stolcke, A., Enos, F., Hirschberg, J., and Kajarekar, S. (2006, January 14\u201319). Combining prosodic lexical and cepstral systems for deceptive speech detection. Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, Toulouse, France."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1007\/BF00998267","article-title":"Invited article: Face, voice, and body in detecting deceit","volume":"15","author":"Ekman","year":"1991","journal-title":"J. Nonverbal Behav."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1037\/0033-2909.129.1.74","article-title":"Cues to deception","volume":"129","author":"DePaulo","year":"2003","journal-title":"Psychol. Bull."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/j.arcontrol.2005.01.001","article-title":"Control of chaos: Methods and applications in engineering","volume":"29","author":"Fradkov","year":"2005","journal-title":"Annu. Rev. Control"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Krajewski, J., and Kr\u00f6ger, B.J. (2007, January 27\u201331). Using prosodic and spectral characteristics for sleepiness detection. Proceedings of the INTERSPEECH, Antwerp, Belgium.","DOI":"10.21437\/Interspeech.2007-513"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Zhao, H., and Pan, X. (2015, January 20\u201323). Lie detection from speech analysis based on k\u2013svd deep belief network model. 
Proceedings of the Intelligent Computing Theories and Methodologies: 11th International Conference, ICIC 2015, Fuzhou, China.","DOI":"10.1007\/978-3-319-22180-9_19"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Srivastava, N., and Dubey, S. (2018, January 29\u201331). Deception detection using artificial neural network and support vector machine. Proceedings of the 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India.","DOI":"10.1109\/ICECA.2018.8474706"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1016\/j.neucom.2017.07.050","article-title":"Speech emotion recognition based on feature selection and extreme learning machine decision tree","volume":"273","author":"Liu","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Levitan, S.I., An, G., Wang, M., Mendels, G., Hirschberg, J., Levine, M., and Rosenberg, A. (2015, January 13). Cross-cultural production and detection of deception from speech. Proceedings of the 2015 ACM on Workshop on Multimodal Deception Detection, Seattle, WA, USA.","DOI":"10.1145\/2823465.2823468"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Mannepalli, K., Sastry, P.N., and Suman, M. (2018). Analysis of emotion recognition system for Telugu using prosodic and formant features. Speech and Language Processing for Human-Machine Communications: Proceedings of CSI 2015, Springer.","DOI":"10.1007\/978-981-10-6626-9_15"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Dai, J.B., Sun, L.X., and Shen, X.B. (2021, January 18\u201320). Research on speech spoofing detection based on big data and machine learning. 
Proceedings of the 2021 2nd International Conference on Artificial Intelligence and Education (ICAIE), Dali, China.","DOI":"10.1109\/ICAIE53562.2021.00036"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/lcrp.12148","article-title":"\u2018Language of lies\u2019: Urgent issues and prospects in verbal lie detection research","volume":"24","author":"Nahari","year":"2019","journal-title":"Legal Criminol. Psychol."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Merkx, D., Frank, S.L., and Ernestus, M. (2019). Language learning using speech to image retrieval. arXiv.","DOI":"10.21437\/Interspeech.2019-3067"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Gopalan, K., and Wenndt, S. (2007, January 1\u20134). Speech analysis using modulation-based features for detecting deception. Proceedings of the 2007 15th International Conference on Digital Signal Processing, Cardiff, UK.","DOI":"10.1109\/ICDSP.2007.4288658"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mathur, L., and Matari\u0107, M.J. (2021, January 15\u201318). Affect\u2013aware deep belief network representations for multimodal unsupervised deception detection. Proceedings of the 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India.","DOI":"10.1109\/FG52635.2021.9667050"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1109\/89.506935","article-title":"Feature analysis and neural network-based classification of speech under stress","volume":"4","author":"Hansen","year":"1996","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Levitan, S.I., Maredia, A., and Hirschberg, J. (2018). Linguistic cues to deception and perceived deception in interview dialogues. 
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Association for Computational Linguistics.","DOI":"10.18653\/v1\/N18-1176"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Mittal, T., Bhattacharya, U., Chandra, R., Bera, A., and Manocha, D. (2020, January 12\u201316). Emotions don\u2019t lie: An audio\u2013visual deepfake detection method using affective cues. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413570"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1162\/tacl_a_00311","article-title":"Acoustic-prosodic and lexical cues to deception and trust: Deciphering how people detect lies","volume":"8","author":"Chen","year":"2020","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Levitan, S.I., Maredia, A., and Hirschberg, J. (2018, January 2\u20136). Acoustic\u2013Prosodic Indicators of Deception and Trust in Interview Dialogues. Proceedings of the Interspeech 2018, Hyderabad, India.","DOI":"10.21437\/Interspeech.2018-2443"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1177\/0963721410391245","article-title":"Outsmarting the liars: Toward a cognitive lie detection approach","volume":"20","author":"Vrij","year":"2011","journal-title":"Curr. Dir. Psychol. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"622","DOI":"10.1587\/transfun.2020EAL2051","article-title":"A novel hybrid network model based on attentional multi-feature fusion for deception detection","volume":"104","author":"Fang","year":"2021","journal-title":"IEICE Trans. Fundam. Electron. Commun. Comput. Sci."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Fu, H., Lei, P., Tao, H., Zhao, L., and Yang, J. (2019). 
Improved semi-supervised autoencoder for deception detection. PLoS ONE, 14.","DOI":"10.1371\/journal.pone.0223361"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Mendels, G., Levitan, S.I., Lee, K.-Z., and Hirschberg, J. (2017, January 20\u201324). Hybrid Acoustic\u2013Lexical Deep Learning Approach for Deception Detection. Proceedings of the Interspeech 2017, Stockholm, Sweden.","DOI":"10.21437\/Interspeech.2017-1723"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1112","DOI":"10.1002\/acp.3288","article-title":"Baselining as a lie detection method","volume":"30","author":"Vrij","year":"2016","journal-title":"Appl. Cogn. Psychol."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"76527","DOI":"10.1109\/ACCESS.2018.2882917","article-title":"Convolutional bidirectional long short-term memory for deception detection with acoustic features","volume":"6","author":"Xie","year":"2018","journal-title":"IEEE Access"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1109\/TASLP.2017.2759338","article-title":"Semisupervised autoencoders for speech emotion recognition","volume":"26","author":"Deng","year":"2017","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2697","DOI":"10.1109\/TASLP.2020.3023632","article-title":"Semi-supervised speech emotion recognition with ladder networks","volume":"28","author":"Parthasarathy","year":"2020","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1007\/s11460-005-0023-7","article-title":"Selection of embedding dimension and delay time in phase space reconstruction","volume":"1","author":"Ma","year":"2006","journal-title":"Front. Electr. Electron. Eng. 
China"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"392","DOI":"10.1016\/j.jarmac.2021.06.002","article-title":"Deception and lie detection in the courtroom: The effect of defendants wearing medical face masks","volume":"10","author":"Vrij","year":"2021","journal-title":"J. Appl. Res. Mem. Cogn."},{"key":"ref_34","unstructured":"Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J., Mohiuddin, A., and Kaiser, L. (2020). Rethinking attention with performers. arXiv."},{"key":"ref_35","unstructured":"Sun, M., Gao, M., Kang, X., Wang, S., Du, J., Yao, D., and Wang, S.-J. (2023). CDSD: Chinese Dysarthria Speech Database. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Nugroho, R.H., Nasrun, M., and Setianingsih, C. (2017, January 26\u201328). Lie detector with pupil dilation and eye blinks using hough transform and frame difference method with fuzzy logic. Proceedings of the 2017 International Conference on Control, Electronics, Renewable Energy and Communications (ICCEREC), Yogyakarta, Indonesia.","DOI":"10.1109\/ICCEREC.2017.8226697"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Huang, C.-H., Chou, H.-C., Wu, Y.-T., Lee, C.-C., and Liu, Y.-W. (2019, January 15\u201319). Acoustic Indicators of Deception in Mandarin Daily Conversations Recorded from an Interactive Game. Proceedings of the Interspeech 2019, Graz, Austria.","DOI":"10.21437\/Interspeech.2019-2216"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"e5","DOI":"10.1017\/ATSIP.2021.6","article-title":"Automatic deception detection using multiple speech and language communicative descriptors in dialogs","volume":"10","author":"Chou","year":"2021","journal-title":"APSIPA Trans. Signal Inf. Process."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Levitan, S.I., An, G., Ma, M., Levitan, R., Rosenberg, A., and Hirschberg, J. (2016, January 8\u201312). 
Combining Acoustic\u2013Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection. Proceedings of the Interspeech 2016, San Francisco, CA, USA.","DOI":"10.21437\/Interspeech.2016-1519"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/5\/386\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:27:47Z","timestamp":1760030867000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/5\/386"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,6]]},"references-count":39,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["info16050386"],"URL":"https:\/\/doi.org\/10.3390\/info16050386","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2025,5,6]]}}}