{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T23:43:33Z","timestamp":1780616613223,"version":"3.54.1"},"reference-count":63,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2024,2,26]],"date-time":"2024-02-26T00:00:00Z","timestamp":1708905600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020 research programme","award":["769661"],"award-info":[{"award-number":["769661"]}]},{"name":"Health Research Board, Ireland","award":["769661"],"award-info":[{"award-number":["769661"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The ubiquity of digital technology has facilitated detailed recording of human behaviour. Ambient technology has been used to capture behaviours in a broad range of applications ranging from healthcare and monitoring to assessment of cooperative work. However, existing systems often face challenges in terms of autonomy, usability, and privacy. This paper presents a portable, easy-to-use and privacy-preserving system for capturing behavioural signals unobtrusively in home or in office settings. The system focuses on the capture of audio, video, and depth imaging. It is based on a device built on a small-factor platform that incorporates ambient sensors which can be integrated with the audio and depth video hardware for multimodal behaviour tracking. The system can be accessed remotely and integrated into a network of sensors. Data are encrypted in real time to ensure safety and privacy. 
We illustrate uses of the device in two different settings, namely, a healthy-ageing IoT application, where the device is used in conjunction with a range of IoT sensors to monitor an older person\u2019s mental well-being at home, and a healthcare communication quality assessment application, where the device is used to capture a patient\u2013clinician interaction for consultation quality appraisal. CUSCO can automatically detect active speakers, extract acoustic features, record video and depth streams, and recognise emotions and cognitive impairment with promising accuracy.<\/jats:p>","DOI":"10.3390\/s24051506","type":"journal-article","created":{"date-parts":[[2024,2,26]],"date-time":"2024-02-26T06:50:23Z","timestamp":1708930223000},"page":"1506","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["CUSCO: An Unobtrusive Custom Secure Audio-Visual Recording System for Ambient Assisted Living"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5297-3945","authenticated-orcid":false,"given":"Pierre","family":"Albert","sequence":"first","affiliation":[{"name":"National Institute for Public Health and the Environment, 3721 MA Bilthoven, The Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5150-3359","authenticated-orcid":false,"given":"Fasih","family":"Haider","sequence":"additional","affiliation":[{"name":"School of Engineering, The University of Edinburgh, Edinburgh EH9 3JW, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8430-7875","authenticated-orcid":false,"given":"Saturnino","family":"Luz","sequence":"additional","affiliation":[{"name":"Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh EH8 9YL, 
UK"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2024,2,26]]},"reference":[{"key":"ref_1","first-page":"5","article-title":"Sensors, vision and networks: From video surveillance to activity recognition and health monitoring","volume":"11","author":"Prati","year":"2019","journal-title":"J. Ambient Intell. Smart Environ."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1038\/s41586-020-2669-y","article-title":"Illuminating the dark spaces of healthcare with ambient intelligence","volume":"585","author":"Haque","year":"2020","journal-title":"Nature"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"e115","DOI":"10.1016\/S2589-7500(20)30275-2","article-title":"Ethical issues in using ambient intelligence in health-care settings","volume":"3","author":"Luo","year":"2021","journal-title":"Lancet Digit. Health"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3447242","article-title":"A survey of ambient intelligence","volume":"54","author":"Dunne","year":"2021","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Paraschivoiu, I., Sypniewski, J., Lupp, A., G\u00e4rtner, M., Miteva, N., and Gospodinova, Z. (2020, January 25\u201329). Coaching Older Adults: Persuasive and Multimodal Approaches to Coaching for Daily Living. 
Proceedings of the Companion Publication of the 2020 International Conference on Multimodal Interaction, ICMI \u201920 Companion, New York, NY, USA.","DOI":"10.1145\/3395035.3425312"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"l161","DOI":"10.1136\/bmj.l161","article-title":"Using artificial intelligence to assess clinicians\u2019 communication skills","volume":"364","author":"Ryan","year":"2019","journal-title":"BMJ"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"e026254","DOI":"10.1136\/bmjopen-2018-026254","article-title":"Protocol for a conversation-based analysis study: PREVENT-ED investigates dialogue features that may help predict dementia onset in later life","volume":"9","author":"Ritchie","year":"2019","journal-title":"BMJ Open"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1016\/j.pmcj.2009.04.001","article-title":"Ambient intelligence: Technologies, applications, and opportunities","volume":"5","author":"Cook","year":"2009","journal-title":"Pervasive Mob. Comput."},{"key":"ref_9","unstructured":"Renals, S. (2010, January 2\u20134). Recognition and understanding of meetings. Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, Los Angeles, CA, USA."},{"key":"ref_10","first-page":"271","article-title":"Computer-Supported Human-Human Multilingual Communication","volume":"Volume 4850","author":"Lungarella","year":"2006","journal-title":"50 Years of Artificial Intelligence"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Praharaj, S., Scheffel, M., Schmitz, M., Specht, M., and Drachsler, H. (2021). Towards Automatic Collaboration Analytics for Group Speech Data Using Learning Analytics. 
Sensors, 21.","DOI":"10.3390\/s21093156"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"11435","DOI":"10.3390\/s120911435","article-title":"Enhancing health care delivery through ambient intelligence applications","volume":"12","author":"Kartakis","year":"2012","journal-title":"Sensors"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Dawadi, P., Cook, D., Parsey, C., Schmitter-Edgecombe, M., and Schneider, M. (2011, January 21). An Approach to Cognitive Assessment in Smart Home. Proceedings of the 2011 Workshop on Data Mining for Medicine and Healthcare, DMMH \u201911, New York, NY, USA.","DOI":"10.1145\/2023582.2023592"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1049\/iet-cvi.2017.0119","article-title":"Event-driven system for fall detection using body-worn accelerometer and depth sensor","volume":"12","author":"Kepski","year":"2017","journal-title":"IET Comput. Vis."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Park, C., Mishra, R., Sharafkhaneh, A., Bryant, M.S., Nguyen, C., Torres, I., Naik, A.D., and Najafi, B. (2021). Digital Biomarker Representing Frailty Phenotypes: The Use of Machine Learning and Sensor-Based Sit-to-Stand Test. Sensors, 21.","DOI":"10.3390\/s21093258"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Tegou, T., Kalamaras, I., Tsipouras, M., Giannakeas, N., Votis, K., and Tzovaras, D. (2019). A low-cost indoor activity monitoring system for detecting frailty in older adults. Sensors, 19.","DOI":"10.3390\/s19030452"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1002\/lio2.354","article-title":"Automated assessment of psychiatric disorders using speech: A systematic review","volume":"5","author":"Low","year":"2020","journal-title":"Laryngoscope Investig. 
Otolaryngol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"e595","DOI":"10.3399\/bjgp19X704573","article-title":"Comparing the content and quality of video, telephone, and face-to-face consultations: A non-randomised, quasi-experimental, exploratory study in UK primary care","volume":"69","author":"Hammersley","year":"2019","journal-title":"Br. J. Gen. Pract."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Habib, M., Faris, M., Qaddoura, R., Alomari, M., Alomari, A., and Faris, H. (2021). Toward an Automatic Quality Assessment of Voice-Based Telemedicine Consultations: A Deep Learning Approach. Sensors, 21.","DOI":"10.3390\/s21093279"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"780169","DOI":"10.3389\/fcomp.2021.780169","article-title":"Alzheimer\u2019s Dementia Recognition through Spontaneous Speech","volume":"3","author":"Luz","year":"2021","journal-title":"Front. Comput. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1007\/s00779-014-0821-0","article-title":"LAB-IN-A-BOX: Semi-automatic tracking of activity in the medical office","volume":"19","author":"Weibel","year":"2015","journal-title":"Pers. Ubiquitous Comput."},{"key":"ref_22","unstructured":"The European Parliament and the Council of the European Union (2024, February 20). Regulation (EU) 2016\/679 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95\/46\/EC (General Data Protection Regulation). Available online: https:\/\/eur-lex.europa.eu\/legal-content\/EN\/TXT\/PDF\/?uri=CELEX:32016R0679."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Xing, X., Wu, H., Wang, L., Stenson, I., Yong, M., Ser, J.D., Walsh, S., and Yang, G. (ACM Comput. Surv., 2023). Non-imaging medical data synthesis for trustworthy AI: A comprehensive survey, ACM Comput. 
Surv., in press.","DOI":"10.1145\/3614425"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zheng, M., Xu, D., Jiang, L., Gu, C., Tan, R., and Cheng, P. (2019, January 10). Challenges of privacy-preserving machine learning in IoT. Proceedings of the First International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things, New York, NY, USA.","DOI":"10.1145\/3363347.3363357"},{"key":"ref_25","unstructured":"Biester, L., Demszky, D., Jin, Z., Sachan, M., Tetreault, J., Wilson, S., Xiao, L., and Zhao, J. (2022). Proceedings of the Second Workshop on NLP for Positive Impact (NLP4PI), Abu Dhabi, United Arab Emirates, 7 December 2022, Association for Computational Linguistics."},{"key":"ref_26","unstructured":"Stross-Radschinski, A.C. (2024, February 20). Python Brochure Vol. 1. Available online: https:\/\/brochure.getpython.info\/media\/releases\/psf-python-brochure-vol.-i-final-download.pdf\/view."},{"key":"ref_27","unstructured":"Cao, M., Tso, T.Y., Pulavarty, B., Bhattacharya, S., Dilger, A., and Tomas, A. (2005, January 20\u201323). State of the art: Where we are with the ext3 filesystem. Proceedings of the Ottawa Linux Symposium (OLS), Citeseer, Ottawa, ON, Canada."},{"key":"ref_28","unstructured":"NIST (2024, February 20). Descriptions of SHA-256, SHA-384, and SHA-512. Available online: https:\/\/eips.ethereum.org\/assets\/eip-2680\/sha256-384-512.pdf."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Park, T.J., Kanda, N., Dimitriadis, D., Han, K.J., Watanabe, S., and Narayanan, S. (2021). A Review of Speaker Diarization: Recent Advances with Deep Learning. arXiv.","DOI":"10.1016\/j.csl.2021.101317"},{"key":"ref_30","unstructured":"NIST (2024, February 20). 
Rich Transcription Evaluation Project, Available online: https:\/\/www.nist.gov\/itl\/iad\/mig\/rich-transcription-evaluation\/."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1023\/A:1007506220214","article-title":"Statistical models for text segmentation","volume":"34","author":"Beeferman","year":"1999","journal-title":"Mach. Learn."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1162\/089120102317341756","article-title":"A critique and improvement of an evaluation metric for text segmentation","volume":"28","author":"Pevzner","year":"2002","journal-title":"Comput. Linguist."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"17:1","DOI":"10.1145\/2328967.2328970","article-title":"The non-Verbal Structure of Patient Case Discussions in Multidisciplinary Medical Team Meetings","volume":"30","author":"Luz","year":"2012","journal-title":"ACM Trans. Inf. Syst."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Haider, F., and Luz, S. (2019, January 12\u201317). Attitude Recognition Using Multi-resolution Cochleagram Features. Proceedings of the ICASSP 2019\u20142019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682974"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1109\/TAFFC.2015.2457417","article-title":"The Geneva minimalistic acoustic parameter set GeMAPS for voice research and affective computing","volume":"7","author":"Eyben","year":"2016","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Hantke, S., Weninger, F., Kurle, R., Ringeval, F., Batliner, A., Mousa, A.-D., and Schuller, B. (2016). I Hear You Eat and Speak: Automatic Recognition of Eating Condition and Food Type, Use-Cases, and Impact on ASR Performance. 
PLoS ONE, 11.","DOI":"10.1371\/journal.pone.0154486"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1109\/JSTSP.2019.2955022","article-title":"An assessment of paralinguistic acoustic features for detection of Alzheimer\u2019s dementia in spontaneous speech","volume":"14","author":"Haider","year":"2019","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_38","unstructured":"Eyben, F., Weninger, F., Gro\u00df, F., and Schuller, B. (2013). Proceedings of the 21st ACM International Conference on Multimedia, ACM."},{"key":"ref_39","unstructured":"(2023, December 06). Auditok, an AUDIo TOKenization Tool\u2014Auditok v0.2.0 Documentation. Available online: https:\/\/pypi.org\/project\/auditok\/."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"636","DOI":"10.3390\/ai2040038","article-title":"User identity protection in automatic emotion recognition through disguised speech","volume":"2","author":"Haider","year":"2021","journal-title":"AI"},{"key":"ref_41","first-page":"260","article-title":"Computer-based evaluation of Alzheimer\u2019s disease and mild cognitive impairment patients during a picture description task","volume":"10","year":"2018","journal-title":"Alzheimer\u2019s Dement. Diagn. Asses. Dis. Mon."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Mirheidari, B., Blackburn, D., Walker, T., Venneri, A., Reuber, M., and Christensen, H. (2018, January 2\u20136). Detecting Signs of Dementia Using Word Vector Representations. Proceedings of the INTERSPEECH, Hyderabad, India.","DOI":"10.21437\/Interspeech.2018-1764"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"407","DOI":"10.3233\/JAD-150520","article-title":"Linguistic features identify Alzheimer\u2019s disease in narrative speech","volume":"49","author":"Fraser","year":"2016","journal-title":"J. Alzheimer\u2019s Dis."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Yancheva, M., and Rudzicz, F. (2016, January 7\u201312). 
Vector-space topic models for detecting Alzheimer\u2019s disease. Proceedings of the ACL, Berlin, Germany.","DOI":"10.18653\/v1\/P16-1221"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"87687F","DOI":"10.1117\/12.2009926","article-title":"Kinect based body posture detection and recognition system","volume":"Volume 8768","author":"Pisharady","year":"2013","journal-title":"Proceedings of the International Conference on Graphic and Image Processing (ICGIP 2012)"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1357","DOI":"10.1109\/TCYB.2013.2275945","article-title":"Real-Time Posture Reconstruction for Microsoft Kinect","volume":"43","author":"Shum","year":"2013","journal-title":"IEEE Trans. Cybern."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Patsadu, O., Nukoolkit, C., and Watanapa, B. (June, January 30). Human gesture recognition using Kinect camera. Proceedings of the 2012 Ninth International Conference on Computer Science and Software Engineering (JCSSE), Bangkok, Thailand.","DOI":"10.1109\/JCSSE.2012.6261920"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Martin, M., Camp, F.v.d., and Stiefelhagen, R. (2014, January 8\u201311). Real Time Head Model Creation and Head Pose Estimation on Consumer Depth Cameras. Proceedings of the 2014 2nd International Conference on 3D Vision, Tokyo, Japan.","DOI":"10.1109\/3DV.2014.54"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"841","DOI":"10.3758\/BRM.41.3.841","article-title":"Coding gestural behavior with the NEUROGES-ELAN system","volume":"41","author":"Lausberg","year":"2009","journal-title":"Behav. Res. Methods"},{"key":"ref_50","unstructured":"(2019). MATLAB, The MathWorks Inc.. version 9.6 (R2019a)."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Truong, K.P., van Leeuwen, D.A., Neerincx, M.A., and de Jong, F.M.G. (2009, January 6\u201310). Arousal and valence prediction in spontaneous emotional speech: Felt versus perceived emotion. 
Proceedings of the Interspeech 2009, Brighton, UK.","DOI":"10.21437\/Interspeech.2009-583"},{"key":"ref_52","unstructured":"Meignier, S., and Merlin, T. (2010, January 20). LIUM SpkDiarization: An open source toolkit for diarization. Proceedings of the CMU SPUD Workshop, Dallas, TX, USA."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"e209644","DOI":"10.1001\/jamanetworkopen.2020.9644","article-title":"Evaluation of a Patient-Collected Audio Audit and Feedback Quality Improvement Program on Clinician Attention to Patient Life Context and Health Care Costs in the Veterans Affairs Health Care System","volume":"3","author":"Weiner","year":"2020","journal-title":"JAMA Netw. Open"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/S0738-3991(98)00063-9","article-title":"Patient-physician communication assessment instruments: 1986 to 1996 in review","volume":"35","author":"Boon","year":"1998","journal-title":"Patient Educ. Couns."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1016\/j.pec.2016.12.029","article-title":"Does patient coaching make a difference in patient-physician communication during specialist consultations? A systematic review","volume":"100","author":"Alders","year":"2017","journal-title":"Patient Educ. Couns."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"e121","DOI":"10.2196\/resprot.7735","article-title":"Sharing annotated audio recordings of clinic visits with patients\u2014Development of the open recording automated logging system (ORALS): Study protocol","volume":"6","author":"Barr","year":"2017","journal-title":"JMIR Res. Protoc."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"e35325","DOI":"10.2196\/35325","article-title":"Audio Recording Patient-Nurse Verbal Communications in Home Health Care Settings: Pilot Feasibility and Usability Study","volume":"9","author":"Zolnoori","year":"2022","journal-title":"JMIR Hum. 
Factors"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"e11308","DOI":"10.2196\/11308","article-title":"Audio-videorecording clinic visits for patient\u2019s personal use in the United States: Cross-sectional survey","volume":"20","author":"Barr","year":"2018","journal-title":"J. Med. Internet Res."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1001\/jama.2017.7511","article-title":"Can Patients Make Recordings of Medical Encounters?: What Does the Law Say?","volume":"318","author":"Elwyn","year":"2017","journal-title":"JAMA"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"e345","DOI":"10.3399\/bjgp17X690521","article-title":"The \u2018One in a Million\u2019 study: Creating a database of UK primary care consultations","volume":"67","author":"Jepson","year":"2017","journal-title":"Br. J. Gen. Pract."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1109\/OJSP.2023.3267269","article-title":"Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection","volume":"4","author":"Sharma","year":"2023","journal-title":"IEEE Open J. Signal Process."},{"key":"ref_62","unstructured":"Cohen-Cole, S.A. (1991). The Medical Interview: The Three-Function Approach, Karger Publishers."},{"key":"ref_63","first-page":"327","article-title":"Practitioners\u2019 use of non-verbal behaviour in real consultations","volume":"30","author":"Byrne","year":"1980","journal-title":"J. R. Coll. Gen. 
Pract."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/5\/1506\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:05:00Z","timestamp":1760105100000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/5\/1506"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,26]]},"references-count":63,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2024,3]]}},"alternative-id":["s24051506"],"URL":"https:\/\/doi.org\/10.3390\/s24051506","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,26]]}}}