{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T17:09:29Z","timestamp":1768410569621,"version":"3.49.0"},"reference-count":62,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T00:00:00Z","timestamp":1757548800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:p>Developing high-quality training data is essential for tailoring large language models (LLMs) to specialized applications like mental health. To address privacy and legal constraints associated with real patient data, we designed a synthetic patient and interview generation framework that can be tailored to regional patient demographics. This system employs two locally run instances of Llama 3.3:70B: one as the interviewer and the other as the patient. These models produce contextually rich interview transcripts, structured by a customizable question bank, with lexical diversity similar to normal human conversation. We calculate median Distinct-1 scores of 0.44 and 0.33 for the patient and interview assistant model outputs respectively compared to 0.50\u2009\u00b1\u20090.11 as the average for 10,000 episodes of a radio program dialog. Central to this approach is the patient generation process, which begins with a locally run Llama 3.3:70B model. Given the full question bank, the model generates a detailed profile template, combining predefined variables (e.g., demographic data or specific conditions) with LLM-generated content to fill in contextual details. This hybrid method ensures that each patient profile is both diverse and realistic, providing a strong foundation for generating dynamic interactions. Demographic distributions of generated patient profiles were not significantly different from real-world population data and exhibited expected variability. Additionally, for the patient profiles we assessed LLM metrics and found an average Distinct-1 score of 0.8 (max\u2009=\u20091) indicating diverse word usage. By integrating detailed patient generation with dynamic interviewing, the framework produces synthetic datasets that may aid the adoption and deployment of LLMs in mental health settings.<\/jats:p>","DOI":"10.3389\/fdgth.2025.1625444","type":"journal-article","created":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T05:24:52Z","timestamp":1757568292000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Synthetic patient and interview transcript creator: an essential tool for LLMs in mental health"],"prefix":"10.3389","volume":"7","author":[{"given":"Aleyna","family":"Warner","sequence":"first","affiliation":[]},{"given":"Jeffrey","family":"LeDue","sequence":"additional","affiliation":[]},{"given":"Yutong","family":"Cao","sequence":"additional","affiliation":[]},{"given":"Joseph","family":"Tham","sequence":"additional","affiliation":[]},{"given":"Timothy H.","family":"Murphy","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,9,11]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"57","DOI":"10.3390\/informatics11030057","article-title":"Large language models in healthcare and medical domain: a review","volume":"11","author":"Nazi","year":"2024","journal-title":"Informatics"},{"key":"B2","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1002\/hcs2.61","article-title":"Large language models in health care: development, applications, and challenges","volume":"2","author":"Yang","year":"2023","journal-title":"Health Care Sci"},{"key":"B3","doi-asserted-by":"publisher","first-page":"e58418","DOI":"10.2196\/58418","article-title":"Aligning large language models for enhancing psychiatric interviews through symptom delineation and summarization: pilot study","volume":"8","author":"So","year":"2024","journal-title":"JMIR Form Res"},{"key":"B4","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1038\/s41746-024-01157-x","article-title":"The ethics of ChatGPT in medicine and healthcare: a systematic review on large language models (LLMs)","volume":"7","author":"Haltaufderheide","year":"2024","journal-title":"NPJ Digit Med"},{"key":"B5","doi-asserted-by":"publisher","first-page":"2892","DOI":"10.1016\/j.csbj.2024.07.005","article-title":"Synthetic data generation methods in healthcare: a review on open-source tools and methods","volume":"23","author":"Pezoulas","year":"2024","journal-title":"Comput Struct Biotechnol J"},{"key":"B6","doi-asserted-by":"publisher","first-page":"105763","DOI":"10.1016\/j.ijmedinf.2024.105763","article-title":"Synthetic data generation in healthcare: a scoping review of reviews on domains, motivations, and future applications","volume":"195","author":"Rujas","year":"2025","journal-title":"Int J Med Inf"},{"key":"B7","article-title":"Indigenous identity by Registered or Treaty Indian status: Canada, provinces and territories, census divisions and census subdivisions","year":"2022"},{"key":"B8","article-title":"Visible minority and population group by generation status: Canada, provinces and territories, census metropolitan areas and census agglomerations with parts","year":"2022"},{"key":"B9","article-title":"Population by five-year age groups and gender, Metro Vancouver A (Regional district electoral area), 2021","year":"2022"},{"key":"B10","article-title":"Marital status, age group and gender: Canada, provinces and territories and economic regions","year":"2023"},{"key":"B11","article-title":"Chapter 5\u2014Living apart is increasingly common among couples. The Vanier Institute of the Family","year":"2024"},{"key":"B12","article-title":"Census families by age of older partner or parent and number of children","year":""},{"key":"B13","article-title":"Population estimates on July 1, by age and gender","year":""},{"key":"B14","article-title":"Right-handed, left-handed or ambidextrous?","year":""},{"key":"B15","article-title":"Doc Deficits: Half of Canadians either can\u2019t find a doctor or can't get a timely appointment with the one they have","year":""},{"key":"B16","article-title":"Health fact sheets","year":""},{"key":"B17","doi-asserted-by":"publisher","first-page":"E136","DOI":"10.1097\/HTR.0000000000000534","article-title":"Self-reported lifetime concussion among adults: comparison of 3 different survey questions","volume":"35","author":"Daugherty","year":"2020","journal-title":"J Head Trauma Rehabil"},{"key":"B18","article-title":"Canadian Epilepsy Alliance","year":""},{"key":"B19","article-title":"Profile table, Census Profile, 2021 Census of Population\u2014Vancouver, City (CY) [Census subdivision], British Columbia","year":"2022"},{"key":"B20","article-title":"Census Profile, 2016 Census\u2014Vancouver [Census metropolitan area], British Columbia and British Columbia [Province]","year":"2017"},{"key":"B21","article-title":"New data on disability in Canada, 2022","year":"2023"},{"key":"B22","article-title":"Canadian Survey on Disability, 2017\u20132022","year":"2023"},{"key":"B23","article-title":"Canada Disability Benefit","author":"Vision","year":""},{"key":"B24","article-title":"B.C. Public School Results School District: Foundation Skills Assessment","year":""},{"key":"B25","article-title":"B.C. Public School Results School District: Graduation Assessments","year":""},{"key":"B26","article-title":"Distribution of the population aged 25\u201364 by highest certificate, diploma or degree, Greater Vancouver [CD], British Columbia [PR] and Canada, 2021","year":"2022"},{"key":"B27","article-title":"Focus on Geography Series, 2021 Census\u2014Vancouver (Census metropolitan area)","year":"2022"},{"key":"B28","article-title":"Distribution (in percentage) of marital status, total population aged 15 and older, Vancouver (CMA), 2021","year":"2022"},{"key":"B29","article-title":"Data Catalogue","year":""},{"key":"B30","article-title":"Faker","author":"Faraglia","year":"2025"},{"key":"B31","article-title":"A Guide to Controlling LLM Model Output: Exploring Top-k, Top-p, and Temperature Parameters. Medium","author":"Singh","year":"2023"},{"key":"B32","article-title":"The Curious Case of Neural Text Degeneration","author":"Holtzman","year":"2019"},{"key":"B33","article-title":"Language Models are Few-Shot Learners","author":"Brown","year":"2020"},{"key":"B34","article-title":"Setting Top-K, Top-P and Temperature in LLMs. Medium","year":"2024"},{"key":"B35","first-page":"110","article-title":"A diversity-promoting objective function for neural conversation models","author":"Li","year":"2016"},{"key":"B36","article-title":"INTERVIEW: NPR Media Dialog Transcripts","author":"Li","year":"2020"},{"key":"B37","doi-asserted-by":"publisher","first-page":"e24164","DOI":"10.1016\/j.heliyon.2024.e24164","article-title":"Identifying and handling data bias within primary healthcare data using synthetic data generators","volume":"10","author":"Draghia","year":"2024","journal-title":"Heliyon"},{"key":"B38","article-title":"The Llama 3 herd of models","author":"Grattafiori","year":"2024"},{"key":"B39","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1038\/s41386-020-0767-z","article-title":"Deep learning for small and big data in psychiatry","volume":"46","author":"Koppe","year":"2021","journal-title":"Neuropsychopharmacology"},{"key":"B40","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1016\/j.euroneuro.2022.08.001","article-title":"Ethical considerations for precision psychiatry: a roadmap for research and clinical practice","volume":"63","author":"Fusar-Poli","year":"2022","journal-title":"Eur Neuropsychopharmacol"},{"key":"B41","doi-asserted-by":"publisher","first-page":"50","DOI":"10.3389\/fpsyt.2016.00050","article-title":"Detecting neuroimaging biomarkers for psychiatric disorders: sample size matters","volume":"7","author":"Schnack","year":"2016","journal-title":"Front Psychiatry"},{"key":"B42","doi-asserted-by":"publisher","first-page":"388","DOI":"10.1176\/appi.ajp.21070758","article-title":"Lack of representation in psychiatric research: a data-driven example from scientific articles published in 2019 and 2020 in the American Journal of Psychiatry","volume":"179","author":"Pedersen","year":"2022","journal-title":"Am J Psychiatry"},{"key":"B43","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1177\/2515245917749652","article-title":"Enabling open-science initiatives in clinical psychology and psychiatry without sacrificing patients\u2019 privacy: current practices and future challenges","volume":"1","author":"Walsh","year":"2018","journal-title":"Adv Methods Pract Psychol Sci"},{"key":"B44","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1186\/s12874-020-00977-1","article-title":"Generation and evaluation of synthetic patient data","volume":"20","author":"Goncalves","year":"2020","journal-title":"BMC Med Res Methodol"},{"key":"B45","doi-asserted-by":"publisher","first-page":"e28071","DOI":"10.1371\/journal.pone.0028071","article-title":"A systematic review of re-identification attacks on health data","volume":"6","author":"El Emam","year":"2011","journal-title":"PLoS One"},{"key":"B46","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1038\/s41746-023-00927-3","article-title":"Harnessing the power of synthetic data in healthcare: innovation, application, and privacy","volume":"6","author":"Giuffr\u00e9","year":"2023","journal-title":"NPJ Digit Med"},{"key":"B47","doi-asserted-by":"publisher","first-page":"3069","DOI":"10.1038\/s41467-019-10933-3","article-title":"Estimating the success of re-identifications in incomplete datasets using generative models","volume":"10","author":"Rocher","year":"2019","journal-title":"Nat Commun"},{"key":"B48","doi-asserted-by":"publisher","first-page":"3909","DOI":"10.3390\/electronics13193909","article-title":"Bias mitigation via synthetic data generation: a review","volume":"13","author":"Shahul Hameed","year":"2024","journal-title":"Electronics (Basel)"},{"key":"B49","article-title":"ClinicalBERT: Modeling clinical notes and predicting hospital readmission","author":"Huang","year":"2019"},{"key":"B50","article-title":"Towards expert-level medical question answering with large language models","author":"Singhal","year":"2023"},{"key":"B51","doi-asserted-by":"publisher","first-page":"160035","DOI":"10.1038\/sdata.2016.35","article-title":"MIMIC-III, a freely accessible critical care database","volume":"3","author":"Johnson","year":"2016","journal-title":"Sci Data"},{"key":"B52","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","article-title":"Large language models encode clinical knowledge","volume":"620","author":"Singhal","year":"2023","journal-title":"Nature"},{"key":"B53","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1136\/bmjinnov-2023-001110","article-title":"Conversational AI facilitates mental health assessments and is associated with improved recovery rates","volume":"10","author":"Rollwage","year":"2024","journal-title":"BMJ Innov"},{"key":"B54","article-title":"Human-AI collaboration enables more empathic conversations in text-based peer-to-peer mental health support","author":"Sharma","year":"2022"},{"key":"B55","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1145\/3701041","article-title":"Improving workplace well-being in modern organizations: a review of large language model-based mental health chatbots","volume":"16","author":"Yuan","year":"2024","journal-title":"ACM Trans Manag Inf Syst"},{"key":"B56","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41586-025-08866-7","article-title":"Towards conversational diagnostic artificial intelligence","volume":"642","author":"Tu","year":"2025","journal-title":"Nature"},{"key":"B57","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1080\/15265161.2022.2048739","article-title":"Conversational artificial intelligence in psychotherapy: a new therapeutic tool or agent?","volume":"23","author":"Sedlakova","year":"2023","journal-title":"Am J Bioeth"},{"key":"B58","doi-asserted-by":"publisher","first-page":"e57400","DOI":"10.2196\/57400","article-title":"Large language models for mental health applications: systematic review","volume":"11","author":"Guo","year":"2024","journal-title":"JMIR Ment Health"},{"key":"B59","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3232077","article-title":"Trusting virtual agents: the effect of personality","volume":"9","author":"Zhou","year":"2019","journal-title":"ACM Trans Interact Intell Syst"},{"key":"B60","doi-asserted-by":"publisher","first-page":"e0000082","DOI":"10.1371\/journal.pdig.0000082","article-title":"Synthetic data in health care: a narrative review","volume":"2","author":"Gonzales","year":"2023","journal-title":"PLOS Digit Health"},{"key":"B61","article-title":"LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation","author":"Chen","year":"2023"},{"key":"B62","first-page":"95","article-title":"A review of synthetic data generation methods for privacy preserving data publishing","volume":"6","author":"Surendra","year":"2017","journal-title":"Int J Sci Technol Res"}],"container-title":["Frontiers in Digital Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1625444\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T05:24:53Z","timestamp":1757568293000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1625444\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,11]]},"references-count":62,"alternative-id":["10.3389\/fdgth.2025.1625444"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2025.1625444","relation":{},"ISSN":["2673-253X"],"issn-type":[{"value":"2673-253X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,11]]},"article-number":"1625444"}}