{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T06:47:42Z","timestamp":1770706062864,"version":"3.49.0"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686080","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T00:00:00Z","timestamp":1754524800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,8,7]]},"abstract":"<jats:p>Developmental and Epileptic Encephalopathies (DEEs) are rare, severe conditions often discussed by families on social media, offering valuable insights into their experiences. Identifying these messages amidst unrelated content is crucial but challenging due to data imbalance. This study evaluates different uses of generative large language models (LLMs) for binary classification of DEE-related experiences within social media posts. Using CamemBERT as a baseline, we compared two strategies: zero-shot prompt-based classification and synthetic data generation for minority class augmentation. While zero-shot prompting underperformed, the addition of 2% synthetic data improved all metrics (macro\/positive F1, precision and recall). Higher proportions of synthetic data led to decreased precision. These findings underscore the potential of hybrid approaches combining fine-tuning and domain-specific synthetic data for addressing data imbalance in rare disease contexts. Further validation across models and datasets is needed.<\/jats:p>","DOI":"10.3233\/shti250927","type":"book-chapter","created":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T11:35:30Z","timestamp":1754566530000},"source":"Crossref","is-referenced-by-count":1,"title":["Can Generative LLMs Help Classify Imbalanced Real-World Data? Exploring Rare Diseases on Social Media"],"prefix":"10.3233","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-0958-6949","authenticated-orcid":false,"given":"Emma","family":"Le Priol","sequence":"first","affiliation":[{"name":"Clinical Bio-Informatics Laboratory, Universit\u00e9 Paris Cit\u00e9, INSERM UMR 1163, Imagine Institute, Paris, France"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-5160-7927","authenticated-orcid":false,"given":"Juliette","family":"Potier","sequence":"additional","affiliation":[{"name":"Clinical Bio-Informatics Laboratory, Universit\u00e9 Paris Cit\u00e9, INSERM UMR 1163, Imagine Institute, Paris, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6855-4366","authenticated-orcid":false,"given":"Anita","family":"Burgun","sequence":"additional","affiliation":[{"name":"Clinical Bio-Informatics Laboratory, Universit\u00e9 Paris Cit\u00e9, INSERM UMR 1163, Imagine Institute, Paris, France"},{"name":"Department of Medical Informatics, Necker Hospital, AP-HP"}]}],"member":"7437","container-title":["Studies in Health Technology and Informatics","MEDINFO 2025 \u2014 Healthcare Smart \u00d7 Medicine Deep"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/SHTI250927","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T11:35:30Z","timestamp":1754566530000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/SHTI250927"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,7]]},"ISBN":["9781643686080"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/shti250927","relation":{},"ISSN":["0926-9630","1879-8365"],"issn-type":[{"value":"0926-9630","type":"print"},{"value":"1879-8365","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,7]]}}}