{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T02:19:19Z","timestamp":1767925159243,"version":"3.49.0"},"reference-count":39,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2022,8,18]],"date-time":"2022-08-18T00:00:00Z","timestamp":1660780800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Social media has become an important resource for discussing, sharing, and seeking information pertinent to rare diseases by patients and their families, given the low prevalence in the extraordinarily sparse populations. In our previous study, we identified prevalent topics from Reddit via topic modeling for cystic fibrosis (CF). While we were able to derive\/access concerns\/needs\/questions of patients with CF, we observed challenges and issues with the traditional techniques of topic modeling, e.g., Latent Dirichlet Allocation (LDA), for fulfilling the task of topic extraction. Thus, here we present our experiments to extend the previous study with an aim of improving the performance of topic modeling, by experimenting with LDA model optimization and examination of the Top2Vec model with different embedding models. With the demonstrated results with higher coherence and qualitatively higher human readability of derived topics, we implemented the Top2Vec model with doc2vec as the embedding model as our final model to extract topics from a subreddit of CF (\u201cr\/CysticFibrosis\u201d) and proposed to expand its use with other types of social media data for other rare diseases for better assessing patients' needs with social media data.<\/jats:p>","DOI":"10.3389\/frai.2022.948313","type":"journal-article","created":{"date-parts":[[2022,8,18]],"date-time":"2022-08-18T07:18:50Z","timestamp":1660807130000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Experiments with LDA and Top2Vec for embedded topic discovery on social media data\u2014A case study of cystic fibrosis"],"prefix":"10.3389","volume":"5","author":[{"given":"Bradley","family":"Karas","sequence":"first","affiliation":[]},{"given":"Sue","family":"Qu","sequence":"additional","affiliation":[]},{"given":"Yanji","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Qian","family":"Zhu","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2022,8,18]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1016\/j.hlpt.2020.10.014","article-title":"Twitter vs. Zika\u2014The role of social media in epidemic outbreaks surveillance","volume":"10","author":"Abouzahra","year":"2021","journal-title":"Health Policy Technol"},{"key":"B2","volume-title":"Top2vec: Distributed Representations of Topics","author":"Angelov","year":"2020"},{"key":"B3","first-page":"830","article-title":"The pushshift reddit dataset,","volume-title":"Proceedings of the International AAAI Conference on Web and Social Media","author":"Baumgartner","year":"2020"},{"key":"B4","doi-asserted-by":"publisher","first-page":"738513","DOI":"10.3389\/fpubh.2021.738513","article-title":"Examining cannabis, tobacco, and vaping discourse on reddit: an exploratory approach using natural language processing","volume":"9","author":"Benson","year":"2021","journal-title":"Front. Public Health"},{"key":"B5","volume-title":"Algorithms for Hyper-Parameter Optimization","author":"Bergstra","year":"2011"},{"key":"B6","first-page":"115","article-title":"Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures,","volume-title":"International Conference on Machine Learning","author":"Bergstra","year":"2013"},{"key":"B7","volume-title":"Natural Language Processing with Python: Analyzing Text With the Natural Language Toolkit","author":"Bird","year":"2009"},{"key":"B8","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"B9","first-page":"31","article-title":"Normalized (pointwise) mutual information in collocation extraction","volume":"30","author":"Bouma","year":"2009","journal-title":"Proc. GSCL."},{"key":"B10","volume-title":"Universal Sentence Encoder","author":"Cer","year":"2018"},{"key":"B11","doi-asserted-by":"crossref","DOI":"10.1007\/978-981-15-5679-1_57","article-title":"Influence of followers on twitter sentiments about rare disease medications,","volume-title":"Intelligent Data Engineering and Analytics","author":"Choudhury","year":"2021"},{"key":"B12","doi-asserted-by":"publisher","first-page":"886498","DOI":"10.3389\/fsoc.2022.886498","article-title":"A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify twitter posts","volume":"7","author":"Egger","year":"2022","journal-title":"Front. Sociol."},{"key":"B13","doi-asserted-by":"publisher","first-page":"521","DOI":"10.1016\/j.jad.2019.11.007","article-title":"Who says what? Content and participation characteristics in an online depression community","volume":"263","author":"Feldhege","year":"2020","journal-title":"J. Affect. Disord"},{"key":"B14","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1037\/0033-295X.114.2.211","article-title":"Topics in semantic representation","volume":"114","author":"Griffiths","year":"2007","journal-title":"Psychological review."},{"key":"B15","article-title":"Online learning for latent dirichlet allocation","volume-title":"advances in neural information processing systems.","author":"Hoffman","year":"2010"},{"key":"B16","doi-asserted-by":"publisher","first-page":"e15700","DOI":"10.2196\/15700","article-title":"Exploring abnormal behavior patterns of online users with emotional eating behavior: topic modeling study","volume":"22","author":"Hwang","year":"2020","journal-title":"J. Med. Internet Res."},{"key":"B17","doi-asserted-by":"publisher","first-page":"e12480","DOI":"10.2196\/12480","article-title":"Characterizing trends in human Papillomavirus vaccine discourse on reddit (2007\u20132015): an observational study","volume":"5","author":"Lama","year":"2019","journal-title":"JMIR Public Health Surveill."},{"key":"B18","first-page":"1188","article-title":"Distributed representations of sentences and documents, International conference on machine learning","author":"Le","year":"2014"},{"key":"B19","volume-title":"Social Media in Medical and Health Care: Opportunities and Challenges","author":"Lim","year":"2016"},{"key":"B20","unstructured":"Cystic Fibrosis2021"},{"key":"B21","first-page":"221","article-title":"Use of two topic modeling methods to investigate covid vaccine hesitancy","volume-title":"Int. Conf. ICT Soc. Hum. Beings","author":"Ma","year":"2021"},{"key":"B22","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1007\/s40506-020-00244-3","article-title":"Use of \u201cSocial Media\u201d\u2014an option for spreading awareness in infection prevention","volume":"13","author":"Madhumathi","year":"2021","journal-title":"Curr. Treat. Options Infect. Dis."},{"key":"B23","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1080\/19312458.2018.1430754","article-title":"Applying LDA topic modeling in communication research: toward a valid and reliable methodology","volume":"12","author":"Maier","year":"2018","journal-title":"Commun. Methods Meas."},{"key":"B24","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1016\/j.xkme.2019.06.006","article-title":"Precision medicine diagnostics for rare kidney disease: twitter as a tool in clinical genomic translation","volume":"1","author":"Mallett","year":"2019","journal-title":"Kidney Med."},{"key":"B25","doi-asserted-by":"publisher","first-page":"1505","DOI":"10.1016\/j.jiph.2021.08.010","article-title":"Public sentiment analysis and topic modeling regarding COVID-19 vaccines on the Reddit social media platform: a call to action for strengthening vaccine confidence","volume":"14","author":"Melton","year":"2021","journal-title":"J. Infect. Public Health."},{"key":"B26","first-page":"32","article-title":"How social media can be used to understand what matters to people with rare diseases","volume":"32","author":"Merinopoulou","year":"2019","journal-title":"Rare Dis."},{"key":"B27","article-title":"Efficient estimation of word representations in vector space","volume-title":"arXiv preprint arXiv:","author":"Mikolov","year":"2013"},{"key":"B28","article-title":"Software framework for topic modelling with large Corpora,","volume-title":"Proceedings of the LREC 2010 workshop on new challenges for NLP","author":"Rehurek","year":"2010"},{"key":"B29","doi-asserted-by":"publisher","first-page":"587","DOI":"10.1093\/ibd\/izy280","article-title":"Social media use and preferences in patients with inflammatory bowel disease","volume":"25","author":"Reich","year":"2019","journal-title":"Inflamm. Bowel Dis."},{"key":"B30","volume-title":"Sentence-bert: Sentence embeddings using siamese bert-networks","author":"Reimers","year":"2019"},{"key":"B31","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2020.emnlp-main.365","volume-title":"Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation","author":"Reimers","year":"2020"},{"key":"B32","doi-asserted-by":"crossref","DOI":"10.1145\/2684822.2685324","article-title":"Exploring the space of topic coherence measures,","volume-title":"Proceedings of the Eighth ACM International Conference on Web Search and Data Mining","author":"R\u00f6der","year":"2015"},{"key":"B33","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1186\/s13011-022-00442-w","article-title":"Concerns among people who use opioids during the COVID-19 pandemic: a natural language processing analysis of social media posts","volume":"17","author":"Sarker","year":"2022","journal-title":"Subst. Abuse Treat. Prev. Policy."},{"key":"B34","volume-title":"Rare Diseases: Common Issues in Drug Development Guidance for Industry","year":"2019"},{"key":"B35","volume-title":"Attention Is All you Need","author":"Vaswani","year":"2017"},{"key":"B36","volume-title":"Why Priors Matter","author":"Wallach","year":"2009"},{"key":"B37","doi-asserted-by":"publisher","first-page":"1088","DOI":"10.1177\/0020294020932347","article-title":"A new automatic machine learning based hyperparameter optimization for workpiece quality prediction","volume":"53","author":"Wen","year":"2020","journal-title":"Meas. Control"},{"key":"B38","volume-title":"Multilingual Universal Sentence Encoder for Semantic Retrieval","author":"Yang","year":"2019"},{"key":"B39","first-page":"2618","article-title":"Better Understand Rare Disease Patients' Needs by Analyzing Social Media Data\u2013a Case Study of Cystic Fibrosis,","author":"Zhu","year":"2021"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2022.948313\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,18]],"date-time":"2022-08-18T07:19:13Z","timestamp":1660807153000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2022.948313\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,18]]},"references-count":39,"alternative-id":["10.3389\/frai.2022.948313"],"URL":"https:\/\/doi.org\/10.3389\/frai.2022.948313","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,18]]},"article-number":"948313"}}