{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:35:08Z","timestamp":1772120108021,"version":"3.50.1"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,4,3]],"date-time":"2025-04-03T00:00:00Z","timestamp":1743638400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,3]],"date-time":"2025-04-03T00:00:00Z","timestamp":1743638400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002241","name":"Japan Science and Technology Agency","doi-asserted-by":"crossref","award":["JPMJCR21M1"],"award-info":[{"award-number":["JPMJCR21M1"]}],"id":[{"id":"10.13039\/501100002241","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Soc. Netw. Anal. Min."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Blood donation is crucial for healthcare systems, yet maintaining an adequate supply is a persistent challenge. Traditional methods to understand public sentiment and donor behavior are often limited. Social media, particularly \u201cX\u201d (formerly Twitter), offers a promising alternative for real-time insights. This study explores the viability of using \u201cX\u201d data to analyze blood donation sentiment in Japan, considering the evolving perspectives of younger generations. We replicated previous study results using the Tohoku BERT model and tested a refined blood donation tweets for user classification (BDT-UC) dataset and another customized version of the model for better classification. 
We also compared various topic modeling methods, including latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), and BERT-based models, using two different preprocessing techniques. Finally, we integrated the classification into the topic modeling process to evaluate how the preceding steps affect its results. Our findings indicate that although the refined dataset yields lower overall classification performance, it improved the implementation results, ensuring more balanced labeling across the data. The reduction in overall precision for our refined model was small (from 78.4% in the best evaluated model to 75.8% in the refined model). For topic modeling, BERT-based topic models, particularly those preprocessed with the MeCab library, achieved higher coherence and diversity scores than traditional methods. Additionally, results differed significantly when the dataset was processed according to the categories of the BDT-UC study, which reflect each tweet's role in blood donation: coherence and diversity increased for one of the categories, but coherence values were notably lower for the others. This study underscores the importance of initial classification and preprocessing for effective topic modeling of Japanese text, which affects the viability of extracting insights from Japanese social media data. 
The developed methodologies could support more effective analysis of blood donation groups, and better targeted donation campaigns in Japan.<\/jats:p>","DOI":"10.1007\/s13278-025-01437-8","type":"journal-article","created":{"date-parts":[[2025,4,4]],"date-time":"2025-04-04T18:48:50Z","timestamp":1743792530000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Leveraging social media for public health: NLP implementations for blood donation data analysis in Japan"],"prefix":"10.1007","volume":"15","author":[{"given":"Roberto","family":"Espinoza","sequence":"first","affiliation":[]},{"given":"Kazumasa","family":"Kishimoto","sequence":"additional","affiliation":[]},{"given":"Chang","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Luciano","family":"H.O. Santos","sequence":"additional","affiliation":[]},{"given":"Yukiko","family":"Mori","sequence":"additional","affiliation":[]},{"given":"Goshiro","family":"Yamamoto","sequence":"additional","affiliation":[]},{"given":"Tomohiro","family":"Kuroda","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,4,3]]},"reference":[{"key":"1437_CR1","first-page":"993","volume":"3","author":"DM Blei","year":"2003","unstructured":"Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993\u20131022","journal-title":"J Mach Learn Res"},{"key":"1437_CR2","unstructured":"colorfulscoop\/sbert-base-ja. Hugging Face. Accessed 2 January 2024. https:\/\/huggingface.co\/colorfulscoop\/sbert-base-ja"},{"key":"1437_CR3","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. 
arXiv preprint arXiv:1810.04805"},{"key":"1437_CR4","doi-asserted-by":"publisher","first-page":"439","DOI":"10.1162\/tacl_a_00325","volume":"8","author":"AB Dieng","year":"2020","unstructured":"Dieng AB, Ruiz FJR, Blei DM (2020) Topic modeling in embedding spaces. Trans Assoc Comput Linguist 8:439\u2013453. https:\/\/doi.org\/10.1162\/tacl_a_00325","journal-title":"Trans Assoc Comput Linguist"},{"key":"1437_CR5","doi-asserted-by":"publisher","DOI":"10.3389\/fsoc.2022.886498","volume":"7","author":"R Egger","year":"2022","unstructured":"Egger R, Yu J (2022) A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify twitter posts. Front Sociol 7:886498","journal-title":"Front Sociol"},{"key":"1437_CR6","doi-asserted-by":"crossref","unstructured":"Espinoza R, Liu C, Kishimoto K, Yamamoto G, Mori Y, Santos L, Kuroda T (2023) Adjusting Twitter data as a source for blood donation analysis: BDT-UC dataset and BERT implementations. In: 2023 IEEE EMBS special topic conference on data science and engineering in healthcare, medicine and biology. IEEE, pp 27\u201328","DOI":"10.1109\/IEEECONF58974.2023.10404778"},{"key":"1437_CR7","doi-asserted-by":"crossref","unstructured":"Gan L, Yang T, Huang Y, Yang B, Luo YY, Richard LWC, Guo D (2023) Experimental comparison of three topic modeling methods with LDA, Top2Vec and BERTopic. In: International symposium on artificial intelligence and robotics. Springer, pp 376\u2013391","DOI":"10.1007\/978-981-99-9109-9_37"},{"key":"1437_CR8","unstructured":"Grootendorst M (2022) BERTopic: neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794"},{"key":"1437_CR9","doi-asserted-by":"publisher","DOI":"10.1016\/j.socscimed.2022.115485","volume":"315","author":"S Harrell","year":"2022","unstructured":"Harrell S, Simons AM, Clasen P (2022) Promoting blood donation through social media: evidence from Brazil, India and the USA. 
Soc Sci Med 315:115485","journal-title":"Soc Sci Med"},{"issue":"1","key":"1437_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-019-0192-5","volume":"6","author":"JM Johnson","year":"2019","unstructured":"Johnson JM, Khoshgoftaar TM (2019) Survey on deep learning with class imbalance. J Big Data 6(1):1\u201354","journal-title":"J Big Data"},{"issue":"6755","key":"1437_CR11","doi-asserted-by":"publisher","first-page":"788","DOI":"10.1038\/44565","volume":"401","author":"DD Lee","year":"1999","unstructured":"Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788\u2013791","journal-title":"Nature"},{"issue":"2","key":"1437_CR12","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1080\/19312458.2021.1965973","volume":"16","author":"F Lind","year":"2022","unstructured":"Lind F, Eberl J-M, Eisele O, Heidenreich T, Galyga S, Boomgaarden HG (2022) Building the bridge: topic modeling for comparative research. Commun Methods Meas 16(2):96\u2013114. https:\/\/doi.org\/10.1080\/19312458.2021.1965973","journal-title":"Commun Methods Meas"},{"issue":"5","key":"1437_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3347711","volume":"52","author":"AC Lorena","year":"2019","unstructured":"Lorena AC, Garcia LP, Lehmann J, Souto MC, Ho TK (2019) How complex is your classification problem? A survey on measuring classification complexity. ACM Comput Surv (CSUR) 52(5):1\u201334","journal-title":"ACM Comput Surv (CSUR)"},{"key":"1437_CR14","doi-asserted-by":"crossref","unstructured":"Martin L, Muller B, Su\u00e1rez PJO, Dupont Y, Romary L, La\u00a0Clergerie \u00c9V, Seddah D, Sagot B (2019) CamemBERT: a tasty French language model. arXiv preprint arXiv:1911.03894","DOI":"10.18653\/v1\/2020.acl-main.645"},{"key":"1437_CR15","unstructured":"MeCab (2024) Yet another part-of-speech and morphological analyzer. Accessed 2 January. 
https:\/\/taku910.github.io\/mecab\/"},{"key":"1437_CR16","doi-asserted-by":"crossref","unstructured":"Munikar M, Shakya, S, Shrestha A (2019) Fine-grained sentiment classification using BERT. In: 2019 Artificial Intelligence for Transforming Business and Society (AITB), vol 1, pp 1\u20135. IEEE","DOI":"10.1109\/AITB48515.2019.8947435"},{"key":"1437_CR17","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1016\/j.csl.2016.03.004","volume":"40","author":"Z Qin","year":"2016","unstructured":"Qin Z, Cong Y, Wan T (2016) Topic modeling of Chinese language beyond a bag-of-words. Comput Speech Lang 40:60\u201378. https:\/\/doi.org\/10.1016\/j.csl.2016.03.004","journal-title":"Comput Speech Lang"},{"key":"1437_CR18","doi-asserted-by":"crossref","unstructured":"R\u00f6der M, Both A, Hinneburg A (2015) Exploring the space of topic coherence measures. In: Proceedings of the eighth ACM international conference on web search and data mining, pp 399\u2013408","DOI":"10.1145\/2684822.2685324"},{"key":"1437_CR19","doi-asserted-by":"crossref","unstructured":"Ruder S, Peters ME, Swayamdipta S, Wolf T (2019) Transfer learning in natural language processing. In: Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, pp 15\u201318","DOI":"10.18653\/v1\/N19-5004"},{"key":"1437_CR20","unstructured":"sentence-transformers\/paraphrase-multilingual-MiniLM-L12-v2. Hugging Face. Accessed 2 January 2024. https:\/\/huggingface.co\/sentence-transformers\/paraphrase-multilingual-MiniLM-L12-v2"},{"key":"1437_CR21","unstructured":"sentence-transformers\/paraphrase-multilingual-mpnet-base-v2 (2024) Accessed 2 January. https:\/\/huggingface.co\/sentence-transformers\/paraphrase-multilingual-mpnet-base-v2"},{"key":"1437_CR22","unstructured":"sonoisa\/sentence-bert-base-ja-mean-tokens-v2. Hugging Face. Accessed 2 January 2024. 
https:\/\/huggingface.co\/sonoisa\/sentence-bert-base-ja-mean-tokens-v2"},{"key":"1437_CR23","doi-asserted-by":"crossref","unstructured":"Terragni S, Fersini E, Galuzzi BG, Tropeano P, Candelieri A (2021) OCTIS: comparing and optimizing topic models is simple! In: Proceedings of the 16th conference of the European chapter of the association for computational linguistics: system demonstrations, pp 263\u2013270","DOI":"10.18653\/v1\/2021.eacl-demos.31"},{"key":"1437_CR24","unstructured":"tohoku-nlp\/bert-base-japanese-v3 (2024) Hugging Face. Accessed 2 January. https:\/\/huggingface.co\/tohoku-nlp\/bert-base-japanese-v3"},{"issue":"9","key":"1437_CR25","doi-asserted-by":"publisher","first-page":"26513","DOI":"10.2196\/26513","volume":"5","author":"AB Tuck","year":"2021","unstructured":"Tuck AB, Thompson RJ (2021) Social networking site use during the Covid-19 pandemic and its associations with social and emotional well-being in college students: survey study. JMIR Formative Res 5(9):26513","journal-title":"JMIR Formative Res"},{"key":"1437_CR26","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1016\/B978-0-12-411519-4.00003-3","volume-title":"The art and science of analyzing software data","author":"S Wagner","year":"2015","unstructured":"Wagner S, Fern\u00e1ndez DM (2015) Chapter 3\u2014Analyzing text in software projects. In: Bird C, Menzies T, Zimmermann T (eds) The art and science of analyzing software data. Morgan Kaufmann, Boston, pp 39\u201372. https:\/\/doi.org\/10.1016\/B978-0-12-411519-4.00003-3"},{"key":"1437_CR27","doi-asserted-by":"crossref","unstructured":"Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M et al (2020)Transformers: state-of-the-art natural language processing. 
In: Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, pp 38\u201345","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"issue":"2","key":"1437_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3057270","volume":"50","author":"A Yadollahi","year":"2017","unstructured":"Yadollahi A, Shahraki AG, Zaiane OR (2017) Current state of text sentiment analysis from opinion to emotion mining. ACM Comput Surv (CSUR) 50(2):1\u201333","journal-title":"ACM Comput Surv (CSUR)"},{"key":"1437_CR29","doi-asserted-by":"publisher","first-page":"1359362","DOI":"10.3389\/fpubh.2024.1359362","volume":"12","author":"Z Zhang","year":"2024","unstructured":"Zhang Z, Liu Q (2024) Rational or altruistic: the impact of social media information exposure on Chinese youth\u2019s willingness to donate blood. Front Public Health 12:1359362","journal-title":"Front Public Health"},{"key":"1437_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-020-1079-2","volume":"20","author":"T Zhang","year":"2020","unstructured":"Zhang T, Wang Y, Wang X, Yang Y, Ye Y (2020) Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine. BMC Med Inform Decis Making 20:1\u201317","journal-title":"BMC Med Inform Decis Making"},{"key":"1437_CR31","doi-asserted-by":"crossref","unstructured":"Zhao W, Chen JJ, Perkins R, Liu Z, Ge W, Ding Y, Zou W (2015) A heuristic approach to determine an appropriate number of topics in topic modeling. In: BMC bioinformatics, vol 16. 
Springer, pp 1\u201310","DOI":"10.1186\/1471-2105-16-S13-S8"}],"container-title":["Social Network Analysis and Mining"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13278-025-01437-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13278-025-01437-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13278-025-01437-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T08:28:24Z","timestamp":1765960104000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13278-025-01437-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,3]]},"references-count":31,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1437"],"URL":"https:\/\/doi.org\/10.1007\/s13278-025-01437-8","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-5000403\/v1","asserted-by":"object"}]},"ISSN":["1869-5469"],"issn-type":[{"value":"1869-5469","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,3]]},"assertion":[{"value":"30 August 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 December 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 December 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 April 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}}],"article-number":"32"}}