{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T18:11:09Z","timestamp":1776708669339,"version":"3.51.2"},"reference-count":54,"publisher":"MDPI AG","issue":"17","license":[{"start":{"date-parts":[[2021,9,3]],"date-time":"2021-09-03T00:00:00Z","timestamp":1630627200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["2021R1C1C1009436"],"award-info":[{"award-number":["2021R1C1C1009436"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Schizophrenia is a severe mental disorder that ranks among the leading causes of disability worldwide. However, many cases of schizophrenia remain untreated due to failure to diagnose, self-denial, and social stigma. With the advent of social media, individuals suffering from schizophrenia share their mental health problems and seek support and treatment options. Machine learning approaches are increasingly used for detecting schizophrenia from social media posts. This study aims to determine whether machine learning could be effectively used to detect signs of schizophrenia in social media users by analyzing their social media texts. To this end, we collected posts from the social media platform Reddit focusing on schizophrenia, along with non-mental health related posts (fitness, jokes, meditation, parenting, relationships, and teaching) for the control group. We extracted linguistic features and content topics from the posts. Using supervised machine learning, we classified posts belonging to schizophrenia and interpreted important features to identify linguistic markers of schizophrenia. We applied unsupervised clustering to the features to uncover a coherent semantic representation of words in schizophrenia. We identified significant differences in linguistic features and topics including increased use of third person plural pronouns and negative emotion words and symptom-related topics. We distinguished schizophrenic from control posts with an accuracy of 96%. Finally, we found that coherent semantic groups of words were the key to detecting schizophrenia. Our findings suggest that machine learning approaches could help us understand the linguistic characteristics of schizophrenia and identify schizophrenia or otherwise at-risk individuals using social media texts.<\/jats:p>","DOI":"10.3390\/s21175924","type":"journal-article","created":{"date-parts":[[2021,9,6]],"date-time":"2021-09-06T13:18:26Z","timestamp":1630934306000},"page":"5924","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":67,"title":["Schizophrenia Detection Using Machine Learning Approach from Social Media Content"],"prefix":"10.3390","volume":"21","author":[{"given":"Yi Ji","family":"Bae","sequence":"first","affiliation":[{"name":"Department of Software Convergence, Kyung Hee University, Yongin 17104, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Midan","family":"Shim","sequence":"additional","affiliation":[{"name":"Department of Software Convergence, Kyung Hee University, Yongin 17104, Korea"},{"name":"Department of Biology, Kyung Hee University, Seoul 02447, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Won Hee","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Software Convergence, Kyung Hee University, Yongin 17104, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1016\/S0140-6736(13)61611-6","article-title":"Global burden of disease attributable to mental and substance use disorders: Findings from the global burden of disease study 2010","volume":"382","author":"Whiteford","year":"2013","journal-title":"Lancet"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1002\/wps.20491","article-title":"Prediction of psychosis across protocols and risk cohorts using automated language analysis","volume":"17","author":"Corcoran","year":"2018","journal-title":"World Psychiatry"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Sher, L., and Kahn, R.S. (2019). Suicide in Schizophrenia: An Educational Overview. Medicina, 55.","DOI":"10.3390\/medicina55070361"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"383","DOI":"10.1093\/schbul\/sbn135","article-title":"Psychiatric comorbidities and schizophrenia","volume":"35","author":"Buckley","year":"2009","journal-title":"Schizophr. Bull."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1107458.1107463","article-title":"Neo-tribes: The power and potential of online communities in health care","volume":"49","author":"Johnson","year":"2006","journal-title":"Commun. ACM"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.chb.2018.05.035","article-title":"Mental distress and language use: Linguistic analysis of discussion forum posts","volume":"87","author":"Lyons","year":"2018","journal-title":"Comput. Hum. Behav."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"e289","DOI":"10.2196\/jmir.7956","article-title":"A Collaborative Approach to Identifying Social Media Markers of Schizophrenia by Employing Machine Learning and Clinical Appraisals","volume":"19","author":"Birnbaum","year":"2017","journal-title":"J. Med. Internet Res."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"e121","DOI":"10.2196\/jmir.8219","article-title":"Harnessing Reddit to Understand the Written-Communication Challenges Experienced by Individuals With Mental Health Disorders: Analysis of Texts From Mental Health Communities","volume":"20","author":"Park","year":"2018","journal-title":"J. Med. Internet Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"e22635","DOI":"10.2196\/22635","article-title":"Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study","volume":"22","author":"Low","year":"2020","journal-title":"J. Med. Internet Res."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/j.copsyc.2016.01.004","article-title":"Social Media, Big Data, and Mental Health: Current Advances and Ethical Implications","volume":"9","author":"Conway","year":"2016","journal-title":"Curr. Opin. Psychol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"455","DOI":"10.31887\/DCNS.2014.16.4\/fmcmahon","article-title":"Prediction of treatment outcomes in psychiatry--where do we stand ?","volume":"16","author":"McMahon","year":"2014","journal-title":"Dialogues Clin. Neurosci."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1002\/wps.20882","article-title":"The promise of machine learning in predicting treatment outcomes in psychiatry","volume":"20","author":"Chekroud","year":"2021","journal-title":"World Psychiatry"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Nagarhalli, T.P., Vaze, V., and Rana, N.K. (2021, January 4\u20136). Impact of Machine Learning in Natural Language Processing: A Review. Proceedings of the 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), Tirunelveli, India.","DOI":"10.1109\/ICICV50876.2021.9388380"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chancellor, S., and De Choudhury, M. (2020). Methods in predictive techniques for mental health status on social media: A critical review. NPJ Digit. Med., 3.","DOI":"10.1038\/s41746-020-0233-7"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1586","DOI":"10.3758\/s13428-019-01235-z","article-title":"Predicting future mental illness from social media: A big-data approach","volume":"51","author":"Thorstad","year":"2019","journal-title":"Behav. Res. Methods"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"45141","DOI":"10.1038\/srep45141","article-title":"Characterisation of mental health conditions in social media using Informed Deep Learning","volume":"7","author":"Gkotsis","year":"2017","journal-title":"Sci. Rep."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zomick, J., Levitan, S.I., and Serper, M. (2019). Linguistic Analysis of Schizophrenia in Reddit Posts, Association for Computational Linguistics.","DOI":"10.18653\/v1\/W19-3009"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mitchell, M., Hollingshead, K., and Coppersmith, G. (2015). Quantifying the Language of Schizophrenia in Social Media, Association for Computational Linguistics.","DOI":"10.3115\/v1\/W15-1202"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Coppersmith, G., Dredze, M., Harman, C., and Hollingshead, K. (2015). From ADHD to SAD: Analyzing the Language of Mental Health on Twitter through Self-Reported Diagnoses, Association for Computational Linguistics.","DOI":"10.3115\/v1\/W15-1201"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Loveys, K., Crutchley, P., Wyatt, E., and Coppersmith, G. (2017). Small but Mighty: Affective micropatterns for Quantifying Mental Health from Social Media Language, Association for Computational Linguistics.","DOI":"10.18653\/v1\/W17-3110"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Kirinde Gamaarachchige, P., and Inkpen, D. (2019). Multi-Task, Multi-Channel, Multi-Input Learning for Mental Illness Detection Using Social Media Text, Association for Computational Linguistics.","DOI":"10.18653\/v1\/D19-6208"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Ive, J., Gkotsis, G., Dutta, R., Stewart, R., and Velupillai, S. (2018). Hierarchical Neural Model with Attention Mechanisms for the Classification of Social Media Text Related to Mental Health, Association for Computational Linguistics.","DOI":"10.18653\/v1\/W18-0607"},{"key":"ref_23","first-page":"122","article-title":"Mining Twitter Data to Improve Detection of Schizophrenia","volume":"2015","author":"McManus","year":"2015","journal-title":"AMIA Jt. Summits Transl. Sci. Proc."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Benton, A., Mitchell, M., and Hovy, D. (2017). Multitask Learning for Mental Health Conditions with Limited Social Media Data, Association for Computational Linguistics.","DOI":"10.18653\/v1\/E17-1015"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1177\/0261927X09351676","article-title":"The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods","volume":"29","author":"Tausczik","year":"2010","journal-title":"J. Lang. Soc. Psychol."},{"key":"ref_26","unstructured":"Pushshift.io Reddit API (2020, September 03). GitHub. Available online: https:\/\/github.com\/pushshift\/api."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Bird, S. (2004). NLTK: The Natural Language Toolkit. arXiv, Available online: https:\/\/www.nltk.org.","DOI":"10.3115\/1219044.1219075"},{"key":"ref_28","unstructured":"Pennebaker, J.W., Booth, R.J., Boyd, R.L., and Francis, M.E. (2015). LIWC 2015 Operator\u2019s Manual, Pennebaker Conglomerates Inc."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"774","DOI":"10.21105\/joss.00774","article-title":"quanteda: An R package for the quantitative analysis of textual data","volume":"3","author":"Benoit","year":"2018","journal-title":"J. Open Source Softw."},{"key":"ref_30","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_31","unstructured":"Bishop, C.M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics), Springer."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"111270","DOI":"10.1016\/j.pscychresns.2021.111270","article-title":"Brain age prediction in schizophrenia: Does the choice of machine learning algorithm matter?","volume":"310","author":"Lee","year":"2021","journal-title":"Psychiatry Res. Neuroimaging"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"104320","DOI":"10.1016\/j.compbiomed.2021.104320","article-title":"Radiomics-based machine learning model for efficiently classifying transcriptome subtypes in glioblastoma patients from MRI","volume":"132","author":"Le","year":"2021","journal-title":"Comput. Biol. Med."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Ho Thanh Lam, L., Le, N.H., Van Tuan, L., Tran Ban, H., Nguyen Khanh Hung, T., Nguyen, N.T.K., Huu Dang, L., and Le, N.Q.K. (2020). Machine Learning Model for Identifying Antioxidant Proteins Using Features Calculated from Primary Sequences. Biology, 9.","DOI":"10.3390\/biology9100325"},{"key":"ref_35","unstructured":"Lundberg, S., and Lee, S.-I. (2017). A unified approach to interpreting model predictions. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1080\/14786440109462720","article-title":"LIII. On lines and planes of closest fit to systems of points in space","volume":"2","author":"Pearson","year":"1901","journal-title":"Lond. Edinb. Dublin Philos. Mag. J. Sci."},{"key":"ref_37","first-page":"2579","article-title":"Visualizing Data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_38","unstructured":"Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, AAAI Press."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1097\/NMD.0000000000000354","article-title":"Lexical Characteristics of Emotional Narratives in Schizophrenia: Relationships With Symptoms, Functioning, and Social Cognition","volume":"203","author":"Buck","year":"2015","journal-title":"J. Nerv. Ment. Dis."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1192\/bjp.bp.113.140046","article-title":"Word use in first-person accounts of schizophrenia","volume":"206","author":"Fineberg","year":"2015","journal-title":"Brit. J. Psychiat."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1016\/j.jpsychires.2015.02.024","article-title":"Lexical analysis in schizophrenia: How emotion and social word use informs our understanding of clinical presentation","volume":"64","author":"Minor","year":"2015","journal-title":"J. Psychiatr. Res."},{"key":"ref_42","unstructured":"APA, A.P.A. (2013). Diagnostic and Statistical Manual of Mental Disorders, American Psychiatric Publishing. [5th ed.]."},{"key":"ref_43","unstructured":"De Choudhury, M., Gamon, M., Counts, S., and Horvitz, E. (2013, January 8\u201311). Predicting Depression via Social Media. Proceedings of the 7th International AAAI Conference on Weblogs and Social Media, Cambridge, MA, USA."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Shen, J.H., and Rudzicz, F. (2017). Detecting Anxiety through Reddit, Association for Computational Linguistics.","DOI":"10.18653\/v1\/W17-3107"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Sekulic, I., Gjurkovi\u0107, M., and \u0160najder, J. (2018). Not Just Depressed: Bipolar Disorder Prediction on Reddit, Association for Computational Linguistics.","DOI":"10.18653\/v1\/W18-6211"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3439726","article-title":"Deep Learning\u2013based Text Classification","volume":"54","author":"Minaee","year":"2021","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Iyyer, M., Manjunatha, V., Boyd-Graber, J., and Daum\u00e9, H. (2015). Deep Unordered Composition Rivals Syntactic Methods for Text Classification, Association for Computational Linguistics.","DOI":"10.3115\/v1\/P15-1162"},{"key":"ref_48","unstructured":"Joulin, A., Grave, E., Bojanowski, P., Douze, M., J\u00e9gou, H., and Mikolov, T. (2016). FastText.zip: Compressing text classification models. arXiv, Available online: https:\/\/fasttext.cc."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Tai, K.S., Socher, R., and Manning, C.D. (2015). Improved Semantic Representations from Tree-Structured Long Short-Term Memory Networks, Association for Computational Linguistics.","DOI":"10.3115\/v1\/P15-1150"},{"key":"ref_50","unstructured":"Zhu, X., Sobhani, P., and Guo, H. (2015, January 6\u201311). Long short-term memory over recursive structures. Proceedings of the 32nd International Conference on International Conference on Machine Learning, Lille, France."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Kalchbrenner, N., Grefenstette, E., and Blunsom, P. (2014). A Convolutional Neural Network for Modelling Sentences, Association for Computational Linguistics.","DOI":"10.3115\/v1\/P14-1062"},{"key":"ref_52","unstructured":"Kim, Y. (2016). Convolutional Neural Networks for Sentence Classification. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Peng, H., Li, J., He, Y., Liu, Y., Bao, M., Wang, L., Song, Y., and Yang, Q. (2018, January 23\u201327). Large-Scale Hierarchical Text Classification with Recursively Regularized Deep Graph-CNN. Proceedings of the 2018 World Wide Web Conference, Lyon, France.","DOI":"10.1145\/3178876.3186005"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Yao, L., Mao, C., and Luo, Y. (2019). Graph Convolutional Networks for Text Classification, AAAI.","DOI":"10.1609\/aaai.v33i01.33017370"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/17\/5924\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:55:44Z","timestamp":1760165744000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/17\/5924"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,3]]},"references-count":54,"journal-issue":{"issue":"17","published-online":{"date-parts":[[2021,9]]}},"alternative-id":["s21175924"],"URL":"https:\/\/doi.org\/10.3390\/s21175924","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,3]]}}}