{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T06:28:04Z","timestamp":1775024884621,"version":"3.50.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T00:00:00Z","timestamp":1643241600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T00:00:00Z","timestamp":1643241600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000025","name":"U.S. Department of Health & Human Services | NIH | National Institute of Mental Health","doi-asserted-by":"publisher","award":["R01MH117599"],"award-info":[{"award-number":["R01MH117599"]}],"id":[{"id":"10.13039\/100000025","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000025","name":"U.S. Department of Health & Human Services | NIH | National Institute of Mental Health","doi-asserted-by":"publisher","award":["R01MH117599"],"award-info":[{"award-number":["R01MH117599"]}],"id":[{"id":"10.13039\/100000025","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000025","name":"U.S. Department of Health & Human Services | NIH | National Institute of Mental Health","doi-asserted-by":"publisher","award":["R01MH117599"],"award-info":[{"award-number":["R01MH117599"]}],"id":[{"id":"10.13039\/100000025","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000025","name":"U.S. 
Department of Health & Human Services | NIH | National Institute of Mental Health","doi-asserted-by":"publisher","award":["R01MH117599"],"award-info":[{"award-number":["R01MH117599"]}],"id":[{"id":"10.13039\/100000025","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006367","name":"Tommy Fuss Fund","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006367","id-type":"DOI","asserted-by":"publisher"}]},{"name":"U.S. Department of Health & Human Services | NIH | National Institute of Mental Health"},{"name":"U.S. Department of Health & Human Services | NIH | National Institute of Mental Health"},{"name":"U.S. Department of Health & Human Services | NIH | National Institute of Mental Health"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Clinical risk prediction models powered by electronic health records (EHRs) are becoming increasingly widespread in clinical practice. With suicide-related mortality rates rising in recent years, it is becoming increasingly urgent to understand, predict, and prevent suicidal behavior. Here, we compare the predictive value of structured and unstructured EHR data for predicting suicide risk. We find that Naive Bayes Classifier (NBC) and Random Forest (RF) models trained on structured EHR data perform better than those based on unstructured EHR data. 
An NBC model trained on both structured and unstructured data yields similar performance (AUC\u2009=\u20090.743) to an NBC model trained on structured data alone (0.742,\n                    <jats:italic>p<\/jats:italic>\n                    \u2009=\u20090.668), while an RF model trained on both data types yields significantly better results (AUC\u2009=\u20090.903) than an RF model trained on structured data alone (0.887,\n                    <jats:italic>p<\/jats:italic>\n                    \u2009&lt;\u20090.001), likely due to the RF model\u2019s ability to capture interactions between the two data types. To investigate these interactions, we propose and implement a general framework for identifying specific structured-unstructured feature pairs whose interactions differ between case and non-case cohorts, and thus have the potential to improve predictive performance and increase understanding of clinical risk. We find that such feature pairs tend to capture heterogeneous pairs of general concepts, rather than homogeneous pairs of specific concepts. 
These findings and this framework can be used to improve current and future EHR-based clinical modeling efforts.\n                  <\/jats:p>","DOI":"10.1038\/s41746-022-00558-0","type":"journal-article","created":{"date-parts":[[2022,1,27]],"date-time":"2022-01-27T06:03:15Z","timestamp":1643263395000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["Predictive structured\u2013unstructured interactions in EHR models: A case study of suicide prediction"],"prefix":"10.1038","volume":"5","author":[{"given":"Ilkin","family":"Bayramli","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7390-6354","authenticated-orcid":false,"given":"Victor","family":"Castro","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9603-4728","authenticated-orcid":false,"given":"Yuval","family":"Barak-Corren","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emily M.","family":"Madsen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthew K.","family":"Nock","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jordan W.","family":"Smoller","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9908-5523","authenticated-orcid":false,"given":"Ben Y.","family":"Reis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,1,27]]},"reference":[{"key":"558_CR1","doi-asserted-by":"publisher","first-page":"ooab011","DOI":"10.1093\/jamiaopen\/ooab011","volume":"4","author":"FR Tsui","year":"2021","unstructured":"Tsui, F. R. et al. 
Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts. JAMIA Open 4, ooab011 (2021).","journal-title":"JAMIA Open"},{"key":"558_CR2","doi-asserted-by":"publisher","first-page":"1064","DOI":"10.1001\/jamapsychiatry.2016.2172","volume":"73","author":"TH McCoy Jr.","year":"2016","unstructured":"McCoy, T. H. Jr., Castro, V. M., Roberson, A. M., Snapper, L. A. & Perlis, R. H. Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry 73, 1064\u20131071 (2016).","journal-title":"JAMA Psychiatry"},{"key":"558_CR3","doi-asserted-by":"publisher","first-page":"S176","DOI":"10.1016\/j.amepre.2014.06.004","volume":"47","author":"CR Glenn","year":"2014","unstructured":"Glenn, C. R. & Nock, M. K. Improving the short-term prediction of suicidal behavior. Am. J. Prev. Med. 47, S176\u2013S180 (2014).","journal-title":"Am. J. Prev. Med."},{"key":"558_CR4","doi-asserted-by":"publisher","first-page":"e85733","DOI":"10.1371\/journal.pone.0085733","volume":"9","author":"C Poulin","year":"2014","unstructured":"Poulin, C. et al. Predicting the risk of suicide by analyzing the text of clinical notes. PLoS One 9, e85733 (2014).","journal-title":"PLoS One"},{"key":"558_CR5","unstructured":"Gulati, G., Cullen, W. & Kelly, B. Psychiatry Algorithms for Primary Care (John Wiley & Sons, 2021)."},{"key":"558_CR6","doi-asserted-by":"publisher","first-page":"266","DOI":"10.1056\/NEJMra1902944","volume":"382","author":"S Fazel","year":"2020","unstructured":"Fazel, S. & Runeson, B. Suicide. N. Engl. J. Med. 382, 266\u2013274 (2020).","journal-title":"N. Engl. J. Med."},{"key":"558_CR7","unstructured":"Hedegaard, H., Curtin, S. C. & Warner, M. Suicide rates in the United States continue to increase. 
NCHS Data Brief (309) 1\u20138 (2018)."},{"key":"558_CR8","doi-asserted-by":"publisher","first-page":"511","DOI":"10.1177\/0956797610364762","volume":"21","author":"MK Nock","year":"2010","unstructured":"Nock, M. K. et al. Measuring the suicidal mind: Implicit cognition predicts suicidal behavior. Psychol. Sci. 21, 511\u2013517 (2010).","journal-title":"Psychol. Sci."},{"key":"558_CR9","doi-asserted-by":"publisher","first-page":"154","DOI":"10.1176\/appi.ajp.2016.16010077","volume":"174","author":"Y Barak-Corren","year":"2017","unstructured":"Barak-Corren, Y. et al. Predicting suicidal behavior from longitudinal electronic health records. Am. J. Psychiatry 174, 154\u2013162 (2017).","journal-title":"Am. J. Psychiatry"},{"key":"558_CR10","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1093\/jamia\/ocab225","volume":"29","author":"I Bayramli","year":"2021","unstructured":"Bayramli, I., Castro, V., Barak-Corren, Y., Madsen, E. M., Nock, M. K., Smoller, J. W. & Reis, B. Y. Temporally informed random forests for suicide risk prediction. J. Am. Med. Inform. Assoc. 29, 62\u201371 (2021).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"558_CR11","doi-asserted-by":"publisher","first-page":"103361","DOI":"10.1016\/j.jbi.2019.103361","volume":"102","author":"Z Xu","year":"2020","unstructured":"Xu, Z. et al. Identifying sub-phenotypes of acute kidney injury using structured and unstructured electronic health record data with memory networks. J. Biomed. Inform. 102, 103361 (2020).","journal-title":"J. Biomed. Inform."},{"key":"558_CR12","doi-asserted-by":"publisher","DOI":"10.1038\/s41398-020-0780-3","volume":"10","author":"C Su","year":"2020","unstructured":"Su, C., Xu, Z., Pathak, J. & Wang, F. Deep learning in mental health outcome research: a scoping review. Transl. Psychiatry 10, 116 (2020).","journal-title":"Transl. 
Psychiatry"},{"key":"558_CR13","doi-asserted-by":"publisher","first-page":"e12239","DOI":"10.2196\/12239","volume":"7","author":"S Sheikhalishahi","year":"2019","unstructured":"Sheikhalishahi, S. et al. Natural language processing of clinical notes on chronic diseases: Systematic review. JMIR Med Inf. 7, e12239 (2019).","journal-title":"JMIR Med Inf."},{"key":"558_CR14","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1038\/s41746-020-0301-z","volume":"3","author":"I Landi","year":"2020","unstructured":"Landi, I. et al. Deep representation learning of electronic health records to unlock patient stratification at scale. npj Digit. Med. 3, 96 (2020).","journal-title":"npj Digit. Med."},{"key":"558_CR15","doi-asserted-by":"publisher","first-page":"26094","DOI":"10.1038\/srep26094","volume":"6","author":"R Miotto","year":"2016","unstructured":"Miotto, R., Li, L., Kidd, B. A. & Dudley, J. T. Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 6, 26094 (2016).","journal-title":"Sci. Rep."},{"key":"558_CR16","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1038\/s41746-018-0029-1","volume":"1","author":"A Rajkomar","year":"2018","unstructured":"Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digit. Med. 1, 18 (2018).","journal-title":"NPJ Digit. Med."},{"key":"558_CR17","doi-asserted-by":"publisher","first-page":"e0211116","DOI":"10.1371\/journal.pone.0211116","volume":"14","author":"NJ Carson","year":"2019","unstructured":"Carson, N. J. et al. Identification of suicidal behavior among psychiatrically hospitalized adolescents using natural language processing and machine learning of electronic health records. PLoS One 14, e0211116 (2019).","journal-title":"PLoS One"},{"key":"558_CR18","first-page":"1044","volume":"2006","author":"R Nalichowski","year":"2006","unstructured":"Nalichowski, R., Keogh, D., Chueh, H. C. & Murphy, S. N. 
Calculating the benefits of a research patient data repository. AMIA Annu. Symp. Proc. 2006, 1044 (2006).","journal-title":"AMIA Annu. Symp. Proc."},{"key":"558_CR19","doi-asserted-by":"publisher","first-page":"D267","DOI":"10.1093\/nar\/gkh061","volume":"32","author":"O Bodenreider","year":"2004","unstructured":"Bodenreider, O. The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Res. 32, D267\u2013D270 (2004).","journal-title":"Nucleic Acids Res."},{"key":"558_CR20","unstructured":"Ross, J. Psychiatric Phenotyping Using Symptom Profiles: Can Self-Report Symptoms Inform a New Psychiatric Taxonomy? (UCSF, 2018)."},{"key":"558_CR21","doi-asserted-by":"publisher","first-page":"997","DOI":"10.1016\/j.biopsych.2018.01.011","volume":"83","author":"TH McCoy Jr.","year":"2018","unstructured":"McCoy, T. H. Jr. et al. High throughput phenotyping for dimensional psychopathology in electronic health records. Biol. Psychiatry 83, 997\u20131004 (2018).","journal-title":"Biol. Psychiatry"},{"key":"558_CR22","doi-asserted-by":"publisher","DOI":"10.1186\/1472-6947-6-30","volume":"6","author":"QT Zeng","year":"2006","unstructured":"Zeng, Q. T. et al. Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system. BMC Med. Inform. Decis. Mak. 6, 30 (2006).","journal-title":"BMC Med. Inform. Decis. Mak."},{"key":"558_CR23","doi-asserted-by":"crossref","unstructured":"Chapman, W., Dowling, J. & Chu, D. ConText: An algorithm for identifying contextual features from clinical text. In Biological, Translational, and Clinical Language Processing 81\u201388 (Association for Computational Linguistics, 2007).","DOI":"10.3115\/1572392.1572408"},{"key":"558_CR24","doi-asserted-by":"publisher","first-page":"e201262","DOI":"10.1001\/jamanetworkopen.2020.1262","volume":"3","author":"Y Barak-Corren","year":"2020","unstructured":"Barak-Corren, Y. et al. 
Validation of an electronic health record-based suicide risk prediction modeling approach across multiple health care systems. JAMA Netw. Open 3, e201262\u2013e201262 (2020).","journal-title":"JAMA Netw. Open"},{"key":"558_CR25","doi-asserted-by":"publisher","first-page":"b3677","DOI":"10.1136\/bmj.b3677","volume":"339","author":"BY Reis","year":"2009","unstructured":"Reis, B. Y., Kohane, I. S. & Mandl, K. D. Longitudinal histories as predictors of future diagnoses of domestic abuse: Modelling study. BMJ 339, b3677 (2009).","journal-title":"BMJ"},{"key":"558_CR26","unstructured":"Chen, C., Liaw, A. & Breiman, L. Using random forest to learn imbalanced data. Berkeley Statistics Report No. 666. 1\u201312 (University of California Berkeley, USA, 2004). https:\/\/statistics.berkeley.edu\/tech-reports\/666"},{"key":"558_CR27","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman, L. Random forests. Mach. Learn. 45, 5\u201332 (2001).","journal-title":"Mach. Learn."},{"key":"558_CR28","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1111\/j.1469-1809.1955.tb01348.x","volume":"19","author":"B Woolf","year":"1955","unstructured":"Woolf, B. On estimating the relation between blood group and disease. Ann. Hum. Genet. 19, 251\u2013253 (1955).","journal-title":"Ann. Hum. 
Genet."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-022-00558-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-022-00558-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-022-00558-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,25]],"date-time":"2022-11-25T03:23:11Z","timestamp":1669346591000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-022-00558-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,27]]},"references-count":28,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["558"],"URL":"https:\/\/doi.org\/10.1038\/s41746-022-00558-0","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.08.10.21261831","asserted-by":"object"}]},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,27]]},"assertion":[{"value":"6 August 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 December 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 January 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Dr. Smoller reported serving as an unpaid member of the Bipolar\/Depression Research Community Advisory Panel of 23andMe and a member of the Leon Levy Foundation Neuroscience Advisory Board, and receiving an honorarium for an internal seminar at Biogen Inc. Dr. 
Nock receives textbook royalties from Macmillan and Pearson publishers and has been a paid consultant in the past year for Microsoft and for a legal case regarding a death by suicide. He is an unpaid scientific advisor for TalkLife and Empatica. The remaining authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"15"}}