{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T22:49:48Z","timestamp":1775774988515,"version":"3.50.1"},"reference-count":59,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,8,14]],"date-time":"2025-08-14T00:00:00Z","timestamp":1755129600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,8,14]],"date-time":"2025-08-14T00:00:00Z","timestamp":1755129600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research 
Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Current studies regarding the secondary use of electronic health records (EHR) predominantly rely on domain expertise and existing medical knowledge. A powerful representation approach can unleash the potential of discovering new medical patterns underlying the EHR. 
Here, we introduce an unsupervised method for embedding high-dimensional EHR data at the patient level to characterize heterogeneity in complex diseases and identify novel disease patterns linked to disparities in clinical outcomes. We applied this approach to 34,851 unique medical codes across 1,046,649 longitudinal patient events, including 102,740 patients in the Electronic Medical Records and GEnomics (eMERGE) Network. The model achieved strong predictive performance in predicting future disease (median AUROC\u2009=\u20090.87 within one year) and bulk phenotyping (median AUROC\u2009=\u20090.84). Notably, these patient embeddings revealed diverse comorbidity profiles and health outcomes, including distinct subtypes and progression patterns in colorectal cancer and systemic lupus erythematosus.<\/jats:p>","DOI":"10.1038\/s41746-025-01872-z","type":"journal-article","created":{"date-parts":[[2025,8,14]],"date-time":"2025-08-14T07:52:13Z","timestamp":1755157933000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Transformer patient embedding using electronic health records enables patient stratification and progression analysis"],"prefix":"10.1038","volume":"8","author":[{"given":"Su","family":"Xian","sequence":"first","affiliation":[]},{"given":"Monika E.","family":"Grabowska","sequence":"additional","affiliation":[]},{"given":"Iftikhar J.","family":"Kullo","sequence":"additional","affiliation":[]},{"given":"Yuan","family":"Luo","sequence":"additional","affiliation":[]},{"given":"Jordan W.","family":"Smoller","sequence":"additional","affiliation":[]},{"given":"Theresa L.","family":"Walunas","sequence":"additional","affiliation":[]},{"given":"Wei-Qi","family":"Wei","sequence":"additional","affiliation":[]},{"given":"Gail P.","family":"Jarvik","sequence":"additional","affiliation":[]},{"given":"Sean D.","family":"Mooney","sequence":"additional","affiliation":[]},{"given":"David 
R.","family":"Crosslin","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,8,14]]},"reference":[{"key":"1872_CR1","doi-asserted-by":"publisher","first-page":"1142","DOI":"10.1093\/jamia\/ocx080","volume":"24","author":"J Adler-Milstein","year":"2017","unstructured":"Adler-Milstein, J. et al. Electronic health record adoption in US hospitals: the emergence of a digital \u2018advanced use\u2019 divide. J. Am. Med. Inform. Assoc. 24, 1142\u20131148 (2017).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1872_CR2","doi-asserted-by":"publisher","first-page":"1709","DOI":"10.1001\/jama.2010.1497","volume":"304","author":"AK Jha","year":"2010","unstructured":"Jha, A. K. Meaningful use of electronic health records: the road ahead. JAMA 304, 1709\u20131710 (2010).","journal-title":"JAMA"},{"key":"1872_CR3","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/s12911-018-0672-0","volume":"18","author":"T Bai","year":"2018","unstructured":"Bai, T., Chanda, A. K., Egleston, B. L. & Vucetic, S. EHR phenotyping via jointly embedding medical concepts and words into a unified vector space. BMC Med. Inform. Decis. Mak. 18, 15\u201325 (2018).","journal-title":"BMC Med. Inform. Decis. Mak."},{"issue":"2","key":"1872_CR4","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1093\/jamia\/ocac216","volume":"30","author":"S Yang","year":"2023","unstructured":"Yang, S., Varghese, P., Stephenson, E., Tu, K., & Gronsbell, J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 30(2), 367\u2013381, https:\/\/doi.org\/10.1093\/jamia\/ocac216 (2023).","journal-title":"J Am Med Inform Assoc"},{"key":"1872_CR5","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1109\/TCBB.2018.2849968","volume":"16","author":"Zeng Zexian","year":"2019","unstructured":"Zexian, Zeng., Yu, Deng., Xiaoyu, Li., Tristan, Naumann. & Yuan, Luo. 
Natural Language Processing for EHR-Based Computational Phenotyping. IEEE\/ACM Trans Comput Biol Bioinform 16, 139\u2013153, https:\/\/doi.org\/10.1109\/TCBB.2018.2849968 (2019).","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"1872_CR6","doi-asserted-by":"publisher","first-page":"1046","DOI":"10.1093\/jamia\/ocv202","volume":"23","author":"JC Kirby","year":"2016","unstructured":"Kirby, J. C. et al. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J. Am. Med. Inform. Assoc. 23, 1046\u20131052 (2016).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1872_CR7","doi-asserted-by":"publisher","first-page":"791","DOI":"10.1016\/j.ajhg.2016.08.012","volume":"99","author":"PL Auer","year":"2016","unstructured":"Auer, P. L. et al. Guidelines for large-scale sequence-based complex trait association studies: lessons learned from the NHLBI Exome Sequencing Project. Am. J. Hum. Genet. 99, 791 (2016).","journal-title":"Am. J. Hum. Genet."},{"key":"1872_CR8","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1016\/j.jbi.2014.07.007","volume":"52","author":"PL Peissig","year":"2014","unstructured":"Peissig, P. L. et al. Relational machine learning for electronic health record-driven phenotyping. J. Biomed. Inform. 52, 260\u2013270 (2014).","journal-title":"J. Biomed. Inform."},{"key":"1872_CR9","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1146\/annurev-biodatasci-080917-013315","volume":"1","author":"JM Banda","year":"2018","unstructured":"Banda, J. M., Seneviratne, M., Hernandez-Boussard, T. & Shah, N. H. Advances in electronic phenotyping: from rule-based definitions to machine learning models. Annu. Rev. Biomed. Data Sci. 1, 53\u201368 (2018).","journal-title":"Annu. Rev. Biomed. Data Sci."},{"key":"1872_CR10","doi-asserted-by":"publisher","DOI":"10.1038\/srep26094","volume":"6","author":"R Miotto","year":"2016","unstructured":"Miotto, R., Li, L., Kidd, B. A. & Dudley, J. T. 
Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 6, 26094 (2016).","journal-title":"Sci. Rep."},{"key":"1872_CR11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-020-0301-z","volume":"3","author":"I Landi","year":"2020","unstructured":"Landi, I. et al. Deep representation learning of electronic health records to unlock patient stratification at scale. npj Digit. Med. 3, 1\u201311 (2020).","journal-title":"npj Digit. Med."},{"issue":"8","key":"1872_CR12","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","volume":"35","author":"Y Bengio","year":"2013","unstructured":"Bengio, Y., Courville, A. & Vincent, P. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8), 1798\u20131828, https:\/\/doi.org\/10.1109\/TPAMI.2013.50 (2013).","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"1872_CR13","doi-asserted-by":"crossref","unstructured":"Wang, Y. et al. Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records. J. Biomed. Inform. 102, 103364 (2020).","DOI":"10.1016\/j.jbi.2019.103364"},{"key":"1872_CR14","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1038\/nrg3208","volume":"13","author":"PB Jensen","year":"2012","unstructured":"Jensen, P. B., Jensen, L. J. & Brunak, S. Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13, 395\u2013405 (2012).","journal-title":"Nat. Rev. Genet."},{"key":"1872_CR15","doi-asserted-by":"publisher","first-page":"292","DOI":"10.1186\/s12890-023-02560-y","volume":"23","author":"NA Sathe","year":"2023","unstructured":"Sathe, N. A. et al. 
Evaluating construct validity of computable acute respiratory distress syndrome definitions in adults hospitalized with COVID-19: an electronic health records based approach. BMC Pulm. Med. 23, 292 (2023).","journal-title":"BMC Pulm. Med."},{"key":"1872_CR16","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1136\/amiajnl-2012-001145","volume":"20","author":"G Hripcsak","year":"2012","unstructured":"Hripcsak, G. & Albers, D. J. Next-generation phenotyping of electronic health records. J. Am. Med. Inform. Assoc. 20, 117\u2013121 (2012).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1872_CR17","doi-asserted-by":"crossref","unstructured":"Wilcox, A. B. & Hripcsak, G. The role of domain knowledge in automating medical text report classification. J. Am. Med. Inform. Assoc. 10, 330\u2013338 (2003).","DOI":"10.1197\/jamia.M1157"},{"key":"1872_CR18","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1016\/j.jbi.2017.04.009","volume":"70","author":"EHR-based phenotyping","year":"2017","unstructured":"EHR-based phenotyping Bulk learning and evaluation. J. Biomed. Inform. 70, 35\u201351 (2017).","journal-title":"J. Biomed. Inform."},{"key":"1872_CR19","doi-asserted-by":"publisher","first-page":"e54","DOI":"10.1542\/peds.2013-0819","volume":"133","author":"F Doshi-Velez","year":"2014","unstructured":"Doshi-Velez, F., Ge, Y. & Kohane, I. Comorbidity clusters in autism spectrum disorders: an electronic health record time-series analysis. Pediatrics 133, e54\u2013e63 (2014).","journal-title":"Pediatrics"},{"key":"1872_CR20","doi-asserted-by":"publisher","DOI":"10.1126\/scitranslmed.aaa9364","author":"L Li","year":"2015","unstructured":"Li, L. et al. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Sci. Transl. Med. https:\/\/doi.org\/10.1126\/scitranslmed.aaa9364 (2015).","journal-title":"Sci. Transl. 
Med."},{"key":"1872_CR21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-37186-2","volume":"9","author":"X Zhang","year":"2019","unstructured":"Zhang, X. et al. Data-driven subtyping of Parkinson\u2019s disease using longitudinal clinical records: a Cohort study. Sci. Rep. 9, 1\u201312 (2019).","journal-title":"Sci. Rep."},{"key":"1872_CR22","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1494","volume":"13","author":"F Becker","year":"2023","unstructured":"Becker, F., Smilde, A. K. & Acar, E. Unsupervised EHR-based phenotyping via matrix and tensor decompositions. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 13, e1494 (2023).","journal-title":"Wiley Interdiscip. Rev. Data Min. Knowl. Discov."},{"key":"1872_CR23","first-page":"787","volume":"2017","author":"E Choi","year":"2017","unstructured":"Choi, E., Bahadori, M. T., Song, L., Stewart, W. F. & Sun, J. GRAM: Graph-based attention model for healthcare representation learning. KDD 2017, 787\u2013795 (2017).","journal-title":"KDD"},{"key":"1872_CR24","doi-asserted-by":"crossref","unstructured":"Dash, S., Acharya, B. R., Mittal, M., Abraham, A. & Kelemen, A. Deep Learning Techniques for Biomedical and Health Informatics (Springer Nature, 2019).","DOI":"10.1007\/978-3-030-33966-1"},{"key":"1872_CR25","doi-asserted-by":"publisher","first-page":"1589","DOI":"10.1109\/JBHI.2017.2767063","volume":"22","author":"B Shickel","year":"2018","unstructured":"Shickel, B., Tighe, P. J., Bihorac, A. & Rashidi, P. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health Inf. 22, 1589\u20131604 (2018).","journal-title":"IEEE J. Biomed. Health Inf."},{"key":"1872_CR26","unstructured":"Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, (2017)."},{"key":"1872_CR27","unstructured":"Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. 
BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)."},{"key":"1872_CR28","doi-asserted-by":"publisher","first-page":"e14325","DOI":"10.2196\/14325","volume":"7","author":"P Wu","year":"2019","unstructured":"Wu, P. et al. Mapping ICD-10 and ICD-10-CM codes to phecodes: workflow development and initial evaluation. JMIR Med. Inf. 7, e14325 (2019).","journal-title":"JMIR Med. Inf."},{"key":"1872_CR29","doi-asserted-by":"publisher","first-page":"e147","DOI":"10.1136\/amiajnl-2012-000896","volume":"20","author":"KM Newton","year":"2013","unstructured":"Newton, K. M. et al. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network. J. Am. Med. Inform. Assoc. 20, e147\u2013e154 (2013).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1872_CR30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-019-56847-4","volume":"10","author":"Y Li","year":"2020","unstructured":"Li, Y. et al. BEHRT: transformer for electronic health records. Sci. Rep. 10, 1\u201312 (2020).","journal-title":"Sci. Rep."},{"key":"1872_CR31","doi-asserted-by":"publisher","first-page":"661","DOI":"10.1016\/S0895-4356(00)00363-2","volume":"54","author":"R Gijsen","year":"2001","unstructured":"Gijsen, R. et al. Causes and consequences of comorbidity: a review. J. Clin. Epidemiol. 54, 661\u2013674 (2001).","journal-title":"J. Clin. Epidemiol."},{"key":"1872_CR32","doi-asserted-by":"publisher","first-page":"439","DOI":"10.1097\/QAI.0000000000001433","volume":"75","author":"TJ O\u2019Neill","year":"2017","unstructured":"O\u2019Neill, T. J., Nguemo, J. D., Tynan, A.-M., Burchell, A. N. & Antoniou, T. Risk of colorectal cancer and associated mortality in HIV: a systematic review and meta-analysis. J. Acquir. Immune Defic. Syndr. 75, 439 (2017).","journal-title":"J. Acquir. Immune Defic. 
Syndr."},{"key":"1872_CR33","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1038\/sj.bjc.6602273","volume":"92","author":"A Newnham","year":"2005","unstructured":"Newnham, A., Harris, J., Evans, H. S., Evans, B. G. & M\u00f8ller, H. The risk of cancer in HIV-infected people in southeast England: a cohort study. Br. J. Cancer 92, 194 (2005).","journal-title":"Br. J. Cancer"},{"key":"1872_CR34","doi-asserted-by":"crossref","unstructured":"Cooksley, C. D., Hwang, L. Y., Waller, D. K. & Ford, C. E. HIV-related malignancies: community-based study using linkage of cancer registry and HIV registry data. Int. J. STD AIDS 10, (1999).","DOI":"10.1258\/0956462991913574"},{"issue":"1","key":"1872_CR35","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1186\/s12885-015-1099-y","volume":"15","author":"Chen Chang-Hua","year":"2015","unstructured":"Chang-Hua, Chen., Chih-Yuan, Chung., Li-Hsuan, Wang., Che, Lin., Hsiu-Li, Lin. & Hsiu-Chen, Lin. Risk of cancer among HIV-infected patients from a population-based nested case-control study: implications for cancer prevention. BMC Cancer 15(1), 133 (2015).","journal-title":"BMC Cancer"},{"key":"1872_CR36","doi-asserted-by":"publisher","first-page":"5103","DOI":"10.1158\/0008-5472.CAN-15-2980","volume":"76","author":"G Fehringer","year":"2016","unstructured":"Fehringer, G. et al. Cross-cancer genome-wide analysis of lung, ovary, breast, prostate, and colorectal cancer reveals novel pleiotropic associations. Cancer Res. 76, 5103\u20135114 (2016).","journal-title":"Cancer Res."},{"key":"1872_CR37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-020-18246-6","volume":"11","author":"SR Rashkin","year":"2020","unstructured":"Rashkin, S. R. et al. Pan-cancer study detects genetic risk variants and shared genetic basis in two large cohorts. Nat. Commun. 11, 1\u201314 (2020).","journal-title":"Nat. 
Commun."},{"key":"1872_CR38","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1093\/jnci\/djz040","volume":"112","author":"ML Hawkins","year":"2019","unstructured":"Hawkins, M. L. et al. Endocrine and metabolic diseases among colorectal cancer survivors in a population-based cohort. J. Natl Cancer Inst. 112, 78\u201386 (2019).","journal-title":"J. Natl Cancer Inst."},{"key":"1872_CR39","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1007\/s12029-011-9332-7","volume":"43","author":"M Barone","year":"2011","unstructured":"Barone, M. et al. Dietary, endocrine, and metabolic factors in the development of colorectal cancer. J. Gastrointest. Cancer 43, 13\u201319 (2011).","journal-title":"J. Gastrointest. Cancer"},{"key":"1872_CR40","doi-asserted-by":"publisher","unstructured":"Kenzik, K. M. et al. New-onset cardiovascular morbidity in older adults with Stage I to III colorectal cancer. J. Clin. Oncol. https:\/\/doi.org\/10.1200\/JCO.2017.74.9739 (2018).","DOI":"10.1200\/JCO.2017.74.9739"},{"key":"1872_CR41","doi-asserted-by":"crossref","unstructured":"Risk of arterial thromboembolism in patients with cancer. J. Am. Coll. Cardiol. 70, 926\u2013938 (2017).","DOI":"10.1016\/j.jacc.2017.06.047"},{"key":"1872_CR42","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1007\/s10238-017-0479-9","volume":"18","author":"R Dammacco","year":"2018","unstructured":"Dammacco, R. Systemic lupus erythematosus and ocular involvement: an overview. Clin. Exp. Med. 18, 135\u2013149 (2018).","journal-title":"Clin. Exp. Med."},{"key":"1872_CR43","doi-asserted-by":"publisher","first-page":"701","DOI":"10.1007\/s00296-014-3129-5","volume":"35","author":"K Alderaan","year":"2015","unstructured":"Alderaan, K., Sekicki, V., Magder, L. S. & Petri, M. Risk factors for cataracts in systemic lupus erythematosus (SLE). Rheumatol. Int. 35, 701\u2013708 (2015).","journal-title":"Rheumatol. 
Int."},{"key":"1872_CR44","doi-asserted-by":"crossref","unstructured":"A national study of the complications of lupus in pregnancy. Am. J. Obstet. Gynecol. 199, 127.e1\u2013127.e6 (2008).","DOI":"10.1016\/j.ajog.2008.03.012"},{"key":"1872_CR45","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1097\/OGX.0b013e318239e1ee","volume":"66","author":"AN Baer","year":"2011","unstructured":"Baer, A. N., Witter, F. R. & Petri, M. Lupus and pregnancy. Obstet. Gynecol. Surv. 66, 639\u2013653 (2011).","journal-title":"Obstet. Gynecol. Surv."},{"key":"1872_CR46","doi-asserted-by":"publisher","first-page":"e191742","DOI":"10.1001\/jamaoncol.2019.1742","volume":"5","author":"AE Coghill","year":"2019","unstructured":"Coghill, A. E., Suneja, G., Rositch, A. F., Shiels, M. S. & Engels, E. A. HIV infection, cancer treatment regimens, and cancer outcomes among elderly adults in the United States. JAMA Oncol. 5, e191742\u2013e191742 (2019).","journal-title":"JAMA Oncol."},{"key":"1872_CR47","doi-asserted-by":"publisher","first-page":"934","DOI":"10.1001\/jama.1981.03310340024021","volume":"245","author":"DJ Wallace","year":"1981","unstructured":"Wallace, D. J. et al. Systemic lupus erythematosus\u2013survival patterns. Experience with 609 patients. JAMA 245, 934\u2013938 (1981).","journal-title":"JAMA"},{"key":"1872_CR48","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-023-45171-7","volume":"13","author":"J Mihailovic","year":"2023","unstructured":"Mihailovic, J. et al. Worse cardiovascular and renal outcome in male SLE patients. Sci. Rep. 13, 1\u201312 (2023).","journal-title":"Sci. Rep."},{"key":"1872_CR49","doi-asserted-by":"publisher","unstructured":"Yadav, P., Steinbach, M., Kumar, V. & Simon, G. Mining electronic health records (EHRs): a survey. ACM Comput. Surv. 
https:\/\/doi.org\/10.1145\/3127881 (2018).","DOI":"10.1145\/3127881"},{"key":"1872_CR50","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-021-00455-y","volume":"4","author":"L Rasmy","year":"2021","unstructured":"Rasmy, L., Xiang, Y., Xie, Z., Tao, C. & Zhi, D. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digit. Med. 4, 1\u201313 (2021).","journal-title":"npj Digit. Med."},{"key":"1872_CR51","doi-asserted-by":"publisher","unstructured":"Zhu, Z. et al. Measuring patient similarities via a deep architecture with medical concept embedding. in 2016 IEEE 16th International Conference on Data Mining (ICDM) (IEEE, 2016). https:\/\/doi.org\/10.1109\/icdm.2016.0086.","DOI":"10.1109\/icdm.2016.0086"},{"key":"1872_CR52","doi-asserted-by":"publisher","first-page":"e0175508","DOI":"10.1371\/journal.pone.0175508","volume":"12","author":"W-Q Wei","year":"2017","unstructured":"Wei, W.-Q. et al. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLoS ONE 12, e0175508 (2017).","journal-title":"PLoS ONE"},{"issue":"1","key":"1872_CR53","doi-asserted-by":"publisher","first-page":"e25","DOI":"10.1097\/PTS.0000000000001081","volume":"19","author":"A Ram","year":"2023","unstructured":"Ram, A. & Dixit Christian, L. Electronic Health Record Use Issues and Diagnostic Error: A Scoping Review and Framework. 
J Patient Saf 19(1), e25\u2013e30, https:\/\/doi.org\/10.1097\/PTS.0000000000001081 (2023).","journal-title":"J Patient Saf"},{"key":"1872_CR54","doi-asserted-by":"crossref","unstructured":"Sigall K., Bell Tom, Delbanco Joann G., Elmore Patricia S., Fitzgerald Alan, Fossa Kendall, Harcourt Suzanne G., Leveille Thomas H., Payne Rebecca A., Stametz Jan, Walker Catherine M., DesRoches (2020) Frequency and Types of Patient-Reported Errors in Electronic Health Record Ambulatory Care Notes JAMA Network Open 3(6) e205867-10.1001\/jamanetworkopen.2020.5867","DOI":"10.1001\/jamanetworkopen.2020.5867"},{"key":"1872_CR55","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937","author":"DM Blei","year":"2003","unstructured":"Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. https:\/\/doi.org\/10.5555\/944919.944937 (2003).","journal-title":"J. Mach. Learn. Res."},{"key":"1872_CR56","doi-asserted-by":"publisher","first-page":"331","DOI":"10.1007\/s40471-018-0165-9","volume":"5","author":"J Wong","year":"2018","unstructured":"Wong, J., Murray Horwitz, M., Zhou, L. & Toh, S. Using machine learning to identify health outcomes from electronic health record data. Curr. Epidemiol. Rep. 5, 331\u2013342 (2018).","journal-title":"Curr. Epidemiol. Rep."},{"key":"1872_CR57","unstructured":"Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations 2015, San Diego; arXiv:1412.6980 (2015)."},{"key":"1872_CR58","doi-asserted-by":"publisher","unstructured":"Reimers, N. & Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (Association for Computational Linguistics, Stroudsburg, PA, USA, 2019). 
https:\/\/doi.org\/10.18653\/v1\/d19-1410.","DOI":"10.18653\/v1\/d19-1410"},{"key":"1872_CR59","doi-asserted-by":"crossref","unstructured":"Gonzalez-Diaz, R., Guti\u00e9rrez-Naranjo, M. A. & Paluzo-Hidalgo, E. Two-hidden-layer feed-forward networks are universal approximators: a constructive approach. Neural Netw. 131, 29\u201336 (2020).","DOI":"10.1016\/j.neunet.2020.07.021"}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01872-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01872-z","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01872-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T22:19:48Z","timestamp":1757456388000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01872-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,14]]},"references-count":59,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1872"],"URL":"https:\/\/doi.org\/10.1038\/s41746-025-01872-z","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,14]]},"assertion":[{"value":"19 June 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 July 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 August 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare 
no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"521"}}