{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T05:59:49Z","timestamp":1780379989837,"version":"3.54.1"},"reference-count":65,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,1,28]],"date-time":"2025-01-28T00:00:00Z","timestamp":1738022400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,28]],"date-time":"2025-01-28T00:00:00Z","timestamp":1738022400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Rare diseases, affecting ~350 million people worldwide, pose significant challenges in clinical diagnosis due to the lack of experienced physicians and the complexity of differentiating between numerous rare diseases. To address these challenges, we introduce PhenoBrain, a fully automated artificial intelligence pipeline. PhenoBrain utilizes a BERT-based natural language processing model to extract phenotypes from clinical texts in EHRs and employs five new diagnostic models for differential diagnoses of rare diseases. The AI system was developed and evaluated on diverse, multi-country rare disease datasets, comprising 2271 cases with 431 rare diseases. In 1936 test cases, PhenoBrain achieved an average predicted top-3 recall of 0.513 and a top-10 recall of 0.654, surpassing 13 leading prediction methods. In a human-computer study with 75 cases, PhenoBrain exhibited exceptional performance with a top-3 recall of 0.613 and a top-10 recall of 0.813, surpassing the performance of 50 specialist physicians and large language models like ChatGPT and GPT-4. Combining PhenoBrain\u2019s predictions with specialists increased the top-3 recall to 0.768, demonstrating its potential to enhance diagnostic accuracy in clinical workflows.<\/jats:p>","DOI":"10.1038\/s41746-025-01452-1","type":"journal-article","created":{"date-parts":[[2025,1,28]],"date-time":"2025-01-28T15:58:33Z","timestamp":1738079913000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["A phenotype-based AI pipeline outperforms human experts in differentially diagnosing rare diseases using EHRs"],"prefix":"10.1038","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-4636-4381","authenticated-orcid":false,"given":"Xiaohao","family":"Mao","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yu","family":"Huang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ye","family":"Jin","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lun","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1214-6130","authenticated-orcid":false,"given":"Xuanzhong","family":"Chen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Honghong","family":"Liu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xinglin","family":"Yang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Haopeng","family":"Xu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaodong","family":"Luan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ying","family":"Xiao","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Siqin","family":"Feng","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiahao","family":"Zhu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9684-5643","authenticated-orcid":false,"given":"Xuegong","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rui","family":"Jiang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shuyang","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3228-9166","authenticated-orcid":false,"given":"Ting","family":"Chen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,1,28]]},"reference":[{"key":"1452_CR1","doi-asserted-by":"publisher","first-page":"2039","DOI":"10.1016\/S0140-6736(08)60872-7","volume":"371","author":"A Schieppati","year":"2008","unstructured":"Schieppati, A., Henter, J.-I., Daina, E. & Aperia, A. Why rare diseases are an important medical and social issue. Lancet 371, 2039\u20132041 (2008).","journal-title":"Lancet"},{"key":"1452_CR2","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1038\/d41573-019-00180-y","volume":"19","author":"M Haendel","year":"2020","unstructured":"Haendel, M. et al. How many rare diseases are there? Nat. Rev. Drug Discov. 19, 77\u201378 (2020).","journal-title":"Nat. Rev. Drug Discov."},{"key":"1452_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13023-022-02299-5","volume":"17","author":"G Yang","year":"2022","unstructured":"Yang, G. et al. The national economic burden of rare disease in the United States in 2019. Orphanet J. Rare Dis. 17, 1\u201311 (2022).","journal-title":"Orphanet J. Rare Dis."},{"key":"1452_CR4","doi-asserted-by":"publisher","first-page":"803","DOI":"10.1007\/s40273-023-01262-x","volume":"41","author":"GR Currie","year":"2023","unstructured":"Currie, G. R. et al. Developing a framework of cost elements of socioeconomic burden of rare disease: a scoping review. Pharmacoeconomics 41, 803\u2013818 (2023).","journal-title":"Pharmacoeconomics"},{"key":"1452_CR5","doi-asserted-by":"publisher","first-page":"472","DOI":"10.1093\/jamiaopen\/ooaa030","volume":"3","author":"YR Rubinstein","year":"2020","unstructured":"Rubinstein, Y. R. et al. The case for open science: rare diseases. JAMIA open 3, 472\u2013486 (2020).","journal-title":"JAMIA open"},{"key":"1452_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13073-022-01026-w","volume":"14","author":"S Marwaha","year":"2022","unstructured":"Marwaha, S., Knowles, J. W. & Ashley, E. A. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med. 14, 1\u201322 (2022).","journal-title":"Genome Med."},{"key":"1452_CR7","doi-asserted-by":"crossref","unstructured":"Evans, W. R. Dare to think rare. Diagnostic delay and rare diseases. Br. J. Gen. Pract. 68, 224\u2013225 (2018).","DOI":"10.3399\/bjgp18X695957"},{"key":"1452_CR8","doi-asserted-by":"publisher","first-page":"D514","DOI":"10.1093\/nar\/gki033","volume":"33","author":"A Hamosh","year":"2005","unstructured":"Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A. & McKusick, V. A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514\u2013D517 (2005).","journal-title":"Nucleic Acids Res."},{"key":"1452_CR9","first-page":"46","volume":"672","author":"S Ayme","year":"2003","unstructured":"Ayme, S. Orphanet, an information site on rare diseases. Soins 672, 46\u201347 (2003).","journal-title":"Soins"},{"key":"1452_CR10","doi-asserted-by":"publisher","first-page":"610","DOI":"10.1016\/j.ajhg.2008.09.017","volume":"83","author":"PN Robinson","year":"2008","unstructured":"Robinson, P. N. et al. The human phenotype ontology: a tool for annotating and analyzing human hereditary disease. Am. J. Hum. Genet. 83, 610\u2013615 (2008).","journal-title":"Am. J. Hum. Genet."},{"key":"1452_CR11","doi-asserted-by":"publisher","first-page":"D865","DOI":"10.1093\/nar\/gkw1039","volume":"45","author":"S K\u00f6hler","year":"2017","unstructured":"K\u00f6hler, S. et al. The human phenotype ontology in 2017. Nucleic Acids Res. 45, D865\u2013D876 (2017).","journal-title":"Nucleic Acids Res."},{"key":"1452_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13023-021-02130-7","volume":"16","author":"J Guo","year":"2021","unstructured":"Guo, J. et al. National Rare Diseases Registry System (NRDRS): China\u2019s first nation-wide rare diseases demographic analyses. Orphanet J. Rare Dis. 16, 1\u20137 (2021).","journal-title":"Orphanet J. Rare Dis."},{"key":"1452_CR13","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbad172","volume":"24","author":"W Zhai","year":"2023","unstructured":"Zhai, W., Huang, X., Shen, N. & Zhu, S. Phen2Disease: a phenotype-driven model for disease and gene prioritization by bidirectional maximum matching semantic similarities. Brief. Bioinform. 24, bbad172 (2023).","journal-title":"Brief. Bioinform."},{"key":"1452_CR14","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/j.ajhg.2020.06.021","volume":"107","author":"PN Robinson","year":"2020","unstructured":"Robinson, P. N. et al. Interpretable clinical genomics with a likelihood ratio paradigm. Am. J. Hum. Genet. 107, 403\u2013417 (2020).","journal-title":"Am. J. Hum. Genet."},{"key":"1452_CR15","doi-asserted-by":"publisher","first-page":"2126","DOI":"10.1038\/s41436-019-0439-8","volume":"21","author":"Q Li","year":"2019","unstructured":"Li, Q., Zhao, K., Bustamante, C. D., Ma, X. & Wong, W. H. Xrare: a machine learning method jointly modeling phenotypes and genetic evidence for rare disease diagnosis. Genet. Med. 21, 2126\u20132134 (2019).","journal-title":"Genet. Med."},{"key":"1452_CR16","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1038\/s41436-018-0072-y","volume":"21","author":"KA Jagadeesh","year":"2019","unstructured":"Jagadeesh, K. A. et al. Phrank measures phenotype sets similarity to greatly improve Mendelian diagnostic disease prioritization. Genet. Med. 21, 464\u2013470 (2019).","journal-title":"Genet. Med."},{"key":"1452_CR17","doi-asserted-by":"publisher","DOI":"10.1093\/nargab\/lqaa032","volume":"2","author":"M Zhao","year":"2020","unstructured":"Zhao, M. et al. Phen2Gene: rapid phenotype-driven gene prioritization for rare diseases. NAR Genomics Bioinform. 2, lqaa032 (2020).","journal-title":"NAR Genomics Bioinform."},{"key":"1452_CR18","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1016\/j.ajhg.2018.05.010","volume":"103","author":"JH Son","year":"2018","unstructured":"Son, J. H. et al. Deep phenotyping on electronic health records facilitates genetic diagnosis by clinical exomes. Am. J. Hum. Genet. 103, 58\u201373 (2018).","journal-title":"Am. J. Hum. Genet."},{"key":"1452_CR19","doi-asserted-by":"publisher","DOI":"10.1126\/scitranslmed.aau9113","volume":"12","author":"J Birgmeier","year":"2020","unstructured":"Birgmeier, J. et al. AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature. Sci. Transl. Med. 12, eaau9113 (2020).","journal-title":"Sci. Transl. Med."},{"key":"1452_CR20","doi-asserted-by":"publisher","first-page":"1585","DOI":"10.1038\/s41436-018-0381-1","volume":"21","author":"CA Deisseroth","year":"2019","unstructured":"Deisseroth, C. A. et al. ClinPhen extracts and prioritizes patient phenotypes directly from medical records to expedite genetic disease diagnosis. Genet. Med. 21, 1585\u20131593 (2019).","journal-title":"Genet. Med."},{"key":"1452_CR21","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1016\/j.ajhg.2009.09.003","volume":"85","author":"S K\u00f6hler","year":"2009","unstructured":"K\u00f6hler, S. et al. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am. J. Hum. Genet. 85, 457\u2013464 (2009).","journal-title":"Am. J. Hum. Genet."},{"key":"1452_CR22","doi-asserted-by":"publisher","first-page":"2502","DOI":"10.1093\/bioinformatics\/bts471","volume":"28","author":"S Bauer","year":"2012","unstructured":"Bauer, S., K\u00f6hler, S., Schulz, M. H. & Robinson, P. N. Bayesian ontology querying for accurate and noise-tolerant semantic searches. Bioinformatics 28, 2502\u20132508 (2012).","journal-title":"Bioinformatics"},{"key":"1452_CR23","doi-asserted-by":"publisher","first-page":"528","DOI":"10.1016\/j.ijmedinf.2013.01.005","volume":"82","author":"R Dragusin","year":"2013","unstructured":"Dragusin, R. et al. FindZebra: a search engine for rare diseases. Int. J. Med. Inform. 82, 528\u2013538 (2013).","journal-title":"Int. J. Med. Inform."},{"key":"1452_CR24","doi-asserted-by":"publisher","first-page":"1057","DOI":"10.1002\/humu.22347","volume":"34","author":"M Girdea","year":"2013","unstructured":"Girdea, M. et al. PhenoTips: patient phenotyping software for clinical and research use. Hum. Mutat. 34, 1057\u20131065 (2013).","journal-title":"Hum. Mutat."},{"key":"1452_CR25","doi-asserted-by":"crossref","unstructured":"Peng, J. et al. Measuring phenotype semantic similarity using human phenotype ontology. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 763\u2013766 (2016).","DOI":"10.1109\/BIBM.2016.7822617"},{"key":"1452_CR26","doi-asserted-by":"publisher","first-page":"1184","DOI":"10.1109\/TII.2017.2686380","volume":"13","author":"M Pinol","year":"2017","unstructured":"Pinol, M. et al. Rare disease discovery: an optimized disease ranking system. IEEE Trans. Ind. Inform. 13, 1184\u20131192 (2017).","journal-title":"IEEE Trans. Ind. Inform."},{"key":"1452_CR27","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1186\/s12859-018-2064-y","volume":"19","author":"X Gong","year":"2018","unstructured":"Gong, X., Jiang, J., Duan, Z. & Lu, H. A new method to measure the semantic similarity from query phenotypic abnormalities to diseases based on the human phenotype ontology. BMC Bioinformatics 19, 111\u2013119 (2018).","journal-title":"BMC Bioinformatics"},{"key":"1452_CR28","doi-asserted-by":"publisher","first-page":"587","DOI":"10.3389\/fgene.2018.00587","volume":"9","author":"J Jia","year":"2018","unstructured":"Jia, J. et al. RDAD: a machine learning system to support phenotype-based rare disease diagnosis. Front. Genet. 9, 587 (2018).","journal-title":"Front. Genet."},{"key":"1452_CR29","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1038\/s41436-018-0050-4","volume":"21","author":"J Chen","year":"2019","unstructured":"Chen, J. et al. Novel phenotype\u2013disease matching tool for rare genetic diseases. Genet. Med. 21, 339\u2013346 (2019).","journal-title":"Genet. Med."},{"key":"1452_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13023-019-1040-6","volume":"14","author":"S Ronicke","year":"2019","unstructured":"Ronicke, S. et al. Can a decision support system accelerate rare disease diagnosis? Evaluating the potential impact of Ada DX in a retrospective study. Orphanet J. Rare Dis. 14, 1\u201312 (2019).","journal-title":"Orphanet J. Rare Dis."},{"key":"1452_CR31","doi-asserted-by":"crossref","unstructured":"Chen, Z., Balan, M. M. & Brown, K. Boosting transformers and language models for clinical prediction in immunotherapy. arXiv https:\/\/aclanthology.org\/2023.acl-industry.32.pdf (2023).","DOI":"10.18653\/v1\/2023.acl-industry.32"},{"key":"1452_CR32","doi-asserted-by":"publisher","first-page":"1233","DOI":"10.1056\/NEJMsr2214184","volume":"388","author":"P Lee","year":"2023","unstructured":"Lee, P., Bubeck, S. & Petro, J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N. Engl. J. Med. 388, 1233\u20131239 (2023).","journal-title":"N. Engl. J. Med."},{"key":"1452_CR33","doi-asserted-by":"publisher","unstructured":"Reese, J. T. et al. On the limitations of large language models in clinical diagnosis. medRxiv https:\/\/doi.org\/10.1101\/2023.07.13.23292613 (2023).","DOI":"10.1101\/2023.07.13.23292613"},{"key":"1452_CR34","doi-asserted-by":"publisher","first-page":"1884","DOI":"10.1093\/bioinformatics\/btab019","volume":"37","author":"L Luo","year":"2021","unstructured":"Luo, L. et al. PhenoTagger: a hybrid method for phenotype concept recognition using human phenotype ontology. Bioinformatics 37, 1884\u20131890 (2021).","journal-title":"Bioinformatics"},{"key":"1452_CR35","doi-asserted-by":"publisher","first-page":"1269","DOI":"10.1109\/TCBB.2022.3170301","volume":"20","author":"Y Feng","year":"2022","unstructured":"Feng, Y., Qi, L. & Tian, W. PhenoBERT: a combined deep learning method for automated recognition of human phenotype ontology. IEEE\/ACM Trans. Comput. Biol. Bioinform. 20, 1269\u20131277 (2022).","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"1452_CR36","doi-asserted-by":"publisher","first-page":"6154","DOI":"10.1073\/pnas.1516510113","volume":"113","author":"K Deng","year":"2016","unstructured":"Deng, K., Bol, P. K., Li, K. J. & Liu, J. S. On the unsupervised analysis of domain-specific Chinese texts. Proc. Natl Acad. Sci. 113, 6154\u20136159 (2016).","journal-title":"Proc. Natl Acad. Sci."},{"key":"1452_CR37","unstructured":"Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv https:\/\/arxiv.org\/abs\/1810.04805 (2018)."},{"key":"1452_CR38","unstructured":"Lan, Z. et al. Albert: A lite bert for self-supervised learning of language representations. arXiv https:\/\/arxiv.org\/abs\/1909.11942 (2019)."},{"key":"1452_CR39","doi-asserted-by":"crossref","unstructured":"Hu, J., Lu, J. & Tan, Y.-P. Discriminative deep metric learning for face verification in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 1875\u20131882 (2014).","DOI":"10.1109\/CVPR.2014.242"},{"key":"1452_CR40","doi-asserted-by":"publisher","first-page":"537","DOI":"10.1038\/nbt1203","volume":"24","author":"S Aerts","year":"2006","unstructured":"Aerts, S. et al. Gene prioritization through genomic data fusion. Nat. Biotechnol. 24, 537\u2013544 (2006).","journal-title":"Nat. Biotechnol."},{"key":"1452_CR41","doi-asserted-by":"publisher","first-page":"E1081","DOI":"10.1002\/humu.21169","volume":"31","author":"T T\u00f6pel","year":"2010","unstructured":"T\u00f6pel, T., Scheible, D., Trefz, F. & Hofest\u00e4dt, R. RAMEDIS: a comprehensive information system for variations and corresponding phenotypes of rare metabolic diseases. Hum. Mutat. 31, E1081\u2013E1088 (2010).","journal-title":"Hum. Mutat."},{"key":"1452_CR42","doi-asserted-by":"publisher","first-page":"922","DOI":"10.1002\/humu.22850","volume":"36","author":"OJ Buske","year":"2015","unstructured":"Buske, O. J. et al. The Matchmaker Exchange API: automating patient matching through the exchange of structured phenotypic and genotypic profiles. Hum. Mutat. 36, 922\u2013927 (2015).","journal-title":"Hum. Mutat."},{"key":"1452_CR43","doi-asserted-by":"publisher","first-page":"D267","DOI":"10.1093\/nar\/gkh061","volume":"32","author":"O Bodenreider","year":"2004","unstructured":"Bodenreider, O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32, D267\u2013D270 (2004).","journal-title":"Nucleic Acids Res."},{"key":"1452_CR44","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1016\/j.ajhg.2015.06.009","volume":"97","author":"JX Chong","year":"2015","unstructured":"Chong, J. X. et al. The genetic basis of Mendelian phenotypes: discoveries, challenges, and opportunities. Am. J. Hum. Genet. 97, 199\u2013215 (2015).","journal-title":"Am. J. Hum. Genet."},{"key":"1452_CR45","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1038\/nmeth.3484","volume":"12","author":"H Yang","year":"2015","unstructured":"Yang, H., Robinson, P. N. & Wang, K. Phenolyzer: phenotype-based prioritization of candidate genes for human diseases. Nat. Methods 12, 841\u2013843 (2015).","journal-title":"Nat. Methods"},{"key":"1452_CR46","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12920-018-0372-8","volume":"11","author":"A Rao","year":"2018","unstructured":"Rao, A. et al. Phenotype-driven gene prioritization for rare diseases using graph convolution on heterogeneous networks. BMC Med. Genomics 11, 1\u201312 (2018).","journal-title":"BMC Med. Genomics"},{"key":"1452_CR47","first-page":"1","volume":"5","author":"J Yang","year":"2023","unstructured":"Yang, J. et al. Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT. Patterns 5, 1 (2023).","journal-title":"Patterns"},{"key":"1452_CR48","doi-asserted-by":"publisher","first-page":"D1018","DOI":"10.1093\/nar\/gky1105","volume":"47","author":"S K\u00f6hler","year":"2019","unstructured":"K\u00f6hler, S. et al. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res. 47, D1018\u2013D1027 (2019).","journal-title":"Nucleic Acids Res."},{"key":"1452_CR49","doi-asserted-by":"publisher","first-page":"D1207","DOI":"10.1093\/nar\/gkaa1043","volume":"49","author":"S K\u00f6hler","year":"2021","unstructured":"K\u00f6hler, S. et al. The human phenotype ontology in 2021. Nucleic Acids Res. 49, D1207\u2013D1217 (2021).","journal-title":"Nucleic Acids Res."},{"key":"1452_CR50","unstructured":"Zhang, H., Cisse, M., Dauphin, Y. N. & Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv https:\/\/arxiv.org\/abs\/1710.09412 (2017)."},{"key":"1452_CR51","unstructured":"Rennie, J. D., Shih, L., Teevan, J. & Karger, D. R. Tackling the poor assumptions of naive bayes text classifiers. In: Proceedings of the 20th international conference on machine learning (ICML-03). 616\u2013623 (2003)."},{"key":"1452_CR52","unstructured":"Hinton, G. E. Connectionist learning procedures. In: Machine learning. 555\u2013610 (1990)."},{"key":"1452_CR53","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1126\/science.1087447","volume":"302","author":"JM Stuart","year":"2003","unstructured":"Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. science 302, 249\u2013255 (2003).","journal-title":"science"},{"key":"1452_CR54","doi-asserted-by":"publisher","first-page":"e1000443","DOI":"10.1371\/journal.pcbi.1000443","volume":"5","author":"C Pesquita","year":"2009","unstructured":"Pesquita, C., Faria, D., Falcao, A. O., Lord, P. & Couto, F. M. Semantic similarity in biomedical ontologies. PLoS Comput. Biol. 5, e1000443 (2009).","journal-title":"PLoS Comput. Biol."},{"key":"1452_CR55","unstructured":"Lin, D. An information-theoretic definition of similarity. In: Icml 98, 296\u2013304 (1998)."},{"key":"1452_CR56","unstructured":"Jiang, J. J. & Conrath, D. W. Semantic similarity based on corpus statistics and lexical taxonomy. arXiv https:\/\/arxiv.org\/abs\/cmp-lg\/9709008 (1997)."},{"key":"1452_CR57","first-page":"38","volume":"37","author":"C Pesquita","year":"2007","unstructured":"Pesquita, C., Faria, D., Bastos, H., Falcao, A. & Couto, F. Evaluating GO-based semantic similarity measures. Proc. 10th Annu. Bio-Ontologies Meet. 37, 38 (2007).","journal-title":"Proc. 10th Annu. Bio-Ontologies Meet."},{"key":"1452_CR58","unstructured":"Gentleman, R. Visualizing and distances using GO. http:\/\/www.bioconductor.org\/docs\/vignettes.html, 38 (2005)."},{"key":"1452_CR59","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-9-327","volume":"9","author":"M Mistry","year":"2008","unstructured":"Mistry, M. & Pavlidis, P. Gene Ontology term overlap as a measure of gene functional similarity. BMC Bioinformatics 9, 1\u201311 (2008).","journal-title":"BMC Bioinformatics"},{"key":"1452_CR60","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1136\/jamia.2009.002733","volume":"17","author":"AR Aronson","year":"2010","unstructured":"Aronson, A. R. & Lang, F.-M. An overview of MetaMap: historical perspective and recent advances. J. Am. Med. Inform. Assoc. 17, 229\u2013236 (2010).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1452_CR61","doi-asserted-by":"publisher","first-page":"507","DOI":"10.1136\/jamia.2009.001560","volume":"17","author":"GK Savova","year":"2010","unstructured":"Savova, G. K. et al. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J. Am. Med. Inform. Assoc. 17, 507\u2013513 (2010).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1452_CR62","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1016\/j.nurpra.2008.09.020","volume":"5","author":"J DiSantostefano","year":"2009","unstructured":"DiSantostefano, J. International classification of diseases 10th revision (ICD-10). J. Nurse Pract. 5, 56\u201357 (2009).","journal-title":"J. Nurse Pract."},{"key":"1452_CR63","doi-asserted-by":"publisher","first-page":"741","DOI":"10.4065\/81.6.741","volume":"81","author":"PL Elkin","year":"2006","unstructured":"Elkin, P. L. et al. Evaluation of the content coverage of SNOMED CT: ability of SNOMED clinical terms to represent clinical problem lists. Mayo Clin. Proc. 81, 741\u2013748 (2006).","journal-title":"Mayo Clin. Proc."},{"key":"1452_CR64","first-page":"265","volume":"88","author":"CE Lipscomb","year":"2000","unstructured":"Lipscomb, C. E. Medical subject headings (MeSH). Bull. Med. Libr. Assoc. 88, 265 (2000).","journal-title":"Bull. Med. Libr. Assoc."},{"key":"1452_CR65","doi-asserted-by":"publisher","first-page":"471","DOI":"10.1162\/tacl_a_00074","volume":"5","author":"R Dror","year":"2017","unstructured":"Dror, R., Baumer, G., Bogomolov, M. & Reichart, R. Replicability analysis for natural language processing: testing significance with multiple datasets. Trans. Assoc. Comput. Linguist. 5, 471\u2013486 (2017).","journal-title":"Trans. Assoc. Comput. Linguist."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01452-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01452-1","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01452-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,29]],"date-time":"2025-01-29T03:22:56Z","timestamp":1738120976000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01452-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,28]]},"references-count":65,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1452"],"URL":"https:\/\/doi.org\/10.1038\/s41746-025-01452-1","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,28]]},"assertion":[{"value":"6 September 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 January 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"68"}}