{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T23:20:51Z","timestamp":1776468051436,"version":"3.51.2"},"reference-count":64,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,5,20]],"date-time":"2021-05-20T00:00:00Z","timestamp":1621468800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,5,20]],"date-time":"2021-05-20T00:00:00Z","timestamp":1621468800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100004917","name":"Cancer Prevention and Research Institute of Texas","doi-asserted-by":"publisher","award":["RP170668"],"award-info":[{"award-number":["RP170668"]}],"id":[{"id":"10.13039\/100004917","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004917","name":"Cancer Prevention and Research Institute of Texas","doi-asserted-by":"publisher","award":["RP160015"],"award-info":[{"award-number":["RP160015"]}],"id":[{"id":"10.13039\/100004917","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004917","name":"Cancer Prevention and Research Institute of Texas","doi-asserted-by":"publisher","award":["RP170668"],"award-info":[{"award-number":["RP170668"]}],"id":[{"id":"10.13039\/100004917","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000968","name":"American Heart Association","doi-asserted-by":"publisher","award":["19GPSGC35180031"],"award-info":[{"award-number":["19GPSGC35180031"]}],"id":[{"id":"10.13039\/100000968","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000968","name":"American Heart 
Association","doi-asserted-by":"publisher","award":["19GPSGC35180031"],"award-info":[{"award-number":["19GPSGC35180031"]}],"id":[{"id":"10.13039\/100000968","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep learning (DL)-based predictive models from electronic health records (EHRs) deliver impressive performance in many clinical tasks. Large training cohorts, however, are often required by these models to achieve high accuracy, hindering the adoption of DL-based models in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous successes in the natural language processing domain. The pretraining of BERT on a very large training corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. Inspired by BERT, we propose Med-BERT, which adapts the BERT framework originally developed for the text domain to the structured EHR domain. Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients. Fine-tuning experiments showed that Med-BERT substantially improves the prediction accuracy, boosting the area under the receiver operating characteristics curve (AUC) by 1.21\u20136.14% in two disease prediction tasks from two clinical databases. In particular, pretrained Med-BERT obtains promising performances on tasks with small fine-tuning training sets and can boost the AUC by more than 20% or obtain an AUC as high as a model trained on a training set ten times larger, compared with deep learning models without Med-BERT. 
We believe that Med-BERT will benefit disease prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence aided healthcare.<\/jats:p>","DOI":"10.1038\/s41746-021-00455-y","type":"journal-article","created":{"date-parts":[[2021,5,20]],"date-time":"2021-05-20T10:03:08Z","timestamp":1621504988000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":679,"title":["Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction"],"prefix":"10.1038","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2644-4908","authenticated-orcid":false,"given":"Laila","family":"Rasmy","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1395-6805","authenticated-orcid":false,"given":"Yang","family":"Xiang","sequence":"additional","affiliation":[]},{"given":"Ziqian","family":"Xie","sequence":"additional","affiliation":[]},{"given":"Cui","family":"Tao","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7754-1890","authenticated-orcid":false,"given":"Degui","family":"Zhi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,5,20]]},"reference":[{"key":"455_CR1","doi-asserted-by":"publisher","first-page":"230","DOI":"10.1136\/svn-2017-000101","volume":"2","author":"F Jiang","year":"2017","unstructured":"Jiang, F. et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc. Neurol. 2, 230\u2013243 (2017).","journal-title":"Stroke Vasc. Neurol."},{"key":"455_CR2","doi-asserted-by":"publisher","first-page":"719","DOI":"10.1038\/s41551-018-0305-z","volume":"2","author":"K-H Yu","year":"2018","unstructured":"Yu, K.-H., Beam, A. L. & Kohane, I. S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2, 719\u2013731 (2018).","journal-title":"Nat. Biomed. 
Eng."},{"key":"455_CR3","doi-asserted-by":"publisher","first-page":"8869","DOI":"10.1109\/ACCESS.2017.2694446","volume":"5","author":"M Chen","year":"2017","unstructured":"Chen, M., Hao, Y., Hwang, K., Wang, L. & Wang, L. Disease prediction by machine learning over big data from healthcare communities. IEEE Access 5, 8869\u20138879 (2017).","journal-title":"IEEE Access"},{"key":"455_CR4","doi-asserted-by":"publisher","first-page":"1968","DOI":"10.1109\/TCBB.2018.2827029","volume":"15","author":"H Wang","year":"2018","unstructured":"Wang, H. et al. Predicting hospital readmission via cost-sensitive deep learning. IEEE\/ACM Trans. Comput. Biol. Bioinforma. 15, 1968\u20131978 (2018).","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinforma."},{"key":"455_CR5","doi-asserted-by":"publisher","first-page":"94","DOI":"10.7861\/futurehosp.6-2-94","volume":"6","author":"T Davenport","year":"2019","unstructured":"Davenport, T. & Kalakota, R. The potential for artificial intelligence in healthcare. Future Healthc. J. 6, 94 (2019).","journal-title":"Future Healthc. J."},{"key":"455_CR6","doi-asserted-by":"publisher","first-page":"299","DOI":"10.1007\/s41649-019-00096-0","volume":"11","author":"T Lysaght","year":"2019","unstructured":"Lysaght, T., Lim, H. Y., Xafis, V. & Ngiam, K. Y. AI-assisted decision-making in healthcare. Asian Bioeth. Rev. 11, 299\u2013314 (2019).","journal-title":"Asian Bioeth. Rev."},{"key":"455_CR7","doi-asserted-by":"publisher","unstructured":"Ahmed, Z., Mohamed, K., Zeeshan, S. & Dong, X. Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine. Database 2020, baaa010 (2020). https:\/\/doi.org\/10.1093\/database\/baaa010.","DOI":"10.1093\/database\/baaa010"},{"key":"455_CR8","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1504\/IJAIP.2018.089494","volume":"10","author":"G Manogaran","year":"2018","unstructured":"Manogaran, G. & Lopez, D. 
Health data analytics using scalable logistic regression with stochastic gradient descent. Int. J. Adv. Intell. Paradig. 10, 118\u2013132 (2018).","journal-title":"Int. J. Adv. Intell. Paradig."},{"key":"455_CR9","first-page":"306","volume":"15","author":"T Keerthika","year":"2019","unstructured":"Keerthika, T. & Premalatha, K. An effective feature selection for heart disease prediction with aid of hybrid kernel SVM. Int. J. Bus. Intell. Data Min. 15, 306\u2013326 (2019).","journal-title":"Int. J. Bus. Intell. Data Min."},{"key":"455_CR10","first-page":"1","volume":"3","author":"RM Sadek","year":"2019","unstructured":"Sadek, R. M. et al. Parkinson\u2019s disease prediction using artificial neural network. Int. J. Academic Health Med. Res. 3, 1\u20138 (2019).","journal-title":"Int. J. Academic Health Med. Res."},{"key":"455_CR11","unstructured":"Payan, A. & Montana, G. Predicting Alzheimer\u2019s disease: a neuroimaging study with 3D convolutional neural networks. Preprint at http:\/\/arxiv.org\/abs\/1502.02506 (2015)."},{"key":"455_CR12","unstructured":"Choi, E. et al. RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism. Adv. Neural Inf. Process. Syst. 29, 3504\u20133512 (2016)"},{"key":"455_CR13","unstructured":"Choi, E., Bahadori, M. T., Schuetz, A., Stewart, W. F. & Sun, J. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks. In Machine Learning for Healthcare Conference, 301\u2013318 (MLHC, 2016)."},{"key":"455_CR14","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-018-0029-1","volume":"1","author":"A Rajkomar","year":"2018","unstructured":"Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digital Med. 1, 18 (2018).","journal-title":"NPJ Digital Med."},{"key":"455_CR15","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1038\/nature21056","volume":"542","author":"A Esteva","year":"2017","unstructured":"Esteva, A. et al. 
Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115\u2013118 (2017).","journal-title":"Nature"},{"key":"455_CR16","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1038\/s41551-018-0195-0","volume":"2","author":"R Poplin","year":"2018","unstructured":"Poplin, R. et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2, 158 (2018).","journal-title":"Nat. Biomed. Eng."},{"key":"455_CR17","doi-asserted-by":"publisher","first-page":"1559","DOI":"10.1038\/s41591-018-0177-5","volume":"24","author":"N Coudray","year":"2018","unstructured":"Coudray, N. et al. Classification and mutation prediction from non\u2013small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559\u20131567 (2018).","journal-title":"Nat. Med."},{"key":"455_CR18","doi-asserted-by":"publisher","first-page":"468","DOI":"10.1080\/17453674.2018.1453714","volume":"89","author":"SW Chung","year":"2018","unstructured":"Chung, S. W. et al. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop. 89, 468\u2013473 (2018).","journal-title":"Acta Orthop."},{"key":"455_CR19","doi-asserted-by":"publisher","first-page":"e10010","DOI":"10.2196\/10010","volume":"7","author":"J Shen","year":"2019","unstructured":"Shen, J. et al. Artificial intelligence versus clinicians in disease diagnosis: systematic review. JMIR Med. Inform. 7, e10010 (2019).","journal-title":"JMIR Med. Inform."},{"key":"455_CR20","unstructured":"Sun, C., Shrivastava, A., Singh, S. & Gupta, A. In Proceedings of the IEEE International Conference on Computer Vision, 843\u2013852."},{"key":"455_CR21","unstructured":"Cho, J., Lee, K., Shin, E., Choy, G. & Do, S. How much data is needed to train a medical image deep learning system to achieve necessary high accuracy? 
Preprint at https:\/\/arxiv.org\/abs\/1511.06348 (2015)."},{"key":"455_CR22","doi-asserted-by":"publisher","DOI":"10.1186\/s12911-017-0538-x","volume":"17","author":"M-L Gentil","year":"2017","unstructured":"Gentil, M.-L. et al. Factors influencing the development of primary care data collection projects from electronic health records: a systematic review of the literature. BMC Med. Inform. Decis. Mak. 17, 139 (2017).","journal-title":"BMC Med. Inform. Decis. Mak."},{"key":"455_CR23","doi-asserted-by":"publisher","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","volume":"22","author":"SJ Pan","year":"2009","unstructured":"Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345\u20131359 (2009).","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"455_CR24","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 3111\u20133119 (NIPS, 2013)."},{"key":"455_CR25","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R. & Manning, C. D. Glove: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532\u20131543 (ACL, 2014).","DOI":"10.3115\/v1\/D14-1162"},{"key":"455_CR26","doi-asserted-by":"crossref","unstructured":"Peters, M. et al. Deep Contextualized Word Representations. in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2227\u20132237 (ACL, 2018).","DOI":"10.18653\/v1\/N18-1202"},{"key":"455_CR27","unstructured":"Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. Improving language understanding by generative pre-training. 
https:\/\/s3-us-west-2.amazonaws.com\/openai-assets\/researchcovers\/languageunsupervised\/languageunderstandingpaper.pdf (2018)."},{"key":"455_CR28","first-page":"9","volume":"1","author":"A Radford","year":"2019","unstructured":"Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1, 9 (2019).","journal-title":"OpenAI Blog"},{"key":"455_CR29","unstructured":"Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171\u20134186 (ACL, 2019)."},{"key":"455_CR30","unstructured":"Yang, Z. et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems 32, 5754\u20135764 (NIPS, 2019)."},{"key":"455_CR31","unstructured":"Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. In International Conference on Machine Learning, 1597\u20131607 (ICML, 2020)."},{"key":"455_CR32","doi-asserted-by":"crossref","unstructured":"Sun, C., Myers, A., Vondrick, C., Murphy, K. & Schmid, C. VideoBERT: A Joint Model for Video and Language Representation Learning. In Proceedings of the IEEE International Conference on Computer Vision, 7464\u20137473 (IEEE, 2019).","DOI":"10.1109\/ICCV.2019.00756"},{"key":"455_CR33","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2020","unstructured":"Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234\u20131240 (2020).","journal-title":"Bioinformatics"},{"key":"455_CR34","doi-asserted-by":"crossref","unstructured":"Alsentzer, E. et al. Publicly Available Clinical BERT Embeddings. 
In Proceedings of the 2nd Clinical Natural Language Processing Workshop, 72\u201378 (ACL, 2019).","DOI":"10.18653\/v1\/W19-1909"},{"key":"455_CR35","doi-asserted-by":"crossref","unstructured":"Zhang, Z. et al. ERNIE: Enhanced Language Representation with Informative Entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1441\u20131451 (ACL, 2019).","DOI":"10.18653\/v1\/P19-1139"},{"key":"455_CR36","unstructured":"Lan, Z. et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In International Conference on Learning Representations (ICLR, 2019)."},{"key":"455_CR37","doi-asserted-by":"crossref","unstructured":"Adhikari, A., Ram, A., Tang, R., Hamilton, W. L. & Lin, J. Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT. In Proceedings of the 5th Workshop on Representation Learning for NLP, 72\u201377 (ACL, 2020).","DOI":"10.18653\/v1\/2020.repl4nlp-1.10"},{"key":"455_CR38","doi-asserted-by":"crossref","unstructured":"Pires, T., Schlinger, E. & Garrette, D. How Multilingual is Multilingual BERT? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 4996\u20135001 (ACL, 2019).","DOI":"10.18653\/v1\/P19-1493"},{"key":"455_CR39","doi-asserted-by":"crossref","unstructured":"Beltagy, I., Lo, K. & Cohan, A. SciBERT: a pretrained language model for scientific text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3606\u20133611 (ACL, 2019).","DOI":"10.18653\/v1\/D19-1371"},{"key":"455_CR40","unstructured":"Huang, K., Altosaar, J. & Ranganath, R. ClinicalBert: modeling clinical notes and predicting hospital readmission. 
Preprint at http:\/\/arxiv.org\/abs\/1904.05342 (2019)."},{"key":"455_CR41","doi-asserted-by":"publisher","first-page":"1628","DOI":"10.1056\/NEJMsa0900592","volume":"360","author":"AK Jha","year":"2009","unstructured":"Jha, A. K. et al. Use of electronic health records in US hospitals. N. Engl. J. Med. 360, 1628\u20131638 (2009).","journal-title":"N. Engl. J. Med."},{"key":"455_CR42","doi-asserted-by":"publisher","first-page":"501","DOI":"10.1056\/NEJMp1006114","volume":"363","author":"D Blumenthal","year":"2010","unstructured":"Blumenthal, D. & Tavenner, M. The \u201cmeaningful use\u201d regulation for electronic health records. N. Engl. J. Med. 363, 501\u2013504 (2010).","journal-title":"N. Engl. J. Med."},{"key":"455_CR43","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1007\/s41666-019-00062-3","volume":"4","author":"P Gupta","year":"2020","unstructured":"Gupta, P., Malhotra, P., Narwariya, J., Vig, L. & Shroff, G. Transfer learning for clinical time series analysis using deep neural networks. J. Healthc. Inform. Res. 4, 112\u2013137 (2020).","journal-title":"J. Healthc. Inform. Res."},{"key":"455_CR44","first-page":"295","volume":"25","author":"AL Beam","year":"2020","unstructured":"Beam, A. L. et al. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data. Pac. Symp. Biocomput. 25, 295\u2013306 (2020).","journal-title":"Pac. Symp. Biocomput."},{"key":"455_CR45","doi-asserted-by":"publisher","DOI":"10.1186\/s12911-019-0766-3","volume":"19","author":"Y Xiang","year":"2019","unstructured":"Xiang, Y. et al. Time-sensitive clinical concept embeddings learned from large electronic health records. BMC Med. Inf. Decis. Mak. 19, 58 (2019).","journal-title":"BMC Med. Inf. Decis. Mak."},{"key":"455_CR46","doi-asserted-by":"crossref","unstructured":"Howard, J. & Ruder, S. Universal Language Model Fine-tuning for Text Classification. 
In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 328\u2013339 (ACL, 2018).","DOI":"10.18653\/v1\/P18-1031"},{"key":"455_CR47","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-019-56847-4","volume":"10","author":"Y Li","year":"2020","unstructured":"Li, Y. et al. BeHRt: transformer for electronic Health Records. Sci. Rep. 10, 1\u201312 (2020).","journal-title":"Sci. Rep."},{"key":"455_CR48","doi-asserted-by":"crossref","unstructured":"Shang, J., Ma, T., Xiao, C. & Sun, J. Pre-training of Graph Augmented Transformers for Medication Recommendation. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 5953\u20135959 (IJCAI, 2019).","DOI":"10.24963\/ijcai.2019\/825"},{"key":"455_CR49","doi-asserted-by":"crossref","unstructured":"Ma, F. et al. Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1903\u20131911 (ACM, 2017).","DOI":"10.1145\/3097983.3098088"},{"key":"455_CR50","doi-asserted-by":"publisher","first-page":"e0195024","DOI":"10.1371\/journal.pone.0195024","volume":"13","author":"C Xiao","year":"2018","unstructured":"Xiao, C., Ma, T., Dieng, A. B., Blei, D. M. & Wang, F. Readmission prediction via deep contextual embedding of clinical concepts. PLoS ONE 13, e0195024 (2018).","journal-title":"PLoS ONE"},{"key":"455_CR51","doi-asserted-by":"publisher","first-page":"e16981","DOI":"10.2196\/16981","volume":"22","author":"Y Xiang","year":"2020","unstructured":"Xiang, Y. et al. Asthma exacerbation prediction and risk factor analysis based on a time-sensitive, attentive neural network: retrospective cohort study. J. Med. Internet Res. 22, e16981 (2020).","journal-title":"J. Med. Internet Res."},{"key":"455_CR52","doi-asserted-by":"crossref","unstructured":"Baytas, I. M. et al. 
Patient Subtyping via Time-Aware LSTM Networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 65\u201374 (ACM, 2017).","DOI":"10.1145\/3097983.3097997"},{"key":"455_CR53","unstructured":"Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. In NIPS 2014 Workshop on Deep Learning, December 2014 (NIPS, 2014)."},{"key":"455_CR54","doi-asserted-by":"publisher","first-page":"1539","DOI":"10.1109\/TIE.2017.2733438","volume":"65","author":"R Zhao","year":"2017","unstructured":"Zhao, R. et al. Machine health monitoring using local feature-based gated recurrent unit networks. IEEE Trans. Ind. Electron. 65, 1539\u20131548 (2017).","journal-title":"IEEE Trans. Ind. Electron."},{"key":"455_CR55","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","volume":"5","author":"P Bojanowski","year":"2017","unstructured":"Bojanowski, P., Grave, E., Joulin, A. & Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135\u2013146 (2017).","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"455_CR56","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1162\/tacl_a_00134","volume":"3","author":"O Levy","year":"2015","unstructured":"Levy, O., Goldberg, Y. & Dagan, I. Improving distributional similarity with lessons learned from word embeddings. Trans. Assoc. Comput. Linguist. 3, 211\u2013225 (2015).","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"455_CR57","first-page":"625","volume":"11","author":"D Erhan","year":"2010","unstructured":"Erhan, D. et al. Why Does Unsupervised Pre-training Help Deep Learning? J. Mach. Learn. Res. 11, 625\u2013660 (2010).","journal-title":"J. Mach. Learn. Res."},{"key":"455_CR58","doi-asserted-by":"crossref","unstructured":"Vig, J. A Multiscale Visualization of Attention in the Transformer Model. 
In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 37\u201342 (ACL, 2019).","DOI":"10.18653\/v1\/P19-3007"},{"key":"455_CR59","first-page":"1877","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown, T. et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 33, 1877\u20131901 (2020).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"455_CR60","doi-asserted-by":"publisher","first-page":"1021","DOI":"10.1016\/j.jacc.2017.12.048","volume":"71.9","author":"KA Hicks","year":"2018","unstructured":"Hicks, K. A. et al. 2017 Cardiovascular and stroke endpoint definitions for clinical trials. J. Am. Coll. Cardiol. 71.9, 1021\u20131034 (2018).","journal-title":"J. Am. Coll. Cardiol."},{"key":"455_CR61","unstructured":"ICD-10 | CMS. http:\/\/www.cms.gov\/Medicare\/Coding\/ICD10 (last accessed May 2021)."},{"key":"455_CR62","unstructured":"Wolf, T. et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (2020), 38\u201345 (ACL, 2020)."},{"key":"455_CR63","doi-asserted-by":"publisher","first-page":"827","DOI":"10.1093\/ije\/dyv098","volume":"44","author":"E Herrett","year":"2015","unstructured":"Herrett, E. et al. Data resource profile: clinical practice research datalink (CPRD). Int. J. Epidemiol. 44, 827\u2013836 (2015).","journal-title":"Int. J. Epidemiol."},{"key":"455_CR64","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2016.35","volume":"3","author":"AE Johnson","year":"2016","unstructured":"Johnson, A. E. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016).","journal-title":"Sci. 
Data"}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00455-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00455-y","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00455-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,3]],"date-time":"2023-11-03T23:32:57Z","timestamp":1699054377000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00455-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,20]]},"references-count":64,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["455"],"URL":"https:\/\/doi.org\/10.1038\/s41746-021-00455-y","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,20]]},"assertion":[{"value":"27 May 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 April 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 May 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"86"}}