{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T05:26:30Z","timestamp":1780982790977,"version":"3.54.1"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,4,8]],"date-time":"2022-04-08T00:00:00Z","timestamp":1649376000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,4,8]],"date-time":"2022-04-08T00:00:00Z","timestamp":1649376000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000050","name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["1R01HL139731"],"award-info":[{"award-number":["1R01HL139731"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["T32HL007208"],"award-info":[{"award-number":["T32HL007208"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["R38HL150212"],"award-info":[{"award-number":["R38HL150212"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["K01HL148506"],"award-info":[{"award-number":["K01HL148506"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. 
Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["1R01HL092577"],"award-info":[{"award-number":["1R01HL092577"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["R01HL128914"],"award-info":[{"award-number":["R01HL128914"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["K24HL105780"],"award-info":[{"award-number":["K24HL105780"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["R01NS103924"],"award-info":[{"award-number":["R01NS103924"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["U01NS069673"],"award-info":[{"award-number":["U01NS069673"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["R01HL134893"],"award-info":[{"award-number":["R01HL134893"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. 
Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["R01HL140224"],"award-info":[{"award-number":["R01HL140224"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["K24HL153669"],"award-info":[{"award-number":["K24HL153669"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000968","name":"American Heart Association","doi-asserted-by":"publisher","award":["18SFRN34250007"],"award-info":[{"award-number":["18SFRN34250007"]}],"id":[{"id":"10.13039\/100000968","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000968","name":"American Heart Association","doi-asserted-by":"publisher","award":["18SFRN34250007"],"award-info":[{"award-number":["18SFRN34250007"]}],"id":[{"id":"10.13039\/100000968","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000968","name":"American Heart Association","doi-asserted-by":"publisher","award":["18SFRN34250007"],"award-info":[{"award-number":["18SFRN34250007"]}],"id":[{"id":"10.13039\/100000968","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000968","name":"American Heart Association","doi-asserted-by":"publisher","award":["18SFRN34110082"],"award-info":[{"award-number":["18SFRN34110082"]}],"id":[{"id":"10.13039\/100000968","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000968","name":"American Heart Association","doi-asserted-by":"publisher","award":["18SFRN34250007"],"award-info":[{"award-number":["18SFRN34250007"]}],"id":[{"id":"10.13039\/100000968","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000968","name":"American Heart 
Association","doi-asserted-by":"publisher","award":["21SFRN812095"],"award-info":[{"award-number":["21SFRN812095"]}],"id":[{"id":"10.13039\/100000968","id-type":"DOI","asserted-by":"publisher"}]},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"},{"DOI":"10.13039\/501100001674","name":"Fondation Leducq","doi-asserted-by":"publisher","award":["14CVD01"],"award-info":[{"award-number":["14CVD01"]}],"id":[{"id":"10.13039\/501100001674","id-type":"DOI","asserted-by":"publisher"}]},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"},{"name":"U.S. Department of Health & Human Services | NIH | National Heart, Lung, and Blood Institute"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Electronic health record (EHR) datasets are statistically powerful but are subject to ascertainment bias and missingness. 
Using the Mass General Brigham multi-institutional EHR, we approximated a community-based cohort by sampling patients receiving longitudinal primary care between 2001-2018 (Community Care Cohort Project [C3PO],\n                    <jats:italic>n<\/jats:italic>\n                    \u2009=\u2009520,868). We utilized natural language processing (NLP) to recover vital signs from unstructured notes. We assessed the validity of C3PO by deploying established risk models for myocardial infarction\/stroke and atrial fibrillation. We then compared C3PO to Convenience Samples including all individuals from the same EHR with complete data, but without a longitudinal primary care requirement. NLP reduced the missingness of vital signs by 31%. NLP-recovered vital signs were highly correlated with values derived from structured fields (Pearson\n                    <jats:italic>r<\/jats:italic>\n                    range 0.95\u20130.99). Atrial fibrillation and myocardial infarction\/stroke incidence were lower and risk models were better calibrated in C3PO as opposed to the Convenience Samples (calibration error range for myocardial infarction\/stroke: 0.012\u20130.030 in C3PO vs. 0.028\u20130.046 in Convenience Samples; calibration error for atrial fibrillation 0.028 in C3PO vs. 0.036 in Convenience Samples). 
Sampling patients receiving regular primary care and using NLP to recover missing data may reduce bias and maximize generalizability of EHR research.\n                  <\/jats:p>","DOI":"10.1038\/s41746-022-00590-0","type":"journal-article","created":{"date-parts":[[2022,4,8]],"date-time":"2022-04-08T06:11:21Z","timestamp":1649398281000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":76,"title":["Cohort design and natural language processing to reduce bias in electronic health records research"],"prefix":"10.1038","volume":"5","author":[{"given":"Shaan","family":"Khurshid","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Christopher","family":"Reeder","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lia X.","family":"Harrington","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pulkit","family":"Singh","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gopal","family":"Sarma","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Samuel F.","family":"Friedman","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9256-0678","authenticated-orcid":false,"given":"Paolo","family":"Di Achille","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1738-304X","authenticated-orcid":false,"given":"Nathaniel","family":"Diamant","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4481-7867","authenticated-orcid":false,"given":"Jonathan 
W.","family":"Cunningham","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ashby C.","family":"Turner","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Emily S.","family":"Lau","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Julian S.","family":"Haimovich","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mostafa A.","family":"Al-Alusi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xin","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7630-2708","authenticated-orcid":false,"given":"Marcus D. R.","family":"Klarqvist","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jeffrey M.","family":"Ashburner","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Christian","family":"Diedrich","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mercedeh","family":"Ghadessi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Johanna","family":"Mielke","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hanna M.","family":"Eilken","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alice","family":"McElhinney","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andrea","family":"Derix","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Steven 
J.","family":"Atlas","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2067-0533","authenticated-orcid":false,"given":"Patrick T.","family":"Ellinor","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anthony A.","family":"Philippakis","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Christopher D.","family":"Anderson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7987-4768","authenticated-orcid":false,"given":"Jennifer E.","family":"Ho","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6822-0593","authenticated-orcid":false,"given":"Puneet","family":"Batra","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9599-4866","authenticated-orcid":false,"given":"Steven A.","family":"Lubitz","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2022,4,8]]},"reference":[{"key":"590_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s00392-016-1025-6","volume":"106","author":"MR Cowie","year":"2017","unstructured":"Cowie, M. R. et al. Electronic health records to facilitate clinical research. Clin. Res. Cardiol. 106, 1\u20139 (2017).","journal-title":"Clin. Res. Cardiol."},{"key":"590_CR2","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1016\/S0140-6736(19)31721-0","volume":"394","author":"ZI Attia","year":"2019","unstructured":"Attia, Z. I. et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. 
Lancet 394, 861\u2013867 (2019).","journal-title":"Lancet"},{"key":"590_CR3","first-page":"e005289","volume":"12","author":"GH Tison","year":"2019","unstructured":"Tison, G. H., Zhang, J., Delling, F. N. & Deo, R. C. Automated and interpretable patient ECG profiles for disease detection, tracking, and discovery. Circulation 12, e005289 (2019).","journal-title":"Circulation"},{"key":"590_CR4","doi-asserted-by":"publisher","first-page":"1331","DOI":"10.1016\/j.jacep.2019.07.016","volume":"5","author":"OL Hulme","year":"2019","unstructured":"Hulme, O. L. et al. Development and validation of a prediction model for atrial fibrillation using electronic health records. JACC Clin. Electrophysiol. 5, 1331\u20131341 (2019).","journal-title":"JACC Clin. Electrophysiol."},{"key":"590_CR5","doi-asserted-by":"publisher","first-page":"e14830","DOI":"10.2196\/14830","volume":"7","author":"F Li","year":"2019","unstructured":"Li, F. et al. Fine-tuning Bidirectional Encoder Representations From Transformers (BERT)-based models on large-scale electronic health record notes: an empirical study. JMIR Med. Inform. 7, e14830 (2019).","journal-title":"JMIR Med. Inform."},{"key":"590_CR6","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1016\/j.jbi.2015.09.009","volume":"58","author":"Y Zhang","year":"2015","unstructured":"Zhang, Y., Padman, R. & Patel, N. Paving the COWpath: learning and visualizing clinical pathways from electronic health record data. J. Biomed. Inform. 58, 186\u2013197 (2015).","journal-title":"J. Biomed. Inform."},{"key":"590_CR7","doi-asserted-by":"publisher","first-page":"342","DOI":"10.1111\/cts.12178","volume":"7","author":"RM Kaplan","year":"2014","unstructured":"Kaplan, R. M., Chambers, D. A. & Glasgow, R. E. Big data and large sample size: a cautionary note on the potential for bias. Clin. Transl. Sci. 7, 342\u2013346 (2014).","journal-title":"Clin. Transl. Sci."},{"key":"590_CR8","doi-asserted-by":"publisher","unstructured":"Raghunath, S. et al. 
Prediction of mortality from 12-lead electrocardiogram voltage data using a deep neural network. Nat. Med. https:\/\/doi.org\/10.1038\/s41591-020-0870-z (2020).","DOI":"10.1038\/s41591-020-0870-z"},{"key":"590_CR9","first-page":"16","volume":"4","author":"S Haneuse","year":"2016","unstructured":"Haneuse, S. & Daniels, M. A general framework for considering selection bias in EHR-based studies: what data are observed and why? EGEMs (Wash DC) 4, 16 (2016).","journal-title":"EGEMs (Wash DC)"},{"key":"590_CR10","doi-asserted-by":"publisher","first-page":"e008997","DOI":"10.1161\/CIRCEP.120.008997","volume":"14","author":"S Khurshid","year":"2021","unstructured":"Khurshid, S. et al. Performance of atrial fibrillation risk prediction models in over 4 million individuals. Circ. Arrhythm. Electrophysiol. 14, e008997 (2021).","journal-title":"Circ. Arrhythm. Electrophysiol."},{"key":"590_CR11","doi-asserted-by":"publisher","first-page":"047829","DOI":"10.1161\/CIRCULATIONAHA.120.047829","volume":"120","author":"S Raghunath","year":"2021","unstructured":"Raghunath, S. et al. Deep neural networks can predict new-onset atrial fibrillation from the 12-lead electrocardiogram and help identify those at risk of AF-related stroke. Circulation. 120, 047829, https:\/\/doi.org\/10.1161\/CIRCULATIONAHA.120.047829 (2021).","journal-title":"Circulation."},{"key":"590_CR12","doi-asserted-by":"publisher","first-page":"412","DOI":"10.1093\/europace\/euz324","volume":"22","author":"J-M Kwon","year":"2020","unstructured":"Kwon, J.-M. et al. Comparing the performance of artificial intelligence and conventional diagnosis criteria for detecting left ventricular hypertrophy using electrocardiography. Europace. 22, 412\u2013419 (2020).","journal-title":"Europace."},{"key":"590_CR13","doi-asserted-by":"publisher","first-page":"e0215571","DOI":"10.1371\/journal.pone.0215571","volume":"14","author":"R Hammond","year":"2019","unstructured":"Hammond, R. et al. 
Predicting childhood obesity using electronic health records and publicly available data. PloS One 14, e0215571 (2019).","journal-title":"PloS One"},{"key":"590_CR14","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1038\/s41591-019-0724-8","volume":"26","author":"NS Artzi","year":"2020","unstructured":"Artzi, N. S. et al. Prediction of gestational diabetes based on nationwide electronic health records. Nat. Med. 26, 71\u201376 (2020).","journal-title":"Nat. Med."},{"key":"590_CR15","doi-asserted-by":"publisher","first-page":"1033","DOI":"10.1097\/BRS.0000000000000872","volume":"40","author":"T Cole","year":"2015","unstructured":"Cole, T. et al. Anterior versus posterior approach for multilevel degenerative cervical disease: a retrospective propensity score-matched study of the MarketScan database. Spine 40, 1033\u20131038 (2015).","journal-title":"Spine"},{"key":"590_CR16","doi-asserted-by":"publisher","first-page":"731","DOI":"10.34067\/KID.0002252020","volume":"1","author":"K Chauhan","year":"2020","unstructured":"Chauhan, K. et al. Initial validation of a machine learning-derived prognostic test (KidneyIntelX) integrating biomarkers and electronic health record data to predict longitudinal kidney outcomes. Kidney360 1, 731\u2013739 (2020).","journal-title":"Kidney360"},{"key":"590_CR17","doi-asserted-by":"publisher","first-page":"R33","DOI":"10.1093\/hmg\/ddaa192","volume":"29","author":"HR Due\u00f1as","year":"2020","unstructured":"Due\u00f1as, H. R., Seah, C., Johnson, J. S. & Huckins, L. M. Implicit bias of encoded variables: frameworks for addressing structured bias in EHR\u2013GWAS data. Hum. Mol. Genet 29, R33\u2013R41 (2020).","journal-title":"Hum. Mol. Genet"},{"key":"590_CR18","doi-asserted-by":"publisher","first-page":"S49","DOI":"10.1161\/01.cir.0000437741.48606.98","volume":"129","author":"DC Goff","year":"2014","unstructured":"Goff, D. C. et al. 
2013 ACC\/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology\/American Heart Association Task Force on Practice Guidelines. Circulation 129, S49\u2013S73 (2014).","journal-title":"Circulation"},{"key":"590_CR19","doi-asserted-by":"publisher","first-page":"e000102","DOI":"10.1161\/JAHA.112.000102","volume":"2","author":"A Alonso","year":"2013","unstructured":"Alonso, A. et al. Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: the CHARGE-AF consortium. J. Am. Heart Assoc. 2, e000102 (2013).","journal-title":"J. Am. Heart Assoc."},{"key":"590_CR20","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1001\/jama.286.2.180","volume":"286","author":"RB D\u2019Agostino","year":"2001","unstructured":"D\u2019Agostino, R. B., Grundy, S., Sullivan, L. M., Wilson, P. & Risk, C. H. D., Prediction Group. Validation of the Framingham coronary heart disease prediction scores: results of a multiple ethnic groups investigation. JAMA 286, 180\u2013187 (2001).","journal-title":"JAMA"},{"key":"590_CR21","doi-asserted-by":"publisher","first-page":"e022363","DOI":"10.1161\/JAHA.121.022363","volume":"10","author":"JM Ashburner","year":"2021","unstructured":"Ashburner, J. M. et al. Re-CHARGE-AF: recalibration of the CHARGE-AF model for atrial fibrillation risk prediction in patients with acute stroke. J. Am. Heart Assoc. 10, e022363 (2021).","journal-title":"J. Am. Heart Assoc."},{"key":"590_CR22","doi-asserted-by":"publisher","first-page":"250","DOI":"10.1093\/aje\/kwr301","volume":"175","author":"G Danaei","year":"2012","unstructured":"Danaei, G., Tavakkoli, M. & Hern\u00e1n, M. A. Bias in observational studies of prevalent users: lessons for comparative effectiveness research from a meta-analysis of statins. Am. J. Epidemiol. 175, 250\u2013262 (2012).","journal-title":"Am. J. 
Epidemiol."},{"key":"590_CR23","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1016\/j.ahj.2018.04.015","volume":"202","author":"SR Raman","year":"2018","unstructured":"Raman, S. R. et al. Leveraging electronic health records for clinical research. Am. Heart J. 202, 13\u201319 (2018).","journal-title":"Am. Heart J."},{"key":"590_CR24","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/j.jclinepi.2017.11.021","volume":"96","author":"G Danaei","year":"2018","unstructured":"Danaei, G., Garc\u00eda Rodr\u00edguez, L. A., Cantero, O. F., Logan, R. W. & Hern\u00e1n, M. A. Electronic medical records can be used to emulate target trials of sustained treatment strategies. J. Clin. Epidemiol. 96, 12\u201322 (2018).","journal-title":"J. Clin. Epidemiol."},{"key":"590_CR25","doi-asserted-by":"publisher","first-page":"b2393","DOI":"10.1136\/bmj.b2393","volume":"338","author":"JAC Sterne","year":"2009","unstructured":"Sterne, J. A. C. et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338, b2393 (2009).","journal-title":"BMJ"},{"key":"590_CR26","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1161\/CIRCULATIONAHA.105.595140","volume":"114","author":"Y Miyasaka","year":"2006","unstructured":"Miyasaka, Y. et al. Secular trends in incidence of atrial fibrillation in Olmsted County, Minnesota, 1980 to 2000, and implications on the projections for future prevalence. Circulation 114, 119\u2013125 (2006).","journal-title":"Circulation"},{"key":"590_CR27","doi-asserted-by":"publisher","first-page":"949","DOI":"10.1093\/eurheartj\/ehi825","volume":"27","author":"J Heeringa","year":"2006","unstructured":"Heeringa, J. et al. Prevalence, incidence and lifetime risk of atrial fibrillation: the Rotterdam study. Eur. Heart J. 27, 949\u2013953 (2006).","journal-title":"Eur. 
Heart J."},{"key":"590_CR28","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1161\/01.STR.0000199613.38911.b2","volume":"37","author":"S Seshadri","year":"2006","unstructured":"Seshadri, S. et al. The lifetime risk of stroke: estimates from the Framingham study. Stroke 37, 345\u2013350 (2006).","journal-title":"Stroke"},{"key":"590_CR29","unstructured":"Nalichowski, R., Keogh, D., Chueh, H. C. & Murphy, S. N. Calculating the benefits of a Research Patient Data Repository. AMIA Annu. Symp. Proc. 2006, 1044 (2006)."},{"key":"590_CR30","unstructured":"The HDF Group. Hierarchical Data Format, version 5 http:\/\/www.hdfgroup.org\/HDF5\/ (2019)."},{"key":"590_CR31","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1016\/j.ahj.2016.05.004","volume":"178","author":"IE Christophersen","year":"2016","unstructured":"Christophersen, I. E. et al. A comparison of the CHARGE-AF and the CHA2DS2-VASc risk scores for prediction of atrial fibrillation in the Framingham Heart Study. Am. Heart J. 178, 45\u201354 (2016).","journal-title":"Am. Heart J."},{"key":"590_CR32","doi-asserted-by":"publisher","first-page":"786","DOI":"10.7326\/M16-1739","volume":"165","author":"NR Cook","year":"2016","unstructured":"Cook, N. R. & Ridker, P. M. Calibration of the Pooled Cohort Equations for atherosclerotic cardiovascular disease: an update. Ann. Intern. Med. 165, 786\u2013794 (2016).","journal-title":"Ann. Intern. Med."},{"key":"590_CR33","doi-asserted-by":"publisher","first-page":"e007716","DOI":"10.1161\/CIRCEP.119.007716","volume":"13","author":"EY Wang","year":"2020","unstructured":"Wang, E. Y. et al. Initial precipitants and recurrence of atrial fibrillation. Circ. Arrhythm. Electrophysiol. 13, e007716 (2020).","journal-title":"Circ. Arrhythm. Electrophysiol."},{"key":"590_CR34","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1016\/j.amjcard.2015.10.031","volume":"117","author":"S Khurshid","year":"2016","unstructured":"Khurshid, S., Keaney, J., Ellinor, P. T. 
& Lubitz, S. A. A simple and portable algorithm for identifying atrial fibrillation in the electronic medical record. Am. J. Cardiol. 117, 221\u2013225 (2016).","journal-title":"Am. J. Cardiol."},{"key":"590_CR35","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-22328-4","volume":"12","author":"JA Fries","year":"2021","unstructured":"Fries, J. A. et al. Ontology-driven weak supervision for clinical entity classification in electronic health records. Nat. Commun. 12, 2017 (2021).","journal-title":"Nat. Commun."},{"key":"590_CR36","first-page":"4171","volume":"1","author":"J Devlin","year":"2019","unstructured":"Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. NAACL 1, 4171\u20134186 (2019).","journal-title":"NAACL"},{"key":"590_CR37","doi-asserted-by":"crossref","unstructured":"Alsentzer, E. et al. Publicly available clinical BERT embeddings. In Proc. 2nd Clinical Natural Language Processing Workshop 72\u201378 (Association for Computational Linguistics, Minnesota, 2019).","DOI":"10.18653\/v1\/W19-1909"},{"key":"590_CR38","doi-asserted-by":"publisher","unstructured":"Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics https:\/\/doi.org\/10.1093\/bioinformatics\/btz682 (2019).","DOI":"10.1093\/bioinformatics\/btz682"},{"key":"590_CR39","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2016.35","volume":"3","author":"AEW Johnson","year":"2016","unstructured":"Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016).","journal-title":"Sci. Data"},{"key":"590_CR40","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1016\/j.amjcard.2015.10.009","volume":"117","author":"E Shulman","year":"2016","unstructured":"Shulman, E. et al. 
Validation of the Framingham heart study and CHARGE-AF risk scores for atrial fibrillation in Hispanics, African-Americans, and Non-Hispanic Whites. Am. J. Cardiol. 117, 76\u201383 (2016).","journal-title":"Am. J. Cardiol."},{"key":"590_CR41","doi-asserted-by":"publisher","unstructured":"Rodriguez, F. et al. Atherosclerotic cardiovascular disease risk prediction in disaggregated Asian and Hispanic subgroups using electronic health records. J. Am. Heart Assoc. https:\/\/doi.org\/10.1161\/JAHA.118.011874 (2019).","DOI":"10.1161\/JAHA.118.011874"},{"key":"590_CR42","doi-asserted-by":"publisher","first-page":"2430","DOI":"10.1002\/sim.5647","volume":"32","author":"H Uno","year":"2013","unstructured":"Uno, H., Tian, L., Cai, T., Kohane, I. S. & Wei, L. J. A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data. Stat. Med. 32, 2430\u20132442 (2013).","journal-title":"Stat. Med."},{"key":"590_CR43","doi-asserted-by":"publisher","first-page":"2714","DOI":"10.1002\/sim.8570","volume":"39","author":"PC Austin","year":"2020","unstructured":"Austin, P. C., Harrell, F. E. & Klaveren, D. Graphical calibration curves and the integrated calibration index (ICI) for survival models. Stat. Med. 39, 2714\u20132742 (2020).","journal-title":"Stat. Med."},{"key":"590_CR44","doi-asserted-by":"publisher","first-page":"1659","DOI":"10.1002\/sim.6428","volume":"34","author":"OV Demler","year":"2015","unstructured":"Demler, O. V., Paynter, N. P. & Cook, N. R. Tests of calibration and goodness-of-fit in the survival setting. Stat. Med. 34, 1659\u20131680 (2015).","journal-title":"Stat. Med."},{"key":"590_CR45","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1016\/j.jclinepi.2019.09.016","volume":"118","author":"RJ Stevens","year":"2020","unstructured":"Stevens, R. J. & Poppe, K. K. Validation of clinical prediction models: what does the \u2018calibration slope\u2019 really measure? J. Clin. Epidemiol. 
118, 93\u201399 (2020).","journal-title":"J. Clin. Epidemiol."},{"key":"590_CR46","first-page":"e596","volume":"140","author":"DK Arnett","year":"2019","unstructured":"Arnett, D. K. et al. 2019 ACC\/AHA guideline on the primary prevention of cardiovascular disease: a report of the American College of Cardiology\/American Heart Association Task Force on clinical practice guidelines. Circulation 140, e596\u2013e646 (2019).","journal-title":"Circulation"},{"key":"590_CR47","unstructured":"Python Core Team. Python: A Dynamic, Open Source Programming Language. Python Software Foundation. https:\/\/www.python.org\/ (2015)."},{"key":"590_CR48","unstructured":"R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing Vienna, Austria. https:\/\/www.R-project.org\/ (2015)."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-022-00590-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-022-00590-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-022-00590-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,25]],"date-time":"2022-11-25T02:55:04Z","timestamp":1669344904000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-022-00590-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,8]]},"references-count":48,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["590"],"URL":"https:\/\/doi.org\/10.1038\/s41746-022-00590-0","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.05.26.21257872","asserted-by":"object"}]},"ISSN":["2398-6352"],"issn-type":[{"valu
e":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,8]]},"assertion":[{"value":"26 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 March 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 April 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare the following financial and non-financial interests: A.A.P. receives sponsored research support from Bayer AG, IBM, Intel, and Verily. He has also received consulting fees from Novartis and Rakuten. He is a Venture Partner at GV and is compensated for this work. J.E.H. receives sponsored research support from Bayer AG and Gilead Sciences. J.E.H. has received research supplies from EcoNugenics. S.F.F. receives sponsored research support from Bayer AG and IBM. C.D.A. receives sponsored research support from Bayer AG and has consulted for ApoPharma and Invitae. P.B. receives sponsored research support from Bayer AG and IBM, and consults for Novartis. S.A.L. receives sponsored research support from Bristol Myers Squibb\/Pfizer, Bayer AG, Boehringer Ingelheim, and Fitbit, has consulted for Bristol Myers Squibb\/Pfizer and Bayer AG, and participates in a research collaboration with IBM. P.T.E. receives sponsored research support from Bayer AG and IBM Research and he has consulted for Bayer AG, Novartis, MyoKardia, and Quest Diagnostics. S.J.A. receives sponsored research support from Bristol Myers Squibb\/Pfizer and has consulted for Bristol Myers Squibb\/Pfizer and Fitbit. J.M.A. has received sponsored research support from Bristol Myers Squibb\/Pfizer. C.D., J.M., H.M.E., A.D., and M.G. are employees of Bayer AG. 
The remaining authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"47"}}