{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,27]],"date-time":"2026-04-27T11:10:24Z","timestamp":1777288224656,"version":"3.51.4"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T00:00:00Z","timestamp":1565136000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Objective<\/jats:title>\n                    <jats:p>Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Materials and Methods<\/jats:title>\n                    <jats:p>We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the Unified Medical Language System. Along with health care utilization, aggregated ICD and NLP counts were jointly analyzed by fitting an ensemble of latent mixture models. 
The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying participants with phenotype yes\/no. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in an independent cohort phenome-wide association studies (PheWAS) for 2 single nucleotide polymorphisms with known associations.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUCMAP 0.943, AUCmanual 0.941). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>The MAP approach increased the accuracy of phenotype definition while maintaining scalability, thereby facilitating use in studies requiring large-scale phenotyping, such as PheWAS.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/jamia\/ocz066","type":"journal-article","created":{"date-parts":[[2019,4,26]],"date-time":"2019-04-26T07:26:47Z","timestamp":1556263607000},"page":"1255-1262","source":"Crossref","is-referenced-by-count":102,"title":["High-throughput multimodal automated phenotyping (MAP) with application to PheWAS"],"prefix":"10.1093","volume":"26","author":[{"given":"Katherine P","family":"Liao","sequence":"first","affiliation":[{"name":"Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA"},{"name":"Department of Biomedical Informatics, Harvard Medical 
School, Boston, MA, USA"},{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]},{"given":"Jiehuan","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA"},{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]},{"given":"Tianrun A","family":"Cai","sequence":"additional","affiliation":[{"name":"Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA"},{"name":"Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"},{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]},{"given":"Nicholas","family":"Link","sequence":"additional","affiliation":[{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]},{"given":"Chuan","family":"Hong","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"},{"name":"Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA"},{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]},{"given":"Jie","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"}]},{"given":"Jennifer E","family":"Huffman","sequence":"additional","affiliation":[{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]},{"given":"Jessica","family":"Gronsbell","sequence":"additional","affiliation":[{"name":"Verily Life Sciences, Cambridge, MA, USA"}]},{"given":"Yichi","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Rhode Island, Kingston, RI, USA"},{"name":"Department of Biostatistics, Harvard T.H. 
Chan School of Public Health, Boston, MA, USA"}]},{"given":"Yuk-Lam","family":"Ho","sequence":"additional","affiliation":[{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]},{"given":"Victor","family":"Castro","sequence":"additional","affiliation":[{"name":"Partners Healthcare Systems, Summerville, MA, USA"}]},{"given":"Vivian","family":"Gainer","sequence":"additional","affiliation":[{"name":"Partners Healthcare Systems, Summerville, MA, USA"}]},{"given":"Shawn N","family":"Murphy","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"},{"name":"Partners Healthcare Systems, Summerville, MA, USA"},{"name":"Massachusetts General Hospital, Boston, MA, USA"}]},{"given":"Christopher J","family":"O\u2019Donnell","sequence":"additional","affiliation":[{"name":"Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA"},{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]},{"given":"J Michael","family":"Gaziano","sequence":"additional","affiliation":[{"name":"Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA"},{"name":"Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"},{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]},{"given":"Kelly","family":"Cho","sequence":"additional","affiliation":[{"name":"Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA"},{"name":"Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"},{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]},{"given":"Peter","family":"Szolovits","sequence":"additional","affiliation":[{"name":"Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA"}]},{"given":"Isaac 
S","family":"Kohane","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"}]},{"given":"Sheng","family":"Yu","sequence":"additional","affiliation":[{"name":"Center for Statistical Science, Tsinghua University, Beijing, China"},{"name":"Department of Industrial Engineering, Tsinghua University, Beijing, China"},{"name":"Institute for Data Science, Tsinghua University, Beijing, China"}]},{"given":"Tianxi","family":"Cai","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA"},{"name":"Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA"},{"name":"Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,8,7]]},"reference":[{"issue":"9","key":"2020110613073165200_ocz066-B1","doi-asserted-by":"crossref","first-page":"1205","DOI":"10.1093\/bioinformatics\/btq126","article-title":"PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations","volume":"26","author":"Denny","year":"2010","journal-title":"Bioinformatics"},{"issue":"3","key":"2020110613073165200_ocz066-B2","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1212\/WNL.49.3.660","article-title":"Inaccuracy of the International Classification of Diseases (ICD-9-CM) in identifying the diagnosis of ischemic cerebrovascular disease","volume":"49","author":"Benesch","year":"1997","journal-title":"Neurology"},{"issue":"5","key":"2020110613073165200_ocz066-B3","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1097\/01.mlr.0000160417.39497.a9","article-title":"Accuracy of ICD-9-CM codes for identifying cardiovascular and stroke risk factors","volume":"43","author":"Birman-Deych","year":"2005","journal-title":"Med 
Care"},{"issue":"1","key":"2020110613073165200_ocz066-B4","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.thromres.2010.03.009","article-title":"Evaluation of the predictive value of ICD-9-CM coded administrative data for venous thromboembolism in the United States","volume":"126","author":"White","year":"2010","journal-title":"Thromb Res"},{"issue":"6","key":"2020110613073165200_ocz066-B5","first-page":"326","article-title":"The validity of ICD-9-CM codes in identifying postoperative deep vein thrombosis and pulmonary embolism","volume":"33","author":"Zhan","year":"2007","journal-title":"Jt Comm J Qual Patient Saf"},{"key":"2020110613073165200_ocz066-B6"},{"key":"2020110613073165200_ocz066-B7","article-title":"The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies","volume":"4: 13","author":"McCarty","year":"2011","journal-title":"BMC Med Genomics"},{"key":"2020110613073165200_ocz066-B8","first-page":"274","article-title":"Analyzing the heterogeneity and complexity of electronic health record oriented phenotyping algorithms","volume":"2011","author":"Conway","year":"2011","journal-title":"AMIA Annu Symp Proc"},{"issue":"e1","key":"2020110613073165200_ocz066-B9","doi-asserted-by":"crossref","first-page":"e147","DOI":"10.1136\/amiajnl-2012-000896","article-title":"Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network","volume":"20","author":"Newton","year":"2013","journal-title":"J Am Med Inform Assoc"},{"issue":"8","key":"2020110613073165200_ocz066-B10","doi-asserted-by":"crossref","first-page":"1120","DOI":"10.1002\/acr.20184","article-title":"Electronic medical records for discovery research in rheumatoid arthritis","volume":"62","author":"Liao","year":"2010","journal-title":"Arthritis Care 
Res"},{"issue":"7","key":"2020110613073165200_ocz066-B11","doi-asserted-by":"crossref","first-page":"1411","DOI":"10.1097\/MIB.0b013e31828133fd","article-title":"Improving case definition of Crohn's disease and ulcerative colitis in electronic medical records using natural language processing: a novel informatics approach","volume":"19","author":"Ananthakrishnan","year":"2013","journal-title":"Inflamm Bowel Dis"},{"issue":"11","key":"2020110613073165200_ocz066-B12","doi-asserted-by":"crossref","first-page":"e78927","DOI":"10.1371\/journal.pone.0078927","article-title":"Modeling disease severity in multiple sclerosis using electronic health records","volume":"8","author":"Xia","year":"2013","journal-title":"PLoS One"},{"issue":"4","key":"2020110613073165200_ocz066-B13","first-page":"363","article-title":"Validation of electronic health record phenotyping of bipolar disorder cases and controls","volume":"172","author":"Castro","year":"2015","journal-title":"Am J Psychiatry"},{"key":"2020110613073165200_ocz066-B14","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1016\/j.jbi.2014.08.001","article-title":"Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing","volume":"52","author":"Yu","year":"2014","journal-title":"J Biomed Inform"},{"issue":"8","key":"2020110613073165200_ocz066-B15","doi-asserted-by":"crossref","first-page":"e0136651","DOI":"10.1371\/journal.pone.0136651","article-title":"Methods to develop an electronic medical record phenotype algorithm to compare the risk of coronary artery disease across 3 chronic disease cohorts","volume":"10","author":"Liao","year":"2015","journal-title":"PLoS One"},{"key":"2020110613073165200_ocz066-B16","doi-asserted-by":"crossref","first-page":"h1885","DOI":"10.1136\/bmj.h1885","article-title":"Development of phenotype algorithms using electronic medical records and incorporating natural language 
processing","volume":"350","author":"Liao","year":"2015","journal-title":"BMJ"},{"key":"2020110613073165200_ocz066-B17","doi-asserted-by":"crossref","DOI":"10.1186\/s12958-015-0115-z","article-title":"Identification of subjects with polycystic ovary syndrome using electronic health records","volume":"13","author":"Castro","year":"2015","journal-title":"Reprod Biol Endocrinol"},{"issue":"2","key":"2020110613073165200_ocz066-B18","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1212\/WNL.0000000000003490","article-title":"Large-scale identification of patients with cerebral aneurysms using natural language processing","volume":"88","author":"Castro","year":"2017","journal-title":"Neurology"},{"issue":"6","key":"2020110613073165200_ocz066-B19","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1093\/jamia\/ocv202","article-title":"PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability","volume":"23","author":"Kirby","year":"2016","journal-title":"J Am Med Inform Assoc"},{"issue":"e2","key":"2020110613073165200_ocz066-B20","doi-asserted-by":"crossref","first-page":"e253","DOI":"10.1136\/amiajnl-2013-001945","article-title":"Applying active learning to high-throughput phenotyping algorithms for electronic health records data","volume":"20","author":"Chen","year":"2013","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2020110613073165200_ocz066-B21","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1093\/jamia\/ocv034","article-title":"Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources","volume":"22","author":"Yu","year":"2015","journal-title":"J Am Med Inform Assoc"},{"issue":"e1","key":"2020110613073165200_ocz066-B22","doi-asserted-by":"crossref","first-page":"e143","DOI":"10.1093\/jamia\/ocw135","article-title":"Surrogate-assisted feature extraction for high-throughput 
phenotyping","volume":"24","author":"Yu","year":"2017","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613073165200_ocz066-B23","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/j.jbi.2017.04.009","article-title":"EHR-based phenotyping: bulk learning and evaluation","volume":"70","author":"Chiu","year":"2017","journal-title":"J Biomed Inform"},{"issue":"6","key":"2020110613073165200_ocz066-B24","doi-asserted-by":"crossref","first-page":"1166","DOI":"10.1093\/jamia\/ocw028","article-title":"Learning statistical models of phenotypes using noisy labeled training data","volume":"23","author":"Agarwal","year":"2016","journal-title":"J Am Med Inform Assoc"},{"issue":"4","key":"2020110613073165200_ocz066-B25","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1093\/jamia\/ocw011","article-title":"Electronic medical record phenotyping using the anchor and learn framework","volume":"23","author":"Halpern","year":"2016","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2020110613073165200_ocz066-B26","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1093\/jamia\/ocx111","article-title":"Enabling phenotypic big data with PheNorm","volume":"25","author":"Yu","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"12","key":"2020110613073165200_ocz066-B27","doi-asserted-by":"crossref","first-page":"1102","DOI":"10.1038\/nbt.2749","article-title":"Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data","volume":"31","author":"Denny","year":"2013","journal-title":"Nat Biotechnol"},{"key":"2020110613073165200_ocz066-B28","year":"2018"},{"key":"2020110613073165200_ocz066-B29","author":"Yu"},{"issue":"1","key":"2020110613073165200_ocz066-B30","doi-asserted-by":"crossref","first-page":"2","DOI":"10.3390\/jpm6010002","article-title":"Building the Partners Healthcare Biobank at Partners personalized medicine: informed consent, return of research results, 
recruitment lessons, and operational considerations","volume":"6","author":"Karlson","year":"2016","journal-title":"J Pers Med"},{"issue":"1","key":"2020110613073165200_ocz066-B31","doi-asserted-by":"crossref","DOI":"10.3390\/jpm6010011","article-title":"The biobank portal for Partners Personalized Medicine: a query tool for working with consented biobank samples, genotypes, and phenotypes using i2b2","volume":"6","author":"Gainer","year":"2016","journal-title":"J Pers Med"},{"key":"2020110613073165200_ocz066-B32","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1016\/j.jclinepi.2015.09.016","article-title":"Million Veteran Program: a mega-biobank to study genetic influences on health and disease","volume":"70","author":"Gaziano","year":"2016","journal-title":"J Clin Epidemiol"},{"issue":"9","key":"2020110613073165200_ocz066-B33","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1001\/jamacardio.2018.2287","article-title":"Association of interleukin 6 receptor variant with cardiovascular disease effects of interleukin 6 receptor blocking therapy: a phenome-wide association study","volume":"3","author":"Cai","year":"2018","journal-title":"JAMA Cardiol"},{"issue":"7307","key":"2020110613073165200_ocz066-B34","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1038\/nature09270","article-title":"Biological, clinical and population relevance of 95 loci for blood lipids","volume":"466","author":"Teslovich","year":"2010","journal-title":"Nature"},{"issue":"6","key":"2020110613073165200_ocz066-B35","doi-asserted-by":"crossref","first-page":"1170","DOI":"10.1136\/annrheumdis-2012-203202","article-title":"Association between low density lipoprotein and rheumatoid arthritis genetic factors with low density lipoprotein levels in rheumatoid arthritis and non-rheumatoid arthritis controls","volume":"73","author":"Liao","year":"2014","journal-title":"Ann Rheum 
Dis"},{"issue":"10","key":"2020110613073165200_ocz066-B36","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1038\/gim.2013.72","article-title":"The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future","volume":"15","author":"Gottesman","year":"2013","journal-title":"Genet Med"},{"issue":"2","key":"2020110613073165200_ocz066-B37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v015.i02","article-title":"The R package GEEPACK for generalized estimating equations","volume":"15","author":"Halekoh","year":"2006","journal-title":"J Stat Softw"},{"issue":"11","key":"2020110613073165200_ocz066-B38","doi-asserted-by":"crossref","first-page":"1369","DOI":"10.1007\/s00439-014-1466-9","article-title":"Improving the power of genetic association tests with imperfect phenotype derived from electronic medical records","volume":"133","author":"Sinnott","year":"2014","journal-title":"Hum Genet"},{"key":"2020110613073165200_ocz066-B39","article-title":"Developing and evaluating mappings of ICD-10 and ICD-10-CM codes to phecodes","volume":"462077","author":"Wu","year":"2018","journal-title":"BioRxiv"}],"container-title":["Journal of the American Medical Informatics 
Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/11\/1255\/34151659\/ocz066.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/11\/1255\/34151659\/ocz066.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T14:15:04Z","timestamp":1604672104000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/26\/11\/1255\/5544731"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,7]]},"references-count":39,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2019,8,7]]},"published-print":{"date-parts":[[2019,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocz066","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/587436","asserted-by":"object"}]},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,11]]},"published":{"date-parts":[[2019,8,7]]}}}