{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T18:40:11Z","timestamp":1781894411323,"version":"3.54.5"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2017,7,8]],"date-time":"2017-07-08T00:00:00Z","timestamp":1499472000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R01 GM103859"],"award-info":[{"award-number":["R01 GM103859"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006108","name":"National Center for Advancing Translational Sciences","doi-asserted-by":"publisher","award":["UL1 TR000445"],"award-info":[{"award-number":["UL1 TR000445"]}],"id":[{"id":"10.13039\/100006108","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006093","name":"Patient-Centered Outcomes Research Institute","doi-asserted-by":"publisher","award":["CDRN-1306-04869"],"award-info":[{"award-number":["CDRN-1306-04869"]}],"id":[{"id":"10.13039\/100006093","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006093","name":"Patient-Centered Outcomes Research Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006093","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>Understanding how to identify the social determinants of health from electronic health records (EHRs) could provide important insights to understand health or disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>We first constructed lexicons to capture homelessness and ACE phenotypic profiles. We employed word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles with iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results of 2544 patient visits relevant for homelessness and 1000 patients relevant for ACE.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>word2vec yielded better performance (area under the precision-recall curve [AUPRC] of 0.94) than lexical associations (AUPRC\u2009=\u20090.83) for extracting homelessness-related words. A comparative study of searches for the 2 phenotypes revealed a higher performance achieved for homelessness (AUPRC\u2009=\u20090.95) than ACE (AUPRC\u2009=\u20090.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and\/or physical (50.6%) abuse, with the top-ranked abuser keywords being \u201cfather\u201d (21.8%) and \u201cmother\u201d (15.4%). Top prevalent associated conditions for homeless patients were lack of housing (62.8%) and tobacco use disorder (61.5%), while for ACE patients it was mental disorders (36.6%\u201347.6%).<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocx059","type":"journal-article","created":{"date-parts":[[2017,5,10]],"date-time":"2017-05-10T19:11:35Z","timestamp":1494443495000},"page":"61-71","source":"Crossref","is-referenced-by-count":98,"title":["Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records"],"prefix":"10.1093","volume":"25","author":[{"given":"Cosmin A","family":"Bejan","sequence":"first","affiliation":[{"name":"Department of Biomedical Informatics"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"John","family":"Angiolillo","sequence":"additional","affiliation":[{"name":"Department of Medicine"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Douglas","family":"Conway","sequence":"additional","affiliation":[{"name":"Institute for Clinical and Translational Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Robertson","family":"Nash","sequence":"additional","affiliation":[{"name":"Department of Medicine"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jana K","family":"Shirey-Rice","sequence":"additional","affiliation":[{"name":"Institute for Clinical and Translational Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Loren","family":"Lipworth","sequence":"additional","affiliation":[{"name":"Department of Medicine"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Robert M","family":"Cronin","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics"},{"name":"Department of Medicine"},{"name":"Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jill","family":"Pulley","sequence":"additional","affiliation":[{"name":"Department of Medicine"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sunil","family":"Kripalani","sequence":"additional","affiliation":[{"name":"Department of Medicine"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shari","family":"Barkin","sequence":"additional","affiliation":[{"name":"Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kevin B","family":"Johnson","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics"},{"name":"Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Joshua C","family":"Denny","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics"},{"name":"Department of Medicine"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2017,7,8]]},"reference":[{"key":"2020110612450463800_ocx059-B1","volume-title":"Tobacco-Related Mortality","author":"Centers for Disease Control and Prevention"},{"key":"2020110612450463800_ocx059-B2","volume-title":"Alcohol Use and Your Health","author":"Centers for Disease Control and Prevention"},{"key":"2020110612450463800_ocx059-B3","doi-asserted-by":"crossref","first-page":"1238","DOI":"10.1001\/jama.291.10.1238","article-title":"Actual causes of death in the United States, 2000","volume":"291","author":"Mokdad","year":"2004","journal-title":"JAMA."},{"key":"2020110612450463800_ocx059-B4","doi-asserted-by":"crossref","first-page":"e1000316","DOI":"10.1371\/journal.pmed.1000316","article-title":"Social relationships and mortality risk: a meta-analytic review","volume":"7","author":"Holt-Lunstad","year":"2010","journal-title":"PLoS Med."},{"key":"2020110612450463800_ocx059-B5","volume-title":"Capturing Social and Behavioral Domains in Electronic Health Records: Phase 1","author":"National Academy of Medicine"},{"key":"2020110612450463800_ocx059-B6","volume-title":"Capturing Social and Behavioral Domains and Measures in Electronic Health Records: Phase 2","author":"National Academy of Medicine"},{"key":"2020110612450463800_ocx059-B7","doi-asserted-by":"crossref","first-page":"921","DOI":"10.1093\/jamia\/ocv035","article-title":"Informatics to support the IOM social and behavioral domains and measures","volume":"22","author":"Hripcsak","year":"2015","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B8","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1093\/jamia\/ocv034","article-title":"Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources","volume":"22","author":"Yu","year":"2015","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B9","doi-asserted-by":"crossref","first-page":"1220","DOI":"10.1093\/jamia\/ocv112","article-title":"Desiderata for computable representations of electronic health records\u2013driven phenotype algorithms","volume":"22","author":"Mo","year":"2015","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B10","first-page":"1","article-title":"Automatic prediction of rheumatoid arthritis disease activity from the electronic medical records","volume":"8","author":"Lin","year":"2013","journal-title":"PLoS One."},{"key":"2020110612450463800_ocx059-B11","doi-asserted-by":"crossref","first-page":"e162","DOI":"10.1136\/amiajnl-2011-000583","article-title":"Portability of an algorithm to identify rheumatoid arthritis in electronic health records","volume":"19","author":"Carroll","year":"2012","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B12","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1136\/amiajnl-2011-000752","article-title":"Pneumonia identification using statistical feature selection","volume":"19","author":"Bejan","year":"2012","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B13","first-page":"2121","article-title":"Automated extraction of substance use information from clinical texts","volume":"2015","author":"Wang","year":"2015","journal-title":"AMIA Annu Symp Proc."},{"key":"2020110612450463800_ocx059-B14","first-page":"366","article-title":"Examining the use, contents, and quality of free-text tobacco use documentation in the electronic health record","volume":"2014","author":"Chen","year":"2014","journal-title":"AMIA Annu Symp Proc."},{"key":"2020110612450463800_ocx059-B15","first-page":"625","article-title":"Social and behavioral history information in public health datasets","volume":"2012","author":"Melton","year":"2012","journal-title":"AMIA Annu Symp Proc."},{"key":"2020110612450463800_ocx059-B16","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1197\/jamia.M2408","article-title":"Identifying patient smoking status from medical discharge records","volume":"15","author":"Uzuner","year":"2008","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B17","doi-asserted-by":"crossref","first-page":"464","DOI":"10.2105\/AJPH.2005.076190","article-title":"Homelessness, health status, and health care use","volume":"97","author":"Schanzer","year":"2007","journal-title":"Am J Public Health."},{"key":"2020110612450463800_ocx059-B18","article-title":"Homeless people","volume-title":"Handbook of Urban Health: Populations, Methods, and Practice","author":"Hwang","year":"2006"},{"key":"2020110612450463800_ocx059-B19","doi-asserted-by":"crossref","first-page":"314","DOI":"10.2105\/AJPH.2015.302904","article-title":"Adverse childhood experiences related to poor adult health among lesbian, gay, and bisexual individuals","volume":"106","author":"Austin","year":"2016","journal-title":"Am J Public Health."},{"key":"2020110612450463800_ocx059-B20","doi-asserted-by":"crossref","first-page":"e355","DOI":"10.1136\/amiajnl-2013-001946","article-title":"Validating a strategy for psychosocial phenotyping using a large corpus of clinical text","volume":"20","author":"Gundlapalli","year":"2013","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B21","first-page":"537","article-title":"Using natural language processing on the free text of clinical documents to screen for evidence of homelessness among US veterans","volume":"2013","author":"Gundlapalli","year":"2013","journal-title":"AMIA Annu Symp Proc."},{"key":"2020110612450463800_ocx059-B22","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1002\/jts.22058","article-title":"The feasibility of using large-scale text mining to detect adverse childhood experiences in a VA-treated population","volume":"28","author":"Hammond","year":"2015","journal-title":"J Trauma Stress."},{"key":"2020110612450463800_ocx059-B23","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1016\/j.addbeh.2011.05.001","article-title":"The influence of co-occurring axis I disorders on treatment utilization and outcome in homeless patients with substance use disorders","volume":"36","author":"Austin","year":"2011","journal-title":"Addict Behav."},{"key":"2020110612450463800_ocx059-B24","doi-asserted-by":"crossref","first-page":"608","DOI":"10.1176\/appi.ps.201300026","article-title":"Datapoints: trends in mortality among homeless VA patients with severe mental illness","volume":"64","author":"Birgenheir","year":"2013","journal-title":"Psychiatr Serv."},{"key":"2020110612450463800_ocx059-B25","doi-asserted-by":"crossref","first-page":"682","DOI":"10.1093\/jamia\/ocu005","article-title":"Identifying homelessness using health information exchange data","volume":"22","author":"Zech","year":"2015","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B26","doi-asserted-by":"crossref","first-page":"1734","DOI":"10.1056\/NEJM199806113382406","article-title":"Hospitalization costs associated with homelessness in New York City","volume":"338","author":"Salit","year":"1998","journal-title":"N Engl J Med."},{"key":"2020110612450463800_ocx059-B27","doi-asserted-by":"crossref","first-page":"362","DOI":"10.1038\/clpt.2008.89","article-title":"Development of a large-scale de-identified DNA biobank to enable personalized medicine","volume":"84","author":"Roden","year":"2008","journal-title":"Clin Pharmacol Ther."},{"key":"2020110612450463800_ocx059-B28","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1136\/amiajnl-2011-000203","article-title":"2010 i2b2\/VA challenge on concepts, assertions, and relations in clinical text","volume":"18","author":"Uzuner","year":"2011","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B29","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/S0749-3797(98)00017-8","article-title":"Relationship of childhood abuse and household dysfunction to many of the leading causes of death in adults. The Adverse Childhood Experiences (ACE) Study","volume":"14","author":"Felitti","year":"1998","journal-title":"Am J Prev Med"},{"key":"2020110612450463800_ocx059-B30","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1001\/jama.285.2.200","article-title":"Factors associated with the health care utilization of homeless persons","volume":"285","author":"Kushel","year":"2001","journal-title":"JAMA."},{"key":"2020110612450463800_ocx059-B31","doi-asserted-by":"crossref","first-page":"2329","DOI":"10.1056\/NEJMp038222","article-title":"Health Care for Homeless Persons","volume":"350","author":"Levy","year":"2004","journal-title":"N Engl J Med."},{"key":"2020110612450463800_ocx059-B32","volume-title":"Premature Mortality in Homeless Populations: A Review of the Literature","author":"O\u2019Connell"},{"key":"2020110612450463800_ocx059-B33","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1056\/NEJM199408043310506","article-title":"Mortality in a cohort of homeless adults in Philadelphia","volume":"331","author":"Hibbs","year":"1994","journal-title":"N Engl J Med."},{"key":"2020110612450463800_ocx059-B34","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1016\/S0006-3223(01)01157-X","article-title":"The role of childhood trauma in the neurobiology of mood and anxiety disorders: preclinical and clinical studies","volume":"49","author":"Heim","year":"2001","journal-title":"Biol Psychiatry."},{"key":"2020110612450463800_ocx059-B35","first-page":"221","article-title":"A review of approaches to identifying patient phenotype cohorts using electronic health records","volume":"21","author":"Chaitanya","year":"2013","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B36","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1093\/jamia\/ocv202","article-title":"PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability","volume":"23","author":"Kirby","year":"2016","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B37","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1136\/amiajnl-2012-001145","article-title":"Next-generation phenotyping of electronic health records","volume":"20","author":"Hripcsak","year":"2012","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B38","doi-asserted-by":"crossref","first-page":"1166","DOI":"10.1093\/jamia\/ocw028","article-title":"Learning statistical models of phenotypes using noisy labeled training data","volume":"23","author":"Agarwal","year":"2016","journal-title":"J Am Med Inform Assoc."},{"key":"2020110612450463800_ocx059-B39","volume-title":"What Is the Official Definition of Homelessness?","author":"National Health Care for the Homeless Council"},{"key":"2020110612450463800_ocx059-B40","volume-title":"Changes in the HUD Definition of \u201cHomeless.\u201d","author":"National Alliance to End Homelessness"},{"key":"2020110612450463800_ocx059-B41","first-page":"43","article-title":"Mining phenotypic keywords from a large collection of clinical narratives","volume":"242","author":"Bejan","year":"2015","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"key":"2020110612450463800_ocx059-B42","first-page":"147","article-title":"Embedding-based query language models","author":"Zamani","year":"2016"},{"key":"2020110612450463800_ocx059-B43","first-page":"367","article-title":"Query expansion with locally-trained word embeddings","author":"Diaz","year":"2016"},{"key":"2020110612450463800_ocx059-B44","doi-asserted-by":"crossref","article-title":"Learning concept embeddings for query expansion by quantum entropy minimization","author":"Sordoni","DOI":"10.1609\/aaai.v28i1.8933"},{"key":"2020110612450463800_ocx059-B45","article-title":"Efficient estimation of word representations in vector space","author":"Mikolov","year":"2013","journal-title":"ICLR."},{"key":"2020110612450463800_ocx059-B46","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","author":"Mikolov","year":"2013","journal-title":"NIPS."},{"key":"2020110612450463800_ocx059-B47","volume-title":"A Comparison of Open Source Search Engines","author":"Middleton"},{"key":"2020110612450463800_ocx059-B48","volume-title":"Open Source Search Engines","author":"Rappoport"},{"key":"2020110612450463800_ocx059-B49","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1006\/jbin.2001.1029","article-title":"A simple algorithm for identifying negated findings and diseases in discharge summaries","volume":"34","author":"Chapman","year":"2001","journal-title":"J Biomed Inform."},{"key":"2020110612450463800_ocx059-B50","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1016\/j.jbi.2012.09.001","article-title":"Assertion modeling and its role in clinical phenotype identification","volume":"46","author":"Bejan","year":"2013","journal-title":"J Biomed Inform."},{"key":"2020110612450463800_ocx059-B51","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval","author":"Manning","year":"2008"},{"key":"2020110612450463800_ocx059-B52","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1214\/aos\/1176344552","article-title":"Bootstrap Methods: Another Look at the Jackknife","volume":"7","author":"Efron","year":"1979","journal-title":"Ann Stat."},{"key":"2020110612450463800_ocx059-B53","first-page":"451","article-title":"Area under the precision-recall curve: point estimates and confidence intervals","volume-title":"Machine Learning and Knowledge Discovery in Databases","author":"Boyd"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/25\/1\/61\/34149546\/ocx059.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/25\/1\/61\/34149546\/ocx059.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,28]],"date-time":"2022-07-28T12:49:18Z","timestamp":1659012558000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/25\/1\/61\/3940211"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,7,8]]},"references-count":53,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2017,7,8]]},"published-print":{"date-parts":[[2018,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocx059","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,1]]},"published":{"date-parts":[[2017,7,8]]}}}