{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T04:16:19Z","timestamp":1774498579100,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2020,9,1]],"date-time":"2020-09-01T00:00:00Z","timestamp":1598918400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","award":["R01LM011366"],"award-info":[{"award-number":["R01LM011366"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["RM1HG009034"],"award-info":[{"award-number":["RM1HG009034"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008657"],"award-info":[{"award-number":["U01HG008657"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["U01HG008701"],"award-info":[{"award-number":["U01HG008701"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical 
to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this \u201cresidual PII problem.\u201d HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>Using 2000 representative clinical documents from 2 healthcare settings (4000 total), we used a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. 
Recall was 32% (precision = 40%) for internal and 22% (precision = 33%) for external readers.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion and Conclusions<\/jats:title>\n                  <jats:p>Approximately 70% of leaked PII \u201chiding\u201d in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario\u2014more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocaa095","type":"journal-article","created":{"date-parts":[[2020,5,26]],"date-time":"2020-05-26T19:11:10Z","timestamp":1590520270000},"page":"1374-1382","source":"Crossref","is-referenced-by-count":3,"title":["Resilience of clinical text de-identified with \u201chiding in plain sight\u201d to hostile reidentification attacks by human readers"],"prefix":"10.1093","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8471-0928","authenticated-orcid":false,"given":"David S","family":"Carrell","sequence":"first","affiliation":[{"name":"Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA"}]},{"given":"Bradley A","family":"Malin","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA"}]},{"given":"David J","family":"Cronkite","sequence":"additional","affiliation":[{"name":"Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA"}]},{"given":"John S","family":"Aberdeen","sequence":"additional","affiliation":[{"name":"Human Language Technology, MITRE Corporation, Bedford, Massachusetts, USA"}]},{"given":"Cheryl","family":"Clark","sequence":"additional","affiliation":[{"name":"Human Language Technology, MITRE Corporation, Bedford, Massachusetts, USA"}]},{"given":"Muqun 
(Rachel)","family":"Li","sequence":"additional","affiliation":[{"name":"Privacy Analytics Inc, Nashville, Tennessee, USA"}]},{"given":"Dikshya","family":"Bastakoty","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA"}]},{"given":"Steve","family":"Nyemba","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA"}]},{"given":"Lynette","family":"Hirschman","sequence":"additional","affiliation":[{"name":"Human Language Technology, MITRE Corporation, Bedford, Massachusetts, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,9,15]]},"reference":[{"key":"2020110613101745800_ocaa095-B1","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.jbi.2018.10.005","article-title":"Using clinical natural language processing for health outcomes research: overview and actionable suggestions for future advances","volume":"88","author":"Velupillai","year":"2018","journal-title":"J Biomed Inform"},{"issue":"4","key":"2020110613101745800_ocaa095-B2","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1093\/jamia\/ocy173","article-title":"Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review","volume":"26","author":"Koleck","year":"2019","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2020110613101745800_ocaa095-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12911-018-0723-6","article-title":"A clinical text classification paradigm using weak supervision and deep representation","volume":"19","author":"Wang","year":"2019","journal-title":"BMC Med Inform Decis Mak"},{"issue":"1","key":"2020110613101745800_ocaa095-B4","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1093\/jamia\/ocx111","article-title":"Enabling phenotypic big data with 
PheNorm","volume":"25","author":"Yu","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"2","key":"2020110613101745800_ocaa095-B5","doi-asserted-by":"crossref","first-page":"e12239","DOI":"10.2196\/12239","article-title":"Natural language processing of clinical notes on chronic diseases: systematic review","volume":"7","author":"Sheikhalishahi","year":"2019","journal-title":"JMIR Med Inform"},{"key":"2020110613101745800_ocaa095-B6","author":"European Medicines Agency","year":"2018"},{"issue":"S1","key":"2020110613101745800_ocaa095-B7","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1186\/s12874-016-0169-4","article-title":"Protecting patient privacy when sharing patient-level data from clinical trials","volume":"16","author":"Tucker","year":"2016","journal-title":"BMC Med Res Methodol"},{"key":"2020110613101745800_ocaa095-B8","first-page":"53181","article-title":"Standards for privacy of individually identifiable health information; final rule","volume":"67","author":"U.S. Department of Health and Human Services","year":"2002","journal-title":"Fed Regist"},{"key":"2020110613101745800_ocaa095-B9","doi-asserted-by":"crossref","first-page":"103971","DOI":"10.1016\/j.ijmedinf.2019.103971","article-title":"A systematic review of natural language processing for classification tasks in the field of incident reporting and adverse event analysis","volume":"132","author":"Young","year":"2019","journal-title":"Int J Med Inform"},{"issue":"1","key":"2020110613101745800_ocaa095-B10","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1111\/j.1748-720X.2010.00460.x","article-title":"The Hippocratic bargain and health information technology","volume":"38","author":"Rothstein","year":"2010","journal-title":"J Law Med Ethics"},{"issue":"4","key":"2020110613101745800_ocaa095-B11","doi-asserted-by":"crossref","first-page":"356","DOI":"10.3414\/ME15-01-0122","article-title":"Is the juice worth the squeeze? 
Costs and benefits of multiple human annotators for clinical text de-identification","volume":"55","author":"Carrell","year":"2016","journal-title":"Methods Inf Med"},{"key":"2020110613101745800_ocaa095-B12","doi-asserted-by":"crossref","first-page":"S82","DOI":"10.1097\/MLR.0b013e3182585355","article-title":"Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies","volume":"50 Suppl","author":"Kushida","year":"2012","journal-title":"Med Care"},{"issue":"1","key":"2020110613101745800_ocaa095-B13","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1186\/1471-2288-10-70","article-title":"Automatic de-identification of textual documents in the electronic health record: a review of recent research","volume":"10","author":"Meystre","year":"2010","journal-title":"BMC Med Res Methodol"},{"issue":"2","key":"2020110613101745800_ocaa095-B14","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1136\/amiajnl-2012-001034","article-title":"Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text","volume":"20","author":"Carrell","year":"2013","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2020110613101745800_ocaa095-B15","doi-asserted-by":"crossref","first-page":"596","DOI":"10.1093\/jamia\/ocw156","article-title":"De-identification of patient notes with recurrent neural networks","volume":"24","author":"Dernoncourt","year":"2017","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613101745800_ocaa095-B16","first-page":"72","volume-title":"proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents","author":"Hirschman","year":"2010"},{"issue":"5","key":"2020110613101745800_ocaa095-B17","doi-asserted-by":"crossref","first-page":"1029","DOI":"10.1093\/jamia\/ocv004","article-title":"R-U policy frontiers for health data 
de-identification","volume":"22","author":"Xia","year":"2015","journal-title":"J Am Med Inform Assoc"},{"issue":"12","key":"2020110613101745800_ocaa095-B18","doi-asserted-by":"crossref","first-page":"1536","DOI":"10.1093\/jamia\/ocz114","article-title":"The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight","volume":"26","author":"Carrell","year":"2020","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613101745800_ocaa095-B19","first-page":"31","author":"Grouin","year":"2015"},{"key":"2020110613101745800_ocaa095-B20","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/j.jbi.2016.03.019","article-title":"Optimizing annotation resources for natural language de-identification via a game theoretic framework","volume":"61","author":"Li","year":"2016","journal-title":"J Biomed Inform"},{"key":"2020110613101745800_ocaa095-B21","doi-asserted-by":"crossref","first-page":"S53","DOI":"10.1016\/j.jbi.2015.06.029","article-title":"Combining knowledge- and data-driven methods for de-identification of clinical narratives","volume":"58","author":"Dehghan","year":"2015","journal-title":"J Biomed Inform"},{"issue":"5","key":"2020110613101745800_ocaa095-B22","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1197\/jamia.M2444","article-title":"Evaluating the state-of-the-art in automatic de-identification","volume":"14","author":"Uzuner","year":"2007","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613101745800_ocaa095-B23","doi-asserted-by":"crossref","first-page":"S11","DOI":"10.1016\/j.jbi.2015.06.007","article-title":"Automated systems for the de-identification of longitudinal clinical narratives: overview of 2014 i2b2\/UTHealth shared task Track 1","volume":"58","author":"Stubbs","year":"2015","journal-title":"J Biomed 
Inform"},{"issue":"12","key":"2020110613101745800_ocaa095-B24","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1016\/j.ijmedinf.2010.09.007","article-title":"The MITRE Identification Scrubber Toolkit: Design, training, and assessment","volume":"79","author":"Aberdeen","year":"2010","journal-title":"Int J Med Inform"},{"issue":"1","key":"2020110613101745800_ocaa095-B25","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1136\/amiajnl-2012-001020","article-title":"BoB, a best-of-breed automated text de-identification system for VHA clinical documents","volume":"20","author":"Ferrandez","year":"2013","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613101745800_ocaa095-B26","first-page":"333","article-title":"Replacing personally-identifying information in medical records, the Scrub system","volume":"1996","author":"Sweeney","year":"1996","journal-title":"Proc AMIA Annu Fall Symp"},{"issue":"3","key":"2020110613101745800_ocaa095-B27","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1055\/s-0038-1634080","article-title":"Assessing the difficulty and time cost of de-identification in clinical narratives","volume":"45","author":"Dorr","year":"2006","journal-title":"Methods Inf Med"},{"issue":"5","key":"2020110613101745800_ocaa095-B28","doi-asserted-by":"crossref","first-page":"601","DOI":"10.1197\/jamia.M2702","article-title":"A software tool for removing patient identifying information from clinical documents","volume":"15","author":"Friedlin","year":"2008","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2020110613101745800_ocaa095-B29","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1055\/s-0038-1638592","article-title":"Extracting information from textual documents in the electronic health record: a review of recent research","volume":"17","author":"Meystre","year":"2008","journal-title":"Yearb Med 
Inform"},{"issue":"1","key":"2020110613101745800_ocaa095-B30","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1197\/jamia.M2862","article-title":"Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes?","volume":"16","author":"Morrison","year":"2009","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2020110613101745800_ocaa095-B31","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1197\/jamia.M2441","article-title":"State-of-the-art anonymization of medical records using an iterative machine learning framework","volume":"14","author":"Szarvas","year":"2007","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2020110613101745800_ocaa095-B32","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1197\/jamia.M2435","article-title":"Rapidly retargetable approaches to de-identification in medical records","volume":"14","author":"Wellner","year":"2007","journal-title":"J Am Med Inform Assoc"},{"issue":"2","key":"2020110613101745800_ocaa095-B33","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1136\/jamia.2009.002212","article-title":"Effects of personal identifier resynthesis on clinical text de-identification","volume":"17","author":"Yeniterzi","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613101745800_ocaa095-B34","first-page":"757","article-title":"Identification of patient name references within medical documents using semantic selectional restrictions","volume":"2002","author":"Taira","year":"2002","journal-title":"Proc AMIA Symp"},{"issue":"1","key":"2020110613101745800_ocaa095-B35","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1186\/1472-6947-8-32","article-title":"Automated de-identification of free-text medical records","volume":"8","author":"Neamatullah","year":"2008","journal-title":"BMC Med Inform Decis Mak"},{"key":"2020110613101745800_ocaa095-B36","first-page":"416","article-title":"Inductive creation of an annotation 
schema and a reference standard for de-identification of VA electronic clinical notes","volume":"2009","author":"Mayer","year":"2009","journal-title":"Proc AMIA Symp"},{"issue":"12","key":"2020110613101745800_ocaa095-B37","doi-asserted-by":"crossref","first-page":"1441","DOI":"10.1016\/j.datak.2009.07.006","article-title":"An integrated framework for de-identifying unstructured medical data","volume":"68","author":"Gardner","year":"2009","journal-title":"Data Know Eng"},{"issue":"1","key":"2020110613101745800_ocaa095-B38","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1186\/s12911-019-0867-z","article-title":"A privacy-preserving distributed filtering framework for NLP artifacts","volume":"19","author":"Sadat","year":"2019","journal-title":"BMC Med Inform Decis Mak"},{"issue":"3","key":"2020110613101745800_ocaa095-B39","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1109\/TKDE.2016.2628180","article-title":"Scalable iterative classification for sanitizing large-scale datasets","volume":"29","author":"Li","year":"2017","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"10","key":"2020110613101745800_ocaa095-B40","doi-asserted-by":"crossref","first-page":"750","DOI":"10.1016\/j.ijmedinf.2014.07.002","article-title":"De-identification of clinical narratives through writing complexity measures","volume":"83","author":"Li","year":"2014","journal-title":"Int J Med Inform"},{"key":"2020110613101745800_ocaa095-B41","author":"MITRE. 
MITRE Identification Scrubber Toolkit (MIST)","year":"2011"},{"issue":"6","key":"2020110613101745800_ocaa095-B42","doi-asserted-by":"crossref","first-page":"537","DOI":"10.2217\/cer-2017-0009","article-title":"Stakeholders\u2019 views on data sharing in multicenter studies","volume":"6","author":"Mazor","year":"2017","journal-title":"J Comp Eff Res"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/27\/9\/1374\/34153350\/ocaa095.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/27\/9\/1374\/34153350\/ocaa095.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T19:32:26Z","timestamp":1604691146000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/27\/9\/1374\/5905873"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,1]]},"references-count":42,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2020,9,15]]},"published-print":{"date-parts":[[2020,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocaa095","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,9]]},"published":{"date-parts":[[2020,9,1]]}}}