{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T11:14:39Z","timestamp":1774005279668,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T00:00:00Z","timestamp":1565136000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or \u201chide in plain sight.\u201d We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>We modeled a scenario where an institution (the defender) externally shared an 800-note corpus of actual outpatient clinical encounter notes from a large, integrated health care delivery system in Washington State. These notes were deidentified by a machine-learned PII tagger and HIPS resynthesis. A malicious attacker obtained and performed a parrot attack intending to expose leaked PII in this corpus. Specifically, the attacker mimicked the defender\u2019s process by manually annotating all PII-like content in half of the released corpus, training a PII tagger on these data, and using the trained model to tag the remaining encounter notes. The attacker hypothesized that untagged identifiers would be leaked PII, discoverable by manual review. We evaluated the attacker\u2019s success using measures of leak-detection rate and accuracy.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The attacker correctly hypothesized that 211 (68%) of 310 actual PII leaks in the corpus were leaks, and wrongly hypothesized that 191 resynthesized PII instances were also leaks. One-third of actual leaks remained undetected.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion and Conclusion<\/jats:title>\n                  <jats:p>A malicious parrot attack to reveal leaked PII in clinical text deidentified by machine-learned HIPS resynthesis can attenuate but not eliminate the protective effect of HIPS deidentification.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocz114","type":"journal-article","created":{"date-parts":[[2019,6,13]],"date-time":"2019-06-13T19:13:24Z","timestamp":1560453204000},"page":"1536-1544","source":"Crossref","is-referenced-by-count":9,"title":["The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8471-0928","authenticated-orcid":false,"given":"David S","family":"Carrell","sequence":"first","affiliation":[{"name":"Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA"}]},{"given":"David J","family":"Cronkite","sequence":"additional","affiliation":[{"name":"Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA"}]},{"given":"Muqun (Rachel)","family":"Li","sequence":"additional","affiliation":[{"name":"Privacy Analytics Inc, Ottawa, Ontario, Canada"}]},{"given":"Steve","family":"Nyemba","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA"}]},{"given":"Bradley A","family":"Malin","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA"},{"name":"Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA"},{"name":"Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA"}]},{"given":"John S","family":"Aberdeen","sequence":"additional","affiliation":[{"name":"The MITRE Corp, Bedford, Massachusetts, USA"}]},{"given":"Lynette","family":"Hirschman","sequence":"additional","affiliation":[{"name":"The MITRE Corp, Bedford, Massachusetts, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,8,7]]},"reference":[{"key":"2020110612460212400_ocz114-B1","first-page":"53181","author":"US Department of Health and Human Services","year":"2002"},{"key":"2020110612460212400_ocz114-B2","doi-asserted-by":"crossref","first-page":"70.","DOI":"10.1186\/1471-2288-10-70","article-title":"Automatic de-identification of textual documents in the electronic health record: a review of recent research","volume":"10","author":"Meystre","year":"2010","journal-title":"BMC Med Res Methodol"},{"issue":"2","key":"2020110612460212400_ocz114-B3","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1136\/amiajnl-2012-001034","article-title":"Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text","volume":"20","author":"Carrell","year":"2013","journal-title":"J Am Med Inform Assoc"},{"key":"2020110612460212400_ocz114-B4","author":"Hirschman"},{"issue":"12","key":"2020110612460212400_ocz114-B5","doi-asserted-by":"crossref","first-page":"e28071","DOI":"10.1371\/journal.pone.0028071","article-title":"A systematic review of re-identification attacks on health data","volume":"6","author":"El Emam","year":"2011","journal-title":"PloS One"},{"issue":"5","key":"2020110612460212400_ocz114-B6","doi-asserted-by":"crossref","first-page":"1029","DOI":"10.1093\/jamia\/ocv004","article-title":"R-U policy frontiers for health data identification","volume":"22","author":"Xia","year":"2015","journal-title":"J Am Med Inform Assoc"},{"issue":"2","key":"2020110612460212400_ocz114-B7","doi-asserted-by":"crossref","first-page":"232","DOI":"10.1109\/TKDE.2005.32","article-title":"Preserving privacy by de-identifying face images","volume":"17","author":"Newton","year":"2005","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2020110612460212400_ocz114-B8","doi-asserted-by":"crossref","first-page":"S53","DOI":"10.1016\/j.jbi.2015.06.029","article-title":"Combining knowledge- and data-driven methods for de-identification of clinical narratives","volume":"58","author":"Dehghan","year":"2015","journal-title":"J Biomed Inform"},{"issue":"5","key":"2020110612460212400_ocz114-B9","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1197\/jamia.M2444","article-title":"Evaluating the state-of-the-art in automatic identification","volume":"14","author":"Uzuner","year":"2007","journal-title":"J Am Med Inform Assoc"},{"key":"2020110612460212400_ocz114-B10","doi-asserted-by":"crossref","first-page":"S11","DOI":"10.1016\/j.jbi.2015.06.007","article-title":"Automated systems for the de-identification of longitudinal clinical narratives: overview of 2014 i2b2\/UTHealth shared task Track 1","volume":"58","author":"Stubbs","year":"2015","journal-title":"J Biomed Inform"},{"issue":"12","key":"2020110612460212400_ocz114-B11","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1016\/j.ijmedinf.2010.09.007","article-title":"The MITRE identification scrubber toolkit: design, training, and assessment","volume":"79","author":"Aberdeen","year":"2010","journal-title":"Int J Med Inform"},{"key":"2020110612460212400_ocz114-B12","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1136\/amiajnl-2012-001020","article-title":"BoB, a best-of-breed automated text de-identification system for VHA clinical documents","volume":"20","author":"Ferrandez","year":"2013","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2020110612460212400_ocz114-B13","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1055\/s-0038-1634080","article-title":"Assessing the difficulty and time cost of de-identification in clinical narratives","volume":"45","author":"Dorr","year":"2006","journal-title":"Methods Inf Med"},{"issue":"5","key":"2020110612460212400_ocz114-B14","doi-asserted-by":"crossref","first-page":"601","DOI":"10.1197\/jamia.M2702","article-title":"A software tool for removing patient identifying information from clinical documents","volume":"15","author":"Friedlin","year":"2008","journal-title":"J Am Med Inform Assoc"},{"key":"2020110612460212400_ocz114-B15","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1055\/s-0038-1638592","article-title":"Extracting information from textual documents in the electronic health record: a review of recent research","volume":"17","author":"Meystre","year":"2008","journal-title":"Yearb Med Inform"},{"issue":"1","key":"2020110612460212400_ocz114-B16","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1197\/jamia.M2862","article-title":"Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes?","volume":"16","author":"Morrison","year":"2009","journal-title":"J Am Med Inform Assoc"},{"key":"2020110612460212400_ocz114-B17","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1197\/jamia.M2441","article-title":"State-of-the-art anonymization of medical records using an iterative machine learning framework","volume":"14","author":"Szarvas","year":"2007","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2020110612460212400_ocz114-B18","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1197\/jamia.M2435","article-title":"Rapidly retargetable approaches to de-identification in medical records","volume":"14","author":"Wellner","year":"2007","journal-title":"J Am Med Inform Assoc"},{"issue":"2","key":"2020110612460212400_ocz114-B19","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1136\/jamia.2009.002212","article-title":"Effects of personal identifier resynthesis on clinical text de-identification","volume":"17","author":"Yeniterzi","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2020110612460212400_ocz114-B20","author":"Taira"},{"key":"2020110612460212400_ocz114-B21","doi-asserted-by":"crossref","first-page":"32.","DOI":"10.1186\/1472-6947-8-32","article-title":"Automated de-identification of free-text medical records","volume":"8","author":"Neamatullah","year":"2008","journal-title":"BMC Med Inform Decis Mak"},{"key":"2020110612460212400_ocz114-B22","first-page":"416","author":"Mayer","year":"2009"},{"issue":"12","key":"2020110612460212400_ocz114-B23","doi-asserted-by":"crossref","first-page":"1441","DOI":"10.1016\/j.datak.2009.07.006","article-title":"An integrated framework for de-identifying unstructured medical data","volume":"68","author":"Gardner","year":"2009","journal-title":"Data Knowl Eng"},{"issue":"3","key":"2020110612460212400_ocz114-B24","doi-asserted-by":"crossref","first-page":"596","DOI":"10.1093\/jamia\/ocw156","article-title":"De-identification of patient notes with recurrent neural networks","volume":"24","author":"Dernoncourt","year":"2017","journal-title":"J Am Med Inform Assoc"},{"key":"2020110612460212400_ocz114-B25","author":"Sweeney"},{"issue":"3","key":"2020110612460212400_ocz114-B26","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1109\/TKDE.2016.2628180","article-title":"Scalable iterative classification for sanitizing large-scale datasets","volume":"29","author":"Li","year":"2017","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2020110612460212400_ocz114-B27","doi-asserted-by":"crossref","first-page":"356","DOI":"10.3414\/ME15-01-0122","article-title":"Is the juice worth the squeeze? Costs and benefits of multiple human annotators for clinical text de-identification","volume":"55","author":"Carrell","year":"2016","journal-title":"Methods Inf Med"},{"key":"2020110612460212400_ocz114-B28","author":"OWASP"},{"key":"2020110612460212400_ocz114-B29","author":"MITRE. MITRE Identification Scrubber Toolkit (MIST)"},{"issue":"6","key":"2020110612460212400_ocz114-B30","doi-asserted-by":"crossref","first-page":"1893","DOI":"10.1109\/JBHI.2014.2344095","article-title":"Systematic poisoning attacks on and defenses for machine learning in healthcare","volume":"19","author":"Mozaffari-Kermani","year":"2015","journal-title":"IEEE J Biomed Health Inform"},{"issue":"9","key":"2020110612460212400_ocz114-B31","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1038\/ng.3062","article-title":"Data use under the NIH GWAS data sharing policy and future directions","volume":"46","author":"Paltoo","year":"2014","journal-title":"Nat Genet"},{"issue":"2","key":"2020110612460212400_ocz114-B32","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1016\/j.ajhg.2016.12.002","article-title":"Expanding access to large-scale genomic data while promoting privacy: a game theoretic approach","volume":"100","author":"Wan","year":"2017","journal-title":"Am J Hum Genet"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/12\/1536\/34152323\/ocz114.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/12\/1536\/34152323\/ocz114.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T18:32:49Z","timestamp":1604687569000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/26\/12\/1536\/5544736"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,7]]},"references-count":32,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2019,8,7]]},"published-print":{"date-parts":[[2019,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocz114","relation":{},"ISSN":["1527-974X"],"issn-type":[{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,12]]},"published":{"date-parts":[[2019,8,7]]}}}