{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,20]],"date-time":"2025-09-20T21:14:11Z","timestamp":1758402851030,"version":"3.37.3"},"reference-count":0,"publisher":"IOS Press","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018]]},"abstract":"<jats:p>Medical texts are a vast resource for medical and computational research. In contrast to newswire or Wikipedia texts, medical texts need to be de-identified before they are made accessible to a wider NLP research community. We created a prototype for German medical text de-identification and named entity recognition using a three-step approach. First, we used well-known rule-based models based on regular expressions and gazetteers; second, we used a spelling variant detector based on Levenshtein distance, exploiting the fact that the medical texts contain semi-structured headers including sensitive personal data; and third, we trained a named entity recognition model on out-of-domain data to add statistical capabilities to our prototype. Relative to a baseline based on regular expressions and gazetteers, we improved the F2-score from 78% to 85% for de-identification. 
Our prototype is a first step toward further research on German medical text de-identification and shows that spelling variant detection and statistical models trained on out-of-domain data can improve de-identification performance significantly.<\/jats:p>","DOI":"10.3233\/978-1-61499-896-9-165","type":"book-chapter","created":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T23:19:12Z","timestamp":1740266352000},"source":"Crossref","is-referenced-by-count":1,"title":["De-Identification of German Medical Admission Notes"],"prefix":"10.3233","author":[{"family":"Richter-Pechanski Phillip","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"family":"Riezler Stefan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"family":"Dieterich Christoph","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Studies in Health Technology and Informatics","German Medical Data Sciences: A Learning Healthcare System"],"original-title":[],"deposited":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T23:45:10Z","timestamp":1740267910000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.medra.org\/servlet\/aliasResolver?alias=iospressISBN&isbn=978-1-61499-895-2&spage=165&doi=10.3233\/978-1-61499-896-9-165"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018]]},"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/978-1-61499-896-9-165","relation":{},"ISSN":["0926-9630"],"issn-type":[{"value":"0926-9630","type":"print"}],"subject":[],"published":{"date-parts":[[2018]]}}}