{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T09:50:01Z","timestamp":1747216201766,"version":"3.40.5"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"type":"electronic","value":"9781643685366"}],"license":[{"start":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T00:00:00Z","timestamp":1724976000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,8,30]]},"abstract":"<jats:p>Introduction: The German Medical Text Project (GeMTeX) is one of the largest infrastructure efforts targeting German-language clinical documents. We here introduce the architecture of the de-identification pipeline of GeMTeX. Methods: This pipeline comprises the export of raw clinical documents from the local hospital information system, the import into the annotation platform INCEpTION, fully automatic pre-tagging with protected health information (PHI) items by the Averbis Health Discovery pipeline, a manual curation step of these pre-annotated data, and, finally, the automatic replacement of PHI items with type-conformant substitutes. This design was implemented in a pilot study involving six annotators and two curators each at the Data Integration Centers of the University Hospitals Leipzig and Erlangen. Results: As a proof of concept, the publicly available Graz Synthetic Text Clinical Corpus (GRASCCO) was enhanced with PHI annotations in an annotation campaign for which reasonable inter-annotator agreement values of Krippendorff\u2019s \u03b1 \u2248 0.97 can be reported. 
Conclusion: These curated 1.4 K PHI annotations are released as open-source data constituting the first publicly available German clinical language text corpus with PHI metadata.<\/jats:p>","DOI":"10.3233\/shti240853","type":"book-chapter","created":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T09:07:20Z","timestamp":1725527240000},"source":"Crossref","is-referenced-by-count":0,"title":["De-Identifying GRASCCO \u2013 A Pilot Study for the De-Identification of the German Medical Text Project (GeMTeX) Corpus"],"prefix":"10.3233","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9889-162X","authenticated-orcid":false,"given":"Christina","family":"Lohr","sequence":"first","affiliation":[{"name":"Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Franz","family":"Matthies","sequence":"additional","affiliation":[{"name":"Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Jakob","family":"Faller","sequence":"additional","affiliation":[{"name":"Medical Center for Information and Communication Technology, Universit\u00e4tsklinikum Erlangen, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg, Erlangen, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Luise","family":"Modersohn","sequence":"additional","affiliation":[{"name":"Institute of Artificial Intelligence and Informatics in Medicine, Medical Center rechts der Isar, Technical University Munich, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Andrea","family":"Riedel","sequence":"additional","affiliation":[{"name":"Medical Center for Information and Communication Technology, Universit\u00e4tsklinikum Erlangen, Friedrich-Alexander-Universit\u00e4t 
Erlangen-N\u00fcrnberg, Erlangen, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Udo","family":"Hahn","sequence":"additional","affiliation":[{"name":"Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Rebekka","family":"Kiser","sequence":"additional","affiliation":[{"name":"Institute of Artificial Intelligence and Informatics in Medicine, Medical Center rechts der Isar, Technical University Munich, Germany"}]},{"given":"Martin","family":"Boeker","sequence":"additional","affiliation":[{"name":"Institute of Artificial Intelligence and Informatics in Medicine, Medical Center rechts der Isar, Technical University Munich, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Frank","family":"Meineke","sequence":"additional","affiliation":[{"name":"Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]}],"member":"7437","container-title":["Studies in Health Technology and Informatics","German Medical Data Sciences 
2024"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/SHTI240853","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T09:07:21Z","timestamp":1725527241000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/SHTI240853"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,30]]},"ISBN":["9781643685366"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/shti240853","relation":{},"ISSN":["0926-9630","1879-8365"],"issn-type":[{"type":"print","value":"0926-9630"},{"type":"electronic","value":"1879-8365"}],"subject":[],"published":{"date-parts":[[2024,8,30]]}}}