{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T16:52:08Z","timestamp":1757609528388,"version":"3.44.0"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"type":"electronic","value":"9781643686158"}],"license":[{"start":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T00:00:00Z","timestamp":1756857600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9,3]]},"abstract":"<jats:p>Introduction: In 2024, the GeMTeX project launched the largest ever de-identification campaign for German-language clinical reports, and, as a pilot study, published GraSCCoPHI, the first de-identified German-language gold standard corpus of synthetic discharge summaries. Methods: GeMTeX\u2019s de-identification workflow is described here \u2013 including annotation tool management and pre-annotation experience, such as assembling and training annotation groups and the evolution of guidelines. Results: We present the project\u2019s progress in the first year with respect to de-identification efforts and the challenges we faced during the rollout at six hospital sites in four German states. The refinement of the annotation guidelines became an ongoing process, often with unforeseen hurdles to overcome as we moved from testing to production. From our current internal interim corpus (9,000 documents with about 20 million tokens), we are publishing the first quantitative insights, such as the average amount of identifiable information per document, a list of confounding factors we did not anticipate at the beginning of the project, and three key lessons learned. Conclusion: We note that the unforeseen hurdles behave like the Pareto principle and fall into the set of less than 20% of the annotations.<\/jats:p>","DOI":"10.3233\/shti251406","type":"book-chapter","created":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T10:24:58Z","timestamp":1756895098000},"source":"Crossref","is-referenced-by-count":0,"title":["GeMTeX\u2019s De-Identification in Action: Lessons Learned &amp; Devil\u2019s Details"],"prefix":"10.3233","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9889-162X","authenticated-orcid":false,"given":"Christina","family":"Lohr","sequence":"first","affiliation":[{"name":"Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Jakob","family":"Faller","sequence":"additional","affiliation":[{"name":"Erlangen University Hospital, Medical Center for Information and Communication Technology, Erlangen, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Andrea","family":"Riedel","sequence":"additional","affiliation":[{"name":"Erlangen University Hospital, Medical Center for Information and Communication Technology, Erlangen, Germany"},{"name":"Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg, Medical Informatics, Erlangen, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Hung Manh","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Markus","family":"Wolfien","sequence":"additional","affiliation":[{"name":"Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany"},{"name":"Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Justin","family":"Hofenbitzer","sequence":"additional","affiliation":[{"name":"Technical University of Munich, School of Medicine and Health, Institute for AI and Informatics in Medicine, TUM University Hospital, Munich, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Luise","family":"Modersohn","sequence":"additional","affiliation":[{"name":"Technical University of Munich, School of Medicine and Health, Institute for AI and Informatics in Medicine, TUM University Hospital, Munich, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Jutta","family":"Romberg","sequence":"additional","affiliation":[{"name":"Data Integration Center, Berlin Institute of Health (BIH) at Charit\u00e9, Berlin, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Fabian","family":"Prasser","sequence":"additional","affiliation":[{"name":"Data Integration Center, Berlin Institute of Health (BIH) at Charit\u00e9, Berlin, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Jazia","family":"Omeirat","sequence":"additional","affiliation":[{"name":"Central IT Department, Data Integration Center, University Hospital Essen, Essen, Germany"},{"name":"Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Yutong","family":"Wen","sequence":"additional","affiliation":[{"name":"Central IT Department, Data Integration Center, University Hospital Essen, Essen, Germany"},{"name":"Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Oksana","family":"Galusch","sequence":"additional","affiliation":[{"name":"Data Integration Center, University of Leipzig Medical Center, Leipzig, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Udo","family":"Hahn","sequence":"additional","affiliation":[{"name":"Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Marvin","family":"Seiferling","sequence":"additional","affiliation":[{"name":"Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, Heidelberg, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Christoph","family":"Dieterich","sequence":"additional","affiliation":[{"name":"Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, Heidelberg, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Peter","family":"Kl\u00fcgl","sequence":"additional","affiliation":[{"name":"Averbis GmbH, Freiburg, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Franz","family":"Matthies","sequence":"additional","affiliation":[{"name":"Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Janina","family":"Kind","sequence":"additional","affiliation":[{"name":"Leipziger Forschungszentrum f\u00fcr Zivilisationserkrankungen \u2013 LIFE Management Cluster, Leipzig, Leipzig University, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Martin","family":"Boeker","sequence":"additional","affiliation":[{"name":"Technical University of Munich, School of Medicine and Health, Institute for AI and Informatics in Medicine, TUM University Hospital, Munich, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Markus","family":"L\u00f6ffler","sequence":"additional","affiliation":[{"name":"Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany"},{"name":"Leipziger Forschungszentrum f\u00fcr Zivilisationserkrankungen \u2013 LIFE Management Cluster, Leipzig, Leipzig University, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]},{"given":"Frank","family":"Meineke","sequence":"additional","affiliation":[{"name":"Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany"},{"name":"GeMTeX Consortium of the German Medical Informatics Initiative"}]}],"member":"7437","container-title":["Studies in Health Technology and Informatics","German Medical Data Sciences 2025: GMDS Illuminates Health"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/SHTI251406","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T10:24:59Z","timestamp":1756895099000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/SHTI251406"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,3]]},"ISBN":["9781643686158"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/shti251406","relation":{},"ISSN":["0926-9630","1879-8365"],"issn-type":[{"type":"print","value":"0926-9630"},{"type":"electronic","value":"1879-8365"}],"subject":[],"published":{"date-parts":[[2025,9,3]]}}}