{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T14:30:36Z","timestamp":1780324236505,"version":"3.54.1"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,10,2]],"date-time":"2021-10-02T00:00:00Z","timestamp":1633132800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,2]],"date-time":"2021-10-02T00:00:00Z","timestamp":1633132800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Data Sci Anal"],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The automatization and digitalization of business processes have led to an increase in the need for efficient information extraction from business documents. However, financial and legal documents are often not utilized effectively by text processing or machine learning systems, partly due to the presence of sensitive information in these documents, which restrict their usage beyond authorized parties and purposes. To overcome this limitation, we develop an anonymization method for German financial and legal documents using state-of-the-art natural language processing methods based on recurrent neural nets and transformer architectures. We present a web-based application to anonymize financial documents and a large-scale evaluation of different deep learning techniques.<\/jats:p>","DOI":"10.1007\/s41060-021-00285-x","type":"journal-article","created":{"date-parts":[[2021,10,4]],"date-time":"2021-10-04T15:21:29Z","timestamp":1633360889000},"page":"151-161","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Anonymization of German financial documents using neural network-based language models with contextual word representations"],"prefix":"10.1007","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6954-4722","authenticated-orcid":false,"given":"David","family":"Biesner","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rajkumar","family":"Ramamurthy","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Robin","family":"Stenzel","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Max","family":"L\u00fcbbering","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lars","family":"Hillebrand","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anna","family":"Ladi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Maren","family":"Pielka","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"R\u00fcdiger","family":"Loitz","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Christian","family":"Bauckhage","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rafet","family":"Sifa","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,10,2]]},"reference":[{"key":"285_CR1","unstructured":"Gola, P., Eichler, C., Franck, L., et al.: Datenschutz-grundverordnung: Ds-gvo (2017). Art. 6, paragraph 255"},{"key":"285_CR2","unstructured":"Sweeney, L.: Replacing Personally-Identifying Information in Medical Records, the Scrub System. (1996)"},{"issue":"5","key":"285_CR3","doi-asserted-by":"publisher","first-page":"564","DOI":"10.1197\/jamia.M2435","volume":"14","author":"B Wellner","year":"2007","unstructured":"Wellner, B., Huyck, M., Mardis, S., et al.: Rapidly retargetable approaches to de-identification in medical records. J. Am. Med. Inf. Assoc. 14(5), 564 (2007)","journal-title":"J. Am. Med. Inf. Assoc."},{"key":"285_CR4","doi-asserted-by":"crossref","unstructured":"Gardner, J., Xiong, L.: HIDE: an integrated system for health information de-identification. In: Proc. on. International Symposium on Computer-Based Medical Systems (2008), pp. 254\u2013259","DOI":"10.1109\/CBMS.2008.129"},{"issue":"1","key":"285_CR5","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1186\/1472-6947-8-32","volume":"8","author":"I Neamatullah","year":"2008","unstructured":"Neamatullah, I., Douglass, M.M., Li-wei, H.L., et al.: Automated de-identification of free-text medical records. BMC Med. Inf. Decis. Mak. 8(1), 32 (2008)","journal-title":"BMC Med. Inf. Decis. Mak."},{"issue":"1","key":"285_CR6","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1136\/amiajnl-2012-001020","volume":"20","author":"O Ferr\u00e1ndez","year":"2012","unstructured":"Ferr\u00e1ndez, O., South, B., Shen, S., et al.: BoB, a best-of-breed automated text de-identification system for VHA clinical documents. J. Am. Med. Inf. Assoc. 20(1), 77 (2012)","journal-title":"J. Am. Med. Inf. Assoc."},{"key":"285_CR7","doi-asserted-by":"crossref","unstructured":"Nguyen, N., Guo, Y.: Comparisons of sequence labeling algorithms and extensions. In: Proceedings of the 24th International Conference on Machine Learning 227, 681\u2013688 (2007)","DOI":"10.1145\/1273496.1273582"},{"key":"285_CR8","doi-asserted-by":"crossref","unstructured":"Li, J., Sun, A., Han, J., Li, C.: A Survey on Deep Learning for Named Entity Recognition (2018)","DOI":"10.18653\/v1\/W17-2314"},{"key":"285_CR9","unstructured":"Mikolov, T., Chen, K., Corrado, G.S., Dean, J.: Efficient Estimation of Word Representations in Vector Space (2013)"},{"key":"285_CR10","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Proc. of Empirical Methods in Natural Language Processing (2014), pp. 1532\u20131543","DOI":"10.3115\/v1\/D14-1162"},{"key":"285_CR11","doi-asserted-by":"crossref","unstructured":"Kudo, T., Richardson, J.: Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing (2018)","DOI":"10.18653\/v1\/D18-2012"},{"key":"285_CR12","unstructured":"Ghannay, S., Favre, B., Est\u00e8ve, Y., Camelin, N.: Word Embeddings Evaluation and Combination. Tech. rep. https:\/\/code.google.com\/p\/word2vec\/"},{"key":"285_CR13","doi-asserted-by":"crossref","unstructured":"Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Daelemans, W., Osborne, M. (eds.) Proc. of CoNLL, pp. 142\u2013147 (2003)","DOI":"10.3115\/1119176.1119195"},{"key":"285_CR14","unstructured":"Akbik, A., Blythe, D., Vollgraf, R.: Contextual string embeddings for sequence labeling. In: Proc. of Int. Con. on Computational Linguistics (2018), pp. 1638\u20131649. https:\/\/www.aclweb.org\/anthology\/C18-1139"},{"key":"285_CR15","doi-asserted-by":"crossref","unstructured":"Ramamurthy, R., Stenzel, R., Sifa, R., Ladi, A., Bauckhage, C.: Echo state networks for named entity recognition. In: Proc. of. International Conference on Artificial Neural Networks (2019)","DOI":"10.1007\/978-3-030-30493-5_11"},{"key":"285_CR16","doi-asserted-by":"crossref","unstructured":"Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of Tricks for Efficient Text Classification. arXiv preprint arXiv:1607.01759 (2016)","DOI":"10.18653\/v1\/E17-2068"},{"key":"285_CR17","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention Is All You Need, CoRR. arXiv preprint arXiv:1706.03762"},{"key":"285_CR18","unstructured":"Bai, S., Kolter, J.Z., Koltun, V.: An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling, CoRR. arXiv preprint arXiv:1803.01271"},{"key":"285_CR19","unstructured":"Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, CoRR. arXiv preprint arXiv:1810.04805."},{"key":"285_CR20","unstructured":"Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q.V., Salakhutdinov, R.: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, CoRR. arXiv preprint arXiv:1901.02860."},{"key":"285_CR21","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735 (1997)","journal-title":"Neural Comput."},{"key":"285_CR22","unstructured":"Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proc. of Int. Conf. on Machine Learning (2001), ICML \u201901, pp. 282\u2013289"},{"key":"285_CR23","unstructured":"Benikova, D., Biemann, C., Kisselew, M., Pad\u00f3, S.: GermEval Named Entity Recognition: Companion paper, pp. 104\u2013112. In: Proc. of the KONVENS GermEval Shared Task on Named Entity Recognition, Hildesheim, Germany (2014)"},{"key":"285_CR24","doi-asserted-by":"crossref","unstructured":"Yamada, I., Asai, A., Shindo, H., Takeda, H., Matsumoto, Y.: LUKE: Deep Contextualized Entity Representations with Entity-Aware Self-Attention. arXiv preprint arXiv:2010.01057 (2020)","DOI":"10.18653\/v1\/2020.emnlp-main.523"},{"key":"285_CR25","doi-asserted-by":"crossref","unstructured":"Manning, C.D.: Part-of-speech tagging from 97 to 100%: is it time for some linguistics?. In: International Conference on Intelligent Text Processing and Computational Linguistics (Springer, 2011), pp. 171\u2013189","DOI":"10.1007\/978-3-642-19400-9_14"},{"key":"285_CR26","doi-asserted-by":"crossref","unstructured":"Wang, Z., Shang, J., Liu, L. Lu, L., Liu, J., Han, J.: Crossweigh: Training Named Entity Tagger from Imperfect Annotations. arXiv preprint arXiv:1909.01441 (2019)","DOI":"10.18653\/v1\/D19-1519"},{"key":"285_CR27","first-page":"1","volume":"2019","author":"R Sifa","year":"2019","unstructured":"Sifa, R., L\u00fcbbering, M., N\u00fctten, U., Bauckhage, C., Warning, U., F\u00fcrst, B., Khameneh, T., Thom, D., Huseynov, I., Kahlert, R., Schlums, J., Ladi, A., Ismail, H., Kliem, B., Loitz, R., Pielka, M., Ramamurthy, R., Hillebrand, L., Kirsch, B., Bell, T.: Towards automated auditing with machine learning. Proc. ACM Symp. Doc. Eng. 2019, 1\u20134 (2019)","journal-title":"Proc. ACM Symp. Doc. Eng."}],"container-title":["International Journal of Data Science and Analytics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-021-00285-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41060-021-00285-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-021-00285-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,10]],"date-time":"2022-03-10T12:05:14Z","timestamp":1646913914000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41060-021-00285-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,2]]},"references-count":27,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["285"],"URL":"https:\/\/doi.org\/10.1007\/s41060-021-00285-x","relation":{},"ISSN":["2364-415X","2364-4168"],"issn-type":[{"value":"2364-415X","type":"print"},{"value":"2364-4168","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,2]]},"assertion":[{"value":"1 October 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 September 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"None.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}