{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T05:27:58Z","timestamp":1731043678388,"version":"3.28.0"},"reference-count":26,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2024,11,7]],"date-time":"2024-11-07T00:00:00Z","timestamp":1730937600000},"content-version":"vor","delay-in-days":311,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,11,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Authorship verification (AV) aims to identify whether a pair of texts has the same author. We address the challenge of evaluating AV models\u2019 robustness against topic shifts. The conventional evaluation assumes minimal topic overlap between training and test data. However, we argue that there can still be topic leakage in test data, causing misleading model performance and unstable rankings. To address this, we propose an evaluation method called Heterogeneity-Informed Topic Sampling (HITS), which creates a smaller dataset with a heterogeneously distributed topic set. Our experimental results demonstrate that HITS-sampled datasets yield a more stable ranking of models across random seeds and evaluation splits. Our contributions include: 1. An analysis of causes and effects of topic leakage; 2. A demonstration of the HITS in reducing the effects of topic leakage; and 3. The Robust Authorship Verification bENchmark (RAVEN) that allows topic shortcut test to uncover AV models\u2019 reliance on topic-specific features.<\/jats:p>","DOI":"10.1162\/tacl_a_00709","type":"journal-article","created":{"date-parts":[[2024,11,7]],"date-time":"2024-11-07T20:18:41Z","timestamp":1731010721000},"page":"1363-1377","update-policy":"http:\/\/dx.doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":0,"title":["Addressing Topic Leakage in Cross-Topic Evaluation for Authorship Verification"],"prefix":"10.1162","volume":"12","author":[{"given":"Jitkapat","family":"Sawatphol","sequence":"first","affiliation":[{"name":"School of Information Science and Technology Vidyasirimedhi Institute of Science and Technology, Thailand. jitkapat.s_s20@vistec.ac.th"}]},{"given":"Can","family":"Udomcharoenchaikit","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology Vidyasirimedhi Institute of Science and Technology, Thailand. canu_pro@vistec.ac.th"}]},{"given":"Sarana","family":"Nutanong","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology Vidyasirimedhi Institute of Science and Technology, Thailand. snutanon@vistec.ac.th"}]}],"member":"281","published-online":{"date-parts":[[2024,11,4]]},"reference":[{"key":"2024110720183720900_bib1","doi-asserted-by":"publisher","first-page":"4242","DOI":"10.18653\/v1\/2021.findings-emnlp.359","article-title":"The topic confusion task: A novel evaluation scenario for authorship attribution","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2021","author":"Altakrori","year":"2021"},{"key":"2024110720183720900_bib2","doi-asserted-by":"publisher","first-page":"654","DOI":"10.18653\/v1\/N19-1068","article-title":"Generalizing unmasking for short texts","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Bevendorff","year":"2019"},{"issue":"Jan","key":"2024110720183720900_bib3","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"2024110720183720900_bib4","article-title":"O2D2: Out-of-distribution detector to capture undecidable trials in authorship verification\u2014notebook for PAN at CLEF 2021","volume-title":"CLEF 2021 Labs and Workshops, Notebook Papers","author":"Boenninghoff","year":"2021"},{"key":"2024110720183720900_bib5","doi-asserted-by":"publisher","first-page":"5634","DOI":"10.18653\/v1\/2022.emnlp-main.380","article-title":"Rethinking the authorship verification experimental setups","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Brad","year":"2022"},{"key":"2024110720183720900_bib6","doi-asserted-by":"publisher","first-page":"4069","DOI":"10.18653\/v1\/D19-1418","article-title":"Don\u2019t take the easy way out: Ensemble based methods for avoiding known dataset biases","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Clark","year":"2019"},{"key":"2024110720183720900_bib7","doi-asserted-by":"publisher","first-page":"774","DOI":"10.1162\/tacl_a_00397","article-title":"Towards question-answering as an automatic metric for evaluating the content quality of a summary","volume":"9","author":"Deutsch","year":"2021","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024110720183720900_bib8","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"issue":"85","key":"2024110720183720900_bib9","first-page":"2825","article-title":"Scikit-learn: Machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"2024110720183720900_bib10","first-page":"1321","article-title":"On calibration of modern neural networks","volume-title":"International Conference on Machine Learning","author":"Guo","year":"2017"},{"key":"2024110720183720900_bib11","article-title":"Overview of the cross-domain authorship verification task at PAN 2020","volume-title":"Conference and Labs of the Evaluation Forum","author":"Kestemont","year":"2020"},{"key":"2024110720183720900_bib12","article-title":"Overview of the cross-domain authorship verification task at PAN 2021","volume-title":"CLEF (Working Notes)","author":"Kestemont","year":"2021"},{"key":"2024110720183720900_bib13","article-title":"Investigating topic influence in authorship attribution","volume-title":"PAN","author":"Mikros","year":"2007"},{"key":"2024110720183720900_bib14","first-page":"1415","article-title":"A simple measure to assess non-response","volume-title":"Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies","author":"Pe\u00f1as","year":"2011"},{"key":"2024110720183720900_bib15","doi-asserted-by":"publisher","first-page":"3982","DOI":"10.18653\/v1\/D19-1410","article-title":"Sentence-BERT: Sentence embeddings using Siamese BERT-networks","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Reimers","year":"2019"},{"key":"2024110720183720900_bib16","doi-asserted-by":"publisher","first-page":"913","DOI":"10.18653\/v1\/2021.emnlp-main.70","article-title":"Learning universal authorship representations","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Rivera-Soto","year":"2021"},{"key":"2024110720183720900_bib17","first-page":"1228","article-title":"Cross-topic authorship attribution: Will out-of-topic data help?","volume-title":"Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers","author":"Sapkota","year":"2014"},{"key":"2024110720183720900_bib18","doi-asserted-by":"publisher","first-page":"1076","DOI":"10.18653\/v1\/2022.emnlp-main.70","article-title":"Topic-regularized authorship representation learning","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Sawatphol","year":"2022"},{"key":"2024110720183720900_bib19","first-page":"7","article-title":"On the robustness of authorship attribution based on character n-gram features","volume":"21","author":"Stamatatos","year":"2013","journal-title":"Journal of Law and Policy"},{"key":"2024110720183720900_bib20","doi-asserted-by":"publisher","first-page":"1138","DOI":"10.18653\/v1\/E17-1107","article-title":"Authorship attribution using text distortion","volume-title":"Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers","author":"Stamatatos","year":"2017"},{"key":"2024110720183720900_bib21","first-page":"877","article-title":"Overview of the author identification task at PAN 2014","volume-title":"CEUR Workshop Proceedings","author":"Stamatatos","year":"2014"},{"key":"2024110720183720900_bib22","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1007\/978-94-017-0171-6_7","article-title":"Using compression-based language models for text categorization","author":"Teahan","year":"2003","journal-title":"Language Modeling for Information Retrieval"},{"key":"2024110720183720900_bib23","doi-asserted-by":"publisher","first-page":"649","DOI":"10.18653\/v1\/2023.ijcnlp-main.43","article-title":"Valla: Standardizing and benchmarking authorship attribution and verification through empirical evaluation and comparative analysis","volume-title":"Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Tyo","year":"2023"},{"key":"2024110720183720900_bib24","doi-asserted-by":"crossref","first-page":"8717","DOI":"10.18653\/v1\/2020.acl-main.770","article-title":"Mind the trade-off: Debiasing NLU models without degrading the in-distribution performance","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Utama","year":"2020"},{"key":"2024110720183720900_bib25","doi-asserted-by":"crossref","first-page":"7597","DOI":"10.18653\/v1\/2020.emnlp-main.613","article-title":"Towards debiasing NLU models from unknown biases","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Utama","year":"2020"},{"key":"2024110720183720900_bib26","doi-asserted-by":"publisher","first-page":"249","DOI":"10.18653\/v1\/2022.repl4nlp-1.26","article-title":"Same author or just same topic? Towards content-independent style representations","volume-title":"Proceedings of the 7th Workshop on Representation Learning for NLP","author":"Wegmann","year":"2022"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00709\/2478605\/tacl_a_00709.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00709\/2478605\/tacl_a_00709.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,7]],"date-time":"2024-11-07T20:18:47Z","timestamp":1731010727000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00709\/125173\/Addressing-Topic-Leakage-in-Cross-Topic-Evaluation"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":26,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00709","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}