{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T14:40:32Z","timestamp":1774449632423,"version":"3.50.1"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T00:00:00Z","timestamp":1715299200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2024,5,31]]},"abstract":"<jats:p>Authorship attribution involves determining the original author of an anonymous text from a pool of potential authors. The author attribution task has applications in several domains, such as plagiarism detection, digital text forensics, and information retrieval. While these applications extend beyond any single language, existing research has predominantly centered on English, posing challenges for application in languages such as Sinhala due to linguistic disparities and a lack of language processing tools. We present the first comprehensive study on cross-topic authorship attribution for Sinhala texts and propose a solution that can effectively perform the authorship attribution task even if the topics within the test and training samples differ. Our solution consists of three main parts: (i) extraction of topic-independent stylometric features, (ii) generation of a small candidate author set with the help of similarity search, and (iii) identification of the true author. 
Several experimental studies were carried out to demonstrate that the proposed solution can effectively handle real-world scenarios involving a large number of candidate authors and a limited number of text samples for each candidate author.<\/jats:p>","DOI":"10.1145\/3655620","type":"journal-article","created":{"date-parts":[[2024,3,30]],"date-time":"2024-03-30T09:24:01Z","timestamp":1711790641000},"page":"1-14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Crossing Linguistic Barriers: Authorship Attribution in Sinhala Texts"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0640-807X","authenticated-orcid":false,"given":"Raheem","family":"Sarwar","sequence":"first","affiliation":[{"name":"Manchester Metropolitan University - All Saints Campus, Manchester, United Kingdom of Great Britain and Northern Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-0684-726X","authenticated-orcid":false,"given":"Maneesha","family":"Perera","sequence":"additional","affiliation":[{"name":"Thammasat University Sirindhorn International Institute of Technology, Khlong Nueng, Thailand"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0607-2617","authenticated-orcid":false,"given":"Pin Shen","family":"Teh","sequence":"additional","affiliation":[{"name":"Manchester Metropolitan University - All Saints Campus, Manchester, United Kingdom of Great Britain and Northern Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9588-0052","authenticated-orcid":false,"given":"Raheel","family":"Nawaz","sequence":"additional","affiliation":[{"name":"Staffordshire University, Stoke-on-Trent, United Kingdom of Great Britain and Northern Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7607-5154","authenticated-orcid":false,"given":"Muhammad Umair","family":"Hassan","sequence":"additional","affiliation":[{"name":"Norwegian University of Science and Technology, \u00c5lesund, Norway"}]}],
"member":"320","published-online":{"date-parts":[[2024,5,10]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2005.81"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12652-023-04584-y"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/13811.001.0001"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jksuci.2014.06.006"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-021-05729-x"},{"key":"e_1_3_2_7_2","first-page":"39","article-title":"Extracting algorithmic complexity in scientific literature for advance searching","volume":"1","author":"Bakar Abu","year":"2023","unstructured":"Abu Bakar, Raheem Sarwar, Saeed-Ul Hassan, and Raheel Nawaz. 2023. Extracting algorithmic complexity in scientific literature for advance searching. Journal of Computational and Applied Linguistics 1 (2023), 39\u201365.","journal-title":"Journal of Computational and Applied Linguistics"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/S1088-467X(99)00018-9"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-03689-2_3"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2011.04.002"},{"key":"e_1_3_2_12_2","volume-title":"Analysing E-mail Text Authorship for Forensic Purposes","author":"Corney Malcolm W.","year":"2003","unstructured":"Malcolm W. Corney. 2003. Analysing E-mail Text Authorship for Forensic Purposes. Ph. D. Dissertation. Queensland University of Technology."},{"key":"e_1_3_2_13_2","article-title":"Survey on publicly available Sinhala natural language processing tools and research","author":"De Silva Nisansa","year":"2019","unstructured":"Nisansa De Silva. 2019. Survey on publicly available Sinhala natural language processing tools and research. 
arXiv preprint arXiv:1906.02358 (2019).","journal-title":"arXiv preprint arXiv:1906.02358"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/604264.604272"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1023824908771"},{"key":"e_1_3_2_16_2","volume-title":"A Grammar of the Sinhalese Language","author":"Geiger Wilhelm","year":"1995","unstructured":"Wilhelm Geiger. 1995. A Grammar of the Sinhalese Language. Asian Educational Services."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.heliyon.2023.e15407"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.mlwa.2023.100523"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/13.3.111"},{"issue":"4","key":"e_1_3_2_20_2","first-page":"48","article-title":"Text classification for authorship attribution using na\u00efve Bayes classifier with limited training data","volume":"5","author":"Howedi Fatma","year":"2014","unstructured":"Fatma Howedi and Masnizah Mohd. 2014. Text classification for authorship attribution using na\u00efve Bayes classifier with limited training data. Computer Engineering and Intelligent Systems 5, 4 (2014), 48\u201356.","journal-title":"Computer Engineering and Intelligent Systems"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-020-05397-3"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/0-387-36891-4_10"},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","unstructured":"P. Juola. 2006. Authorship attribution. Foundations and Trends in Information Retrieval 1, 3 (2006), 233\u2013334.","DOI":"10.1561\/1500000005"},
{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-020-05445-y"},{"key":"e_1_3_2_25_2","first-page":"255","volume-title":"Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING","volume":"3","author":"Ke\u0161elj Vlado","year":"2003","unstructured":"Vlado Ke\u0161elj, Fuchun Peng, Nick Cercone, and Calvin Thomas. 2003. N-gram-based author profiles for authorship attribution. In Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING, Vol. 3. 255\u2013264."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-0908"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.4159\/harvard.9780674434929"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2555-1"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/17.4.401"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-009-9111-2"},{"issue":"6","key":"e_1_3_2_31_2","article-title":"Measuring differentiability: Unmasking pseudonymous authors","volume":"8","author":"Koppel Moshe","year":"2007","unstructured":"Moshe Koppel, Jonathan Schler, and Elisheva Bonchek-Dokow. 2007. Measuring differentiability: Unmasking pseudonymous authors. Journal of Machine Learning Research 8, 6 (2007).","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSESS.2014.6933697"},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of the 2005 Meeting of the Classification Society of North America (CSNA)","author":"Madigan David","year":"2005","unstructured":"David Madigan, Alexander Genkin, David D. Lewis, Shlomo Argamon, Dmitriy Fradkin, and Li Ye. 2005. Author identification on the large scale. 
In Proceedings of the 2005 Meeting of the Classification Society of North America (CSNA)."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-020-05031-2"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-18038-0_19"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-77116-8_21"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICISA.2014.6847369"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.ns-9.214S.237"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqac054"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2016.0147"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-020-05479-2"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1001018624850"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.46298\/jdmdh.8990"},{"key":"e_1_3_2_44_2","first-page":"122874","article-title":"Model optimization techniques in personalized federated learning: A survey","author":"Sabah Fahad","year":"2023","unstructured":"Fahad Sabah, Yuwen Chen, Zhen Yang, Muhammad Azam, Nadeem Ahmad, and Raheem Sarwar. 2023. Model optimization techniques in personalized federated learning: A survey. Expert Systems with Applications (2023), 122874.","journal-title":"Expert Systems with Applications"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/IC-NIDC59918.2023.10390693"},{"key":"e_1_3_2_46_2","first-page":"1228","volume-title":"Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers","author":"Sapkota Upendra","year":"2014","unstructured":"Upendra Sapkota, Thamar Solorio, Manuel Montes, Steven Bethard, and Paolo Rosso. 2014. Cross-topic authorship attribution: Will out-of-topic data help? 
In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. 1228\u20131237."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-15925-1_16"},{"issue":"2","key":"e_1_3_2_48_2","first-page":"1","article-title":"Urdu AI: Writeprints for Urdu authorship identification","volume":"21","author":"Sarwar Raheem","year":"2021","unstructured":"Raheem Sarwar and Saeed-Ul Hassan. 2021. Urdu AI: Writeprints for Urdu authorship identification. Transactions on Asian and Low-Resource Language Information Processing 21, 2 (2021), 1\u201318.","journal-title":"Transactions on Asian and Low-Resource Language Information Processing"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2018.07.009"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqab103"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.13053\/rcs-110-1-12"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3365832"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383202"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2024.3358199"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2967449"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-91452-7_52"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2869198"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2012.06.003"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2106"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-srw.44"},{"key":"e_1_3_2_61_2","first-page":"325","volume-title":"Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop","author":"Silva Kanishka","year":"2024","unstructured":"Kanishka 
Silva, Ingo Frommholz, Burcu Can, Fred Blain, Raheem Sarwar, and Laura Ugolini. 2024. Forged-GAN-BERT: Authorship attribution for LLM-generated forged novels. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop. 325\u2013337."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1142\/S0218213006002965"},{"issue":"2","key":"e_1_3_2_63_2","first-page":"7","article-title":"On the robustness of authorship attribution based on character n-gram features","volume":"21","author":"Stamatatos Efstathios","year":"2013","unstructured":"Efstathios Stamatatos. 2013. On the robustness of authorship attribution based on character n-gram features. Journal of Law and Policy 21, 2 (2013), 7.","journal-title":"Journal of Law and Policy"},{"key":"e_1_3_2_64_2","volume-title":"The Statistical Study of Literary Vocabulary","author":"Yule G. Udny","year":"2014","unstructured":"G. Udny Yule. 2014. The Statistical Study of Literary Vocabulary. 
Cambridge University Press."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3655620","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3655620","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:46Z","timestamp":1750291426000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3655620"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,10]]},"references-count":63,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5,31]]}},"alternative-id":["10.1145\/3655620"],"URL":"https:\/\/doi.org\/10.1145\/3655620","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,10]]},"assertion":[{"value":"2024-01-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-26","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}