{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T05:03:52Z","timestamp":1764997432468,"version":"3.40.5"},"reference-count":54,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2022,1,19]],"date-time":"2022-01-19T00:00:00Z","timestamp":1642550400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2023,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Authorship attribution \u2013 the computational task of identifying the author of a given text document within a set of possible candidates \u2013 has been attracting interest in Natural Language Processing research for many years. At the same time, significant advances have also been observed in the related field of author profiling, that is, the computational task of learning author demographics from text such as gender, age and others. The close relation between the two topics \u2013 both of which focused on gaining knowledge about the individual who wrote a piece of text \u2013 suggests that research in these fields may benefit from each other. To illustrate this, this work addresses the issue of author identification with the aid of author profiling methods, adding demographics predictions to an authorship attribution architecture that may be particularly suitable to extensions of this kind, namely, a stack of classifiers devoted to different aspects of the input text (words, characters and text distortion patterns.) The enriched model is evaluated across a range of text domains, languages and author profiling estimators, and its results are shown to compare favourably to those obtained by a standard authorship attribution method that does not have access to author demographics predictions.<\/jats:p>","DOI":"10.1017\/s1351324921000383","type":"journal-article","created":{"date-parts":[[2022,1,19]],"date-time":"2022-01-19T06:13:00Z","timestamp":1642572780000},"page":"110-137","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":8,"title":["Authorship attribution using author profiling classifiers"],"prefix":"10.1017","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9830-8625","authenticated-orcid":false,"given":"Caio","family":"Deutsch","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7270-1477","authenticated-orcid":false,"given":"Ivandr\u00e9","family":"Paraboni","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2022,1,19]]},"reference":[{"doi-asserted-by":"publisher","key":"S1351324921000383_ref8","DOI":"10.26615\/978-954-452-056-4_123"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref14","DOI":"10.1109\/EISIC.2017.16"},{"unstructured":"Juola, P. and Stamatatos, E. (2013). Overview of the author identification task at PAN 2013. In Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23\u201326, 2013.","key":"S1351324921000383_ref17"},{"unstructured":"Kestemont, M. , Stamatatos, E. , Manjavacas, E. , Daelemans, W. , Potthast, M. and Stein, B. (2019). Overview of the cross-domain authorship attribution task at PAN 2019. In Cappellato L., Ferro N., Losada D. and M\u00fcller H. (eds), CLEF 2019 Labs and Workshops, Notebook Papers. CEUR-WS.org.","key":"S1351324921000383_ref18"},{"unstructured":"Pizarro, J. (2019). Using N-grams to detect Bots on Twitter. In Cappellato L., Ferro N., Losada D. and M\u00fcller H. (eds), CLEF 2019 Labs and Workshops, Notebook Papers, Lugano, Switzerland. CEUR-WS.org.","key":"S1351324921000383_ref30"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref9","DOI":"10.1080\/13614568.2020.1722761"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref37","DOI":"10.1007\/978-981-10-7563-6_1"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref21","DOI":"10.1093\/llc\/fqx011"},{"unstructured":"Schler, J. , Koppel, M. , Argamon, S. and Pennebaker, J. (2006). Effects of age and gender on blogging. In AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs, Menlo Park, California, USA. AAAI Press, pp. 199\u2013205.","key":"S1351324921000383_ref43"},{"unstructured":"Schwartz, R. , Tsur, O. , Rappoport, A. and Koppel, M. (2013). Authorship attribution of micro-messages. In Empirical Methods in Natural Language Processing, Seattle, Washington, USA. Association for Computational Linguistics, pp. 1880\u20131891.","key":"S1351324921000383_ref44"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref27","DOI":"10.1109\/TAFFC.2020.3034050"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref40","DOI":"10.1016\/j.patrec.2020.04.020"},{"doi-asserted-by":"crossref","unstructured":"Markov, I. , Stamatatos, E. and Sidorov, G. (2017). Improving cross-topic authorship attribution: the role of pre-processing. In 18th International Conference on Computational Linguistics and Intelligent Text Processing, Budapest, Hungary, pp. 289\u2013302.","key":"S1351324921000383_ref22","DOI":"10.1007\/978-3-319-77116-8_21"},{"unstructured":"Garrido-Espinosa, M.G. , Rosales-P\u00e9rez, A. and L\u00f3pez-Monroy, A.P. (2020). GRU with author profiling information to detect aggressiveness. In Notebook Papers of 2nd SEPLN Workshop on Iberian Languages Evaluation Forum (IberLEF), Malaga, Spain.","key":"S1351324921000383_ref10"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref15","DOI":"10.1109\/ICMLA.2019.00061"},{"unstructured":"Casavantes, M. , L\u00f3pez, R. and Gonz\u00e1lez, L.C. (2019). UACh at MEX-A3T 2019: preliminary results on detecting aggressive tweets by adding author information via an unsupervised strategy. In IberLEF@ SEPLN, Bilbao, Spain. CEUR-WS.org, pp. 537\u2013543.","key":"S1351324921000383_ref4"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref31","DOI":"10.1007\/978-3-319-65813-1_25"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref36","DOI":"10.1017\/S1351324920000108"},{"doi-asserted-by":"crossref","unstructured":"Sharon Belvisi, N.M. , Muhammad, N. and Alonso-Fernandez, F. (2020). Forensic authorship analysis of microblogging texts using n-grams and stylometric features. In 8th International Workshop on Biometrics and Forensics (IWBF), Porto, Portugal. IEEE, pp. 1\u20136.","key":"S1351324921000383_ref45","DOI":"10.1109\/IWBF49977.2020.9107953"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref12","DOI":"10.1109\/SMC.2016.7844873"},{"unstructured":"Bagnall, D. (2016). Authorship clustering using multi-headed recurrent neural networks. In Cappellato, L. , Ferro, N. , Macdonald, C. and Balog, K. (eds), CEUR Workshop Proceedings, vol. 1609, Evora, Portugal. CEUR-WS.org, pp. 791\u2013804.","key":"S1351324921000383_ref1"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref16","DOI":"10.18653\/v1\/E17-2068"},{"unstructured":"Kestemont, M. , Tschugnall, M. , Stamatatos, E. , Daelemans, W. , Specht, G. , Stein, B. and Potthast, M. (2018). Overview of the author identification task at PAN-2018: cross-domain authorship attribution and style change detection. In Cappellato L., Ferro N., Nie J.-Y. and Soulier L. (eds), Working Notes Papers of the CLEF 2018 Evaluation Labs, CEUR Workshop Proceedings. CLEF and CEUR-WS.org.","key":"S1351324921000383_ref19"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref24","DOI":"10.1109\/SMC.2019.8914323"},{"unstructured":"Nguyen, D.-P. , Trieschnigg, R.B. , Dogruoz, A.S. , Gravel, R. , Theune, M. , Meder, T. and de Jong, F.M. (2014). Why gender and age prediction from tweets is hard: lessons from a crowdsourcing experiment. In Proceedings of COLING-2014. Association for Computational Linguistics, pp. 1950\u20131961.","key":"S1351324921000383_ref25"},{"unstructured":"Stevenson, M. , Vlachos, A. and Sari, Y. (2017). Continuous n-gram representations for authorship attribution. In 15th Conference of the European Chapter of the Association for Computational Linguistics EACL-2017, Valencia, Spain, pp. 267\u2013273.","key":"S1351324921000383_ref49"},{"unstructured":"Sundararajan, K. and Woodard, D.L. (2018). What constitutes style in authorship attribution? In 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA. Association for Computational Linguistics, pp. 2814\u20132822.","key":"S1351324921000383_ref50"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref28","DOI":"10.1109\/TrustCom.2016.0054"},{"unstructured":"Pennebaker, J.W. , Francis, M.E. and Booth, R.J. (2001). Inquiry and Word Count: LIWC. Mahwah, NJ: Lawrence Erlbaum.","key":"S1351324921000383_ref29"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref20","DOI":"10.18653\/v1\/P17-2075"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref5","DOI":"10.1007\/978-3-642-23199-5_28"},{"unstructured":"Patchala, J. and Bhatnagar, R. (2018). Authorship attribution by consensus among multiple features. In 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 2766\u20132777.","key":"S1351324921000383_ref26"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref39","DOI":"10.1109\/TIFS.2016.2603960"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref47","DOI":"10.1007\/978-3-319-99722-3_11"},{"unstructured":"Basile, A. , Dwyer, G. , Medvedeva, M. , Rawee, J. , Haagsma, H. and Nissim, M. (2017). N-GrAM: new groningen author-profiling model. In Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum, Dublin.","key":"S1351324921000383_ref3"},{"unstructured":"Ramos, R.M.S. , Neto, G.B.S. , Silva, B.B.C. , Monteiro, D.S. , Paraboni, I. and Dias, R.F.S. (2018). Building a corpus for personality-dependent natural language understanding and generation. In 11th International Conference on Language Resources and Evaluation (LREC-2018), Miyazaki, Japan. ELRA, pp. 1138\u20131145.","key":"S1351324921000383_ref32"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref6","DOI":"10.1007\/978-3-030-28577-7_17"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref23","DOI":"10.1007\/BF02295996"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref11","DOI":"10.1109\/TKDE.2010.173"},{"unstructured":"Sari, Y. and Stevenson, M. (2016). Exploring word embeddings and character N-grams for author clustering notebook for PAN at CLEF 2016. In CEUR Workshop Proceedings, Evora, Portugal. CEUR-WS.org.","key":"S1351324921000383_ref41"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref46","DOI":"10.18653\/v1\/E17-2106"},{"doi-asserted-by":"crossref","unstructured":"Baker, C.F. , Fillmore, C.J. and Lowe, J.B. (1998). The Berkeley FrameNet project. In COLING-1998, Montr\u00e9al, Quebec, Canada. Association for Computational Linguistics, pp. 86\u201390.","key":"S1351324921000383_ref2","DOI":"10.3115\/980845.980860"},{"unstructured":"Rangel, F. , Rosso, P. , Montes-y-G\u00f3mez, M. , Potthast, M. and Stein, B. (2018). Overview of the 6th author profiling task at PAN 2018: multimodal gender identification in Twitter. In Cappellato L., Ferro N., Nie, J.-Y. and Soulier L. (eds), Working Notes Papers of the CLEF 2018 Evaluation Labs, CEUR Workshop Proceedings, Avignon, France. CLEF and CEUR-WS.org.","key":"S1351324921000383_ref34"},{"unstructured":"Rangel, F. , Rosso, P. , Potthast, M. and Stein, B. (2017). Overview of the 5th author profiling task at PAN 2017: gender and language variety identification in Twitter. In Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum, Dublin. CEUR-WS.org.","key":"S1351324921000383_ref35"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref38","DOI":"10.1109\/IACC.2017.0176"},{"unstructured":"Sari, Y. , Stevenson, M. and Vlachos, A. (2018). Topic or style? exploring the most useful features for authorship attribution. In 27th International Conference on Computational Linguistics COLING-2018, Santa Fe, New Mexico, USA. Association for Computational Linguistics, pp. 343\u2013353.","key":"S1351324921000383_ref42"},{"unstructured":"Vartapetiance, A. and Gillam, L. (2012). Quite simple approaches for authorship attribution, intrinsic plagiarism detection and sexual predator identification. In CLEF 2012 Evaluation Labs and Workshop, Online Working Notes, Rome, Italy. CEUR-WS.org.","key":"S1351324921000383_ref52"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref48","DOI":"10.18653\/v1\/E17-1107"},{"unstructured":"Hsieh, F.C. , Dias, R.F.S. and Paraboni, I. (2018). Author profiling from facebook corpora. In 11th International Conference on Language Resources and Evaluation (LREC-2018), Miyazaki, Japan. ELRA, pp. 2566\u20132570.","key":"S1351324921000383_ref13"},{"unstructured":"Verhoeven, B. , Daelemans, W. and Plank, B. (2016). TwiSty: a multilingual Twitter Stylometry corpus for gender and personality profiling. In 10th International Conference on Language Resources and Evaluation (LREC-2016), Portoroz, Slovenia. ELRA, pp. 1632\u20131637.","key":"S1351324921000383_ref53"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref54","DOI":"10.1016\/S0893-6080(05)80023-1"},{"unstructured":"Rangel, F. and Rosso, P. (2019). Overview of the 7th author profiling task at PAN 2019: bots and gender profiling. In Cappellato L., Ferro N., Losada D. and M\u00fcller H. (eds), CLEF 2019 Labs and Workshops, Notebook Papers, Lugano, Switzerland. CEUR-WS.org.","key":"S1351324921000383_ref33"},{"doi-asserted-by":"publisher","key":"S1351324921000383_ref7","DOI":"10.1016\/j.eswa.2021.114866"},{"unstructured":"Takahashi, T. , Tahara, T. , Nagatani, K. , Miura, Y. , Taniguchi, T. and Ohkuma, T. (2018). Text and image synergy with feature cross technique for gender identification. In Working Notes Papers of the Conference and Labs of the Evaluation Forum (CLEF 2018), vol. 2125, Avignon, France. CEUR-WS.org.","key":"S1351324921000383_ref51"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324921000383","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T09:05:40Z","timestamp":1675155940000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324921000383\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,19]]},"references-count":54,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1]]}},"alternative-id":["S1351324921000383"],"URL":"https:\/\/doi.org\/10.1017\/s1351324921000383","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"type":"print","value":"1351-3249"},{"type":"electronic","value":"1469-8110"}],"subject":[],"published":{"date-parts":[[2022,1,19]]},"assertion":[{"value":"\u00a9 The Author(s), 2022. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}}]}}