{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,12,26]],"date-time":"2023-12-26T13:13:31Z","timestamp":1703596411511},"reference-count":36,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2013,10,8]],"date-time":"2013-10-08T00:00:00Z","timestamp":1381190400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2015,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p><jats:italic>Aliases<\/jats:italic> play an important role in online environments by facilitating anonymity, but also can be used to hide the identity of cybercriminals. Previous studies have investigated this alias matching problem in an attempt to identify whether two aliases are shared by an author, which can assist with identifying users. Those studies create their training data by randomly splitting the documents associated with an alias into two sub-aliases. Models have been built that can regularly achieve over 90% accuracy for recovering the linkage between these \u2018random sub-aliases\u2019. In this paper, random sub-alias generation is shown to enable these high accuracies, and thus does not adequately model the real-world problem. In contrast, creating sub-aliases using topic-based splitting drastically reduces the accuracy of all authorship methods tested. We then present a methodology that can be performed on non-topic controlled datasets, to produce topic-based sub-aliases that are more difficult to match. 
Finally, we present an experimental comparison between many authorship methods to see which methods better match aliases under these conditions, finding that local <jats:italic>n<\/jats:italic>-gram methods perform better than others.<\/jats:p>","DOI":"10.1017\/s1351324913000272","type":"journal-article","created":{"date-parts":[[2013,10,8]],"date-time":"2013-10-08T12:59:03Z","timestamp":1381237143000},"page":"497-518","source":"Crossref","is-referenced-by-count":4,"title":["Authorship analysis of aliases: Does topic influence accuracy?"],"prefix":"10.1017","volume":"21","author":[{"given":"ROBERT","family":"LAYTON","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"PAUL A.","family":"WATTERS","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"RICHARD","family":"DAZELEY","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2013,10,8]]},"reference":[{"key":"S1351324913000272_ref029","first-page":"104","volume-title":"Proceedings of the 3rd Workshop on RObust Methods in Analysis of Natural Language Data","author":"Sedding","year":"2004"},{"key":"S1351324913000272_ref033","first-page":"1","volume-title":"2012 Third Cybercrime and Trustworthy Computing Workshop","author":"Ureche","year":"2012"},{"key":"S1351324913000272_ref019","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqq013"},{"key":"S1351324913000272_ref021","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988678"},{"key":"S1351324913000272_ref017","first-page":"1","volume-title":"eCrime Researchers Summit (eCrime), 2010","author":"Layton","year":"2011"},{"key":"S1351324913000272_ref020","volume-title":"Proceedings of the 33rd conference on IEEE Symposium on Security and Privacy","author":"Narayanan","year":"2012"},{"key":"S1351324913000272_ref014","first-page":"1","volume-title":"eCrime Researchers Summit, 2009. 
eCRIME\u201909.","author":"Layton","year":"2009"},{"key":"S1351324913000272_ref013","doi-asserted-by":"publisher","DOI":"10.1109\/CTC.2012.11"},{"key":"S1351324913000272_ref012","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-009-9111-2"},{"key":"S1351324913000272_ref018","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324911000180"},{"key":"S1351324913000272_ref016","first-page":"1","article-title":"Automated unsupervised authorship analysis using evidence accumulation clustering","volume":"1","author":"Layton","year":"2011","journal-title":"Natural Language Engineering"},{"key":"S1351324913000272_ref004","volume-title":"Situational Crime Prevention","author":"Clarke","year":"1997"},{"key":"S1351324913000272_ref002","unstructured":"Alazab M. , Layton R. , Venkataraman S. , and Watters P. 2010. Malware detection based on structural and behavioural features of API calls. In Proceedings of the International Cyber Resilience Conference, School of Computer and Information Science, Security Research Centre, Edith Cowan University, Perth, Western Australia."},{"key":"S1351324913000272_ref032","first-page":"237","volume-title":"18th International Workshop on Database and Expert Systems Applications, 2007. DEXA\u201907","author":"Stamatatos","year":"2007"},{"key":"S1351324913000272_ref008","first-page":"541","volume-title":"Third IEEE International Conference on Data Mining, 2003. ICDM 2003","author":"Hotho","year":"2003"},{"key":"S1351324913000272_ref024","doi-asserted-by":"publisher","DOI":"10.1023\/A:1001018624850"},{"key":"S1351324913000272_ref030","first-page":"156","volume-title":"IJCNLP","author":"Solorio","year":"2011"},{"key":"S1351324913000272_ref007","doi-asserted-by":"crossref","unstructured":"Holzer R. , Malin B. , and Sweeney L. 2005. Email Alias Detection Using Social Network Analysis. PhD thesis. 
Information Networking Institute, Carnegie Mellon University.","DOI":"10.1145\/1134271.1134279"},{"key":"S1351324913000272_ref026","volume-title":"Introduction to Modern Information Retrieval","author":"Salton","year":"1986"},{"key":"S1351324913000272_ref011","unstructured":"Ke\u0161elj V. , Peng F. , Cercone N. , and Thomas C. 2003. N-gram-based author profiles for authorship attribution. In Proceedings of the Pacific Association for Computational Linguistics."},{"key":"S1351324913000272_ref023","first-page":"1","volume-title":"eCrime Researchers Summit (eCrime), 2010","author":"Pillay","year":"2010"},{"key":"S1351324913000272_ref027","first-page":"206","volume-title":"KDIR","author":"Schein","year":"2010"},{"key":"S1351324913000272_ref005","doi-asserted-by":"crossref","first-page":"232","DOI":"10.1007\/978-3-642-25324-9_20","article-title":"A weighted profile intersection measure for profile-based authorship attribution","volume":"7094","author":"Escalante","year":"2011","journal-title":"Advances in Artificial Intelligence"},{"key":"S1351324913000272_ref025","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"key":"S1351324913000272_ref028","doi-asserted-by":"publisher","DOI":"10.1145\/505282.505283"},{"key":"S1351324913000272_ref034","doi-asserted-by":"publisher","DOI":"10.1108\/13685201211266015"},{"key":"S1351324913000272_ref022","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"S1351324913000272_ref035","first-page":"23","article-title":"Modeling lexical-semantic processes using wordnet","volume":"3","author":"Watters","year":"1998","journal-title":"Glot International"},{"key":"S1351324913000272_ref036","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20316"},{"key":"S1351324913000272_ref009","volume-title":"Proceedings of the Text Mining Workshop, SIAM International Conference on Data 
Mining","author":"Jing","year":"2006"},{"key":"S1351324913000272_ref001","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4614-3223-4_6"},{"key":"S1351324913000272_ref031","doi-asserted-by":"publisher","DOI":"10.1109\/CTC.2010.14"},{"key":"S1351324913000272_ref006","first-page":"1","article-title":"Identifying authorship by byte-level n-grams: the source code author profile (SCAP) method","volume":"6","author":"Frantzeskou","year":"2007","journal-title":"International Journal of Digital Evidence"},{"key":"S1351324913000272_ref003","doi-asserted-by":"publisher","DOI":"10.5120\/7480-0545"},{"key":"S1351324913000272_ref010","first-page":"175","volume-title":"Proceedings of the Joint Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing","author":"Juola","year":"2004"},{"key":"S1351324913000272_ref015","first-page":"1","volume-title":"2010 Second Cybercrime and Trustworthy Computing Workshop","author":"Layton","year":"2010"}],"container-title":["Natural Language 
Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324913000272","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,7,30]],"date-time":"2019-07-30T09:16:55Z","timestamp":1564478215000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324913000272\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,10,8]]},"references-count":36,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,8]]}},"alternative-id":["S1351324913000272"],"URL":"https:\/\/doi.org\/10.1017\/s1351324913000272","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,10,8]]}}}