{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,26]],"date-time":"2026-04-26T00:44:19Z","timestamp":1777164259871,"version":"3.51.4"},"reference-count":26,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2012,9,28]],"date-time":"2012-09-28T00:00:00Z","timestamp":1348790400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2013,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Unsupervised Authorship Analysis (UAA) aims to cluster documents by authorship without knowing the authorship of any documents. An important factor in UAA is the method for calculating the distance between documents. This choice of the <jats:italic>authorship distance method<\/jats:italic> is considered more critical to the end result than the choice of cluster analysis algorithm. One method for measuring the correlation between a distance metric and a labelling (such as class values or clusters) is the Silhouette Coefficient (SC). The SC can be leveraged by measuring the correlation between the authorship distance method and the true authorship, evaluating the quality of the distance method. However, we show that the SC can be severely affected by outliers. To address this issue, we introduce the Positive Silhouette Coefficient, given as the proportion of instances with a positive SC value. This metric is not easily altered by outliers and produces a more robust metric. A large number of authorship distance methods are then compared using the PSC, and the findings are presented. This research provides an insight into the efficacy of methods for UAA and presents a framework for testing authorship distance methods.<\/jats:p>","DOI":"10.1017\/s1351324912000241","type":"journal-article","created":{"date-parts":[[2012,9,30]],"date-time":"2012-09-30T13:18:43Z","timestamp":1349011123000},"page":"517-535","source":"Crossref","is-referenced-by-count":32,"title":["Evaluating authorship distance methods using the positive Silhouette coefficient"],"prefix":"10.1017","volume":"19","author":[{"given":"ROBERT","family":"LAYTON","sequence":"first","affiliation":[]},{"given":"PAUL","family":"WATTERS","sequence":"additional","affiliation":[]},{"given":"RICHARD","family":"DAZELEY","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2012,9,28]]},"reference":[{"key":"S1351324912000241_ref24","doi-asserted-by":"publisher","DOI":"10.1142\/S0218213006002965"},{"key":"S1351324912000241_ref23","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(87)90125-7"},{"key":"S1351324912000241_ref21","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177732678"},{"key":"S1351324912000241_ref20","first-page":"1","volume-title":"Proceedings of the General Members Meeting and eCrime Researchers Summit","author":"Pillay","year":"2011"},{"key":"S1351324912000241_ref19","article-title":"Recentred local profiles for authorship attribution","author":"Layton","year":"2011","journal-title":"Journal of Natural Language Engineering"},{"key":"S1351324912000241_ref18","first-page":"1","article-title":"Automated unsupervised authorship analysis using evidence accumulation clustering","volume":"1","author":"Layton","year":"2011","journal-title":"Natural Language Engineering"},{"key":"S1351324912000241_ref17","first-page":"1","volume-title":"Proceedings of the General Members Meeting and eCrime Researchers Summit (eCrime 2010)","author":"Layton","year":"2010"},{"key":"S1351324912000241_ref16","volume-title":"Proceedings of the First Conference on Email and Anti-Spam (CEAS)","author":"Klimt","year":"2004"},{"key":"S1351324912000241_ref15","unstructured":"Ke\u0161elj V. , Peng F. , Cercone N. , and Thomas C. 2003. N-gram-based author profiles for authorship attribution. Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING)."},{"key":"S1351324912000241_ref6","doi-asserted-by":"publisher","DOI":"10.1080\/01969727408546059"},{"key":"S1351324912000241_ref5","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-14980-1_37"},{"key":"S1351324912000241_ref4","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.1979.4766909"},{"key":"S1351324912000241_ref3","unstructured":"Corbin M. 2011. Authorship Attribution in the Enron Email Corpus. PhD thesis, University of Maryland, Baltimore, MD, USA."},{"key":"S1351324912000241_ref12","doi-asserted-by":"publisher","DOI":"10.1108\/eb026526"},{"key":"S1351324912000241_ref11","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2008.05.001"},{"key":"S1351324912000241_ref2","first-page":"1027","volume-title":"Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms","author":"Arthur","year":"2007"},{"key":"S1351324912000241_ref22","first-page":"410","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)","author":"Rosenberg","year":"2007"},{"key":"S1351324912000241_ref8","article-title":"Identifying authorship by byte-level n-grams: the source code author profile (SCAP) method.","volume":"6","author":"Frantzeskou","year":"2007","journal-title":"International Journal of Digital Evidence"},{"key":"S1351324912000241_ref26","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20316"},{"key":"S1351324912000241_ref13","volume-title":"Authorship Attribution","author":"Juola","year":"2008"},{"key":"S1351324912000241_ref7","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-75555-5_26"},{"key":"S1351324912000241_ref14","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqi024"},{"key":"S1351324912000241_ref25","first-page":"378","article-title":"A survey of modern authorship attribution methods.","volume":"57","author":"Stamatatos","year":"2009","journal-title":"Journal of the American Society for Information Science and Technology"},{"key":"S1351324912000241_ref10","unstructured":"Huber P. J. , and Ronchetti E. 1981. Robust Statistics, 2nd ed. Wiley Online Library. http:\/\/au.wiley.com\/WileyCDA\/WileyTitle\/productCd-0470129905.html (Accessed 17 Sep 2012)."},{"key":"S1351324912000241_ref9","doi-asserted-by":"publisher","DOI":"10.2307\/2346830"},{"key":"S1351324912000241_ref1","volume-title":"Proceedings of LREC","author":"Allison","year":"2008"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324912000241","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,23]],"date-time":"2019-04-23T19:29:48Z","timestamp":1556047788000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324912000241\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9,28]]},"references-count":26,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,10]]}},"alternative-id":["S1351324912000241"],"URL":"https:\/\/doi.org\/10.1017\/s1351324912000241","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,9,28]]}}}