{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T21:32:54Z","timestamp":1776807174449,"version":"3.51.2"},"reference-count":82,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2019,4,10]],"date-time":"2019-04-10T00:00:00Z","timestamp":1554854400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2020,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this article, we propose an innovative and robust approach to stylometric analysis without annotation and leveraging lexical and sub-lexical information. In particular, we propose to leverage the phonological information of tones and rimes in Mandarin Chinese automatically extracted from unannotated texts. The texts from different authors were represented by tones, tone motifs, and word length motifs as well as rimes and rime motifs. Support vector machines and random forests were used to establish the text classification model for authorship attribution. From the results of the experiments, we conclude that the combination of bigrams of rimes, word-final rimes, and <jats:italic>segment<\/jats:italic>-final rimes can discriminate the texts from different authors effectively when using random forests to establish the classification model. This robust approach can in principle be applied to other languages with established phonological inventory of onset and rimes.<\/jats:p>","DOI":"10.1017\/s135132491900010x","type":"journal-article","created":{"date-parts":[[2019,4,10]],"date-time":"2019-04-10T05:00:59Z","timestamp":1554872459000},"page":"49-71","source":"Crossref","is-referenced-by-count":19,"title":["Robust stylometric analysis and author attribution based on tones and rimes"],"prefix":"10.1017","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2510-6277","authenticated-orcid":false,"given":"Renkui","family":"Hou","sequence":"first","affiliation":[]},{"given":"Chu-Ren","family":"Huang","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2019,4,10]]},"reference":[{"key":"S135132491900010X_ref79","volume-title":"The Statistical Study of Literary Vocabulary","author":"Yule","year":"1944"},{"key":"S135132491900010X_ref78","first-page":"363","article-title":"On sentence-length as a statistical characteristic of style in prose: With application to two cases of disputed authorship","volume":"30","author":"Yule","year":"1938","journal-title":"Biometrika"},{"key":"S135132491900010X_ref77","first-page":"45","volume-title":"Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature","author":"Yu","year":"2012"},{"key":"S135132491900010X_ref76","unstructured":"Yu, P.B. . (1950). ."},{"key":"S135132491900010X_ref36","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-28084-7_58"},{"key":"S135132491900010X_ref24","doi-asserted-by":"publisher","DOI":"10.1080\/09296174.2017.1314411"},{"key":"S135132491900010X_ref35","doi-asserted-by":"publisher","DOI":"10.1561\/1500000005"},{"key":"S135132491900010X_ref2","volume-title":"Proceedings of the Joint Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing","author":"Argamon","year":"2005"},{"key":"S135132491900010X_ref25","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqz005"},{"key":"S135132491900010X_ref62","volume-title":"A Computational Theory of Writing Systems","author":"Sproat","year":"2000"},{"key":"S135132491900010X_ref33","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0026683"},{"key":"S135132491900010X_ref43","first-page":"81","volume-title":"Text and Language","author":"K\u00f6hler","year":"2010"},{"key":"S135132491900010X_ref47","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511483165"},{"key":"S135132491900010X_ref3","doi-asserted-by":"publisher","DOI":"10.17928\/jjadh.2.1_1"},{"key":"S135132491900010X_ref7","doi-asserted-by":"publisher","DOI":"10.1080\/02691728708578445"},{"key":"S135132491900010X_ref58","doi-asserted-by":"publisher","DOI":"10.1177\/0963947016631859"},{"key":"S135132491900010X_ref65","doi-asserted-by":"publisher","DOI":"10.1162\/089120100750105920"},{"key":"S135132491900010X_ref15","first-page":"194","article-title":"Mining stylistic features of rhythm and tempo based on text clustering","volume":"18","author":"He","year":"2014","journal-title":"Journal of Chinese Information Processing"},{"key":"S135132491900010X_ref81","volume-title":"Lectures on Grammar","author":"Zhu","year":"1982"},{"key":"S135132491900010X_ref37","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20961"},{"key":"S135132491900010X_ref80","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20316"},{"key":"S135132491900010X_ref73","first-page":"61","article-title":"Method research of author identification based on semantic analysis","volume":"20","author":"Wu","year":"2006","journal-title":"Journal Chinese Information"},{"key":"S135132491900010X_ref56","doi-asserted-by":"publisher","DOI":"10.3115\/1067807.1067843"},{"key":"S135132491900010X_ref13","first-page":"15","volume-title":"Contributions to the Science of Text and Language","author":"Grzybek","year":"2007"},{"key":"S135132491900010X_ref14","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-28084-7_5"},{"key":"S135132491900010X_ref26","unstructured":"Hu, S. (1921). ."},{"key":"S135132491900010X_ref49","doi-asserted-by":"publisher","DOI":"10.3115\/1599081.1599146"},{"key":"S135132491900010X_ref66","first-page":"115","volume-title":"Introduction to Data Mining","author":"Tan","year":"2006"},{"key":"S135132491900010X_ref39","first-page":"145","volume-title":"Favete Linguis. Studies in Honour of Victor Krupa","author":"K\u00f6hler","year":"2006"},{"key":"S135132491900010X_ref63","unstructured":"Stamatatos, E. (2007). Author identification using imbalanced and limited training texts. In Proceedings of the 18th International Conference on Database and Expert Systems Applications, Regensburg, Germany: IEEE Computer Society. pp. 237\u2013241."},{"key":"S135132491900010X_ref71","volume-title":"Memorial Li Fanggui\u2019s 100th Anniversary International Symposium on Chinese History","author":"Wei","year":"2002"},{"key":"S135132491900010X_ref17","doi-asserted-by":"publisher","DOI":"10.1109\/SMC.2016.7844873"},{"key":"S135132491900010X_ref19","first-page":"119","article-title":"From the use of three functional words \u201c\u201d examining author\u2019s unique writing style\u2013and on Dream of Red Chamber author issues","volume":"120","author":"Ho","year":"2015","journal-title":"BIBLID"},{"key":"S135132491900010X_ref57","unstructured":"R Core Team. (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at https:\/\/www.R-project.org."},{"key":"S135132491900010X_ref1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1361684.1361685","article-title":"Writeprints: a stylometric approach to identity-level identification and similarity detection","volume":"26","author":"Abbasi","year":"2008","journal-title":"ACM Transactions on Information Systems"},{"key":"S135132491900010X_ref5","first-page":"247","article-title":"A computerized stylostatistical approach to the disputed authorship problem of the Dream of the Red Chamber","volume":"16","author":"Chan","year":"1986","journal-title":"Tamkang Review: A Quarterly of Comparative Studies between Chinese and Foreign Literatures"},{"key":"S135132491900010X_ref6","volume-title":"A Grammar of Spoken Chinese","author":"Chao","year":"1968"},{"key":"S135132491900010X_ref8","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/9.4.281"},{"key":"S135132491900010X_ref41","doi-asserted-by":"publisher","DOI":"10.1515\/9783110272925"},{"key":"S135132491900010X_ref9","first-page":"167","volume-title":"Proceedings of the 11th Pacific Asia Conference on Language, Information and Computation","author":"Chen","year":"1996"},{"key":"S135132491900010X_ref10","first-page":"137","volume-title":"Proceedings of the Seventh International Conference on Information and Knowledge Management","author":"Dumais","year":"1998"},{"key":"S135132491900010X_ref11","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fql048"},{"key":"S135132491900010X_ref12","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqm020"},{"key":"S135132491900010X_ref16","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-88388-0"},{"key":"S135132491900010X_ref18","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqm023"},{"key":"S135132491900010X_ref20","doi-asserted-by":"publisher","DOI":"10.1007\/BF01830689"},{"key":"S135132491900010X_ref4","first-page":"231","volume-title":"Sprache, Text, Kunst. Quantitative Analysen","author":"Boroda","year":"1982"},{"key":"S135132491900010X_ref21","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/13.3.111"},{"key":"S135132491900010X_ref22","doi-asserted-by":"publisher","DOI":"10.1080\/09332480.2003.10554842"},{"key":"S135132491900010X_ref23","doi-asserted-by":"publisher","DOI":"10.1515\/cllt-2016-0062"},{"key":"S135132491900010X_ref69","doi-asserted-by":"publisher","DOI":"10.1515\/cllt-2013-0020"},{"key":"S135132491900010X_ref52","doi-asserted-by":"publisher","DOI":"10.1126\/science.ns-9.214S.237"},{"key":"S135132491900010X_ref27","doi-asserted-by":"publisher","DOI":"10.1142\/S1793536914500125"},{"key":"S135132491900010X_ref28","volume-title":"Handbook of Linguistic Annotation","author":"Huang","year":"2017"},{"key":"S135132491900010X_ref29","first-page":"290","volume-title":"The Oxford Handbook of Chinese Linguistics","author":"Huang","year":"2015"},{"key":"S135132491900010X_ref30","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139028462"},{"key":"S135132491900010X_ref31","first-page":"225","article-title":"Author identification based on n-gram pattern of auxiliary word","volume":"23","author":"Jin","year":"2002","journal-title":"Measurement of Language"},{"key":"S135132491900010X_ref68","volume-title":"Fictional Realism in Twentieth-Century China","author":"Wang","year":"1992"},{"key":"S135132491900010X_ref32","doi-asserted-by":"publisher","DOI":"10.1109\/ICoSP.2012.6492012"},{"key":"S135132491900010X_ref34","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqq001"},{"key":"S135132491900010X_ref75","unstructured":"Yang, M., Zhu, D., Tang, Y. and Wang, J. (2017). Authorship Attribution with Topic Drift Model. Available at https:\/\/aaai.org\/ocs\/index.php\/AAAI\/AAAI17\/paper\/view\/14152."},{"key":"S135132491900010X_ref38","first-page":"1261","article-title":"Measuring differentiability: Unmasking pseudonymous authors","volume":"8","author":"Koppel","year":"2007","journal-title":"Journal of Machine Learning Research"},{"key":"S135132491900010X_ref40","doi-asserted-by":"publisher","DOI":"10.1515\/glot-2008-0018"},{"key":"S135132491900010X_ref42","doi-asserted-by":"publisher","DOI":"10.1515\/9783110362879-007"},{"key":"S135132491900010X_ref44","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324911000313"},{"key":"S135132491900010X_ref46","doi-asserted-by":"publisher","DOI":"10.1145\/1121949.1121951"},{"key":"S135132491900010X_ref48","first-page":"1","article-title":"The features of Chinese sentences","volume":"1","author":"Lu","year":"1993","journal-title":"Chinese Language Learning"},{"key":"S135132491900010X_ref50","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqq013"},{"key":"S135132491900010X_ref51","first-page":"300","volume-title":"Proceedings of the European Conference on Information Retrieval","author":"Marton","year":"2005"},{"key":"S135132491900010X_ref53","volume-title":"Inference and Disputed Authorship: The Federalist","author":"Mosteller","year":"1964"},{"key":"S135132491900010X_ref54","doi-asserted-by":"publisher","DOI":"10.1145\/3132039"},{"key":"S135132491900010X_ref45","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324912000241"},{"key":"S135132491900010X_ref55","unstructured":"Neergaard, K.D. and Huang, C.-R. (2019). Constructing the Mandarin phonological network: novel syllable inventory used to identify schematic segmentation. To Appear in Complexity (special issue), Cognitive Network Science: A New Frontier."},{"key":"S135132491900010X_ref59","first-page":"482","volume-title":"Proceedings of the International Conference on Empirical Methods in Natural Language Processing","author":"Sanderson","year":"2006"},{"key":"S135132491900010X_ref60","doi-asserted-by":"publisher","DOI":"10.1080\/09296174.2012.659003"},{"key":"S135132491900010X_ref61","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqt047"},{"key":"S135132491900010X_ref64","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21001"},{"key":"S135132491900010X_ref82","doi-asserted-by":"publisher","DOI":"10.4159\/harvard.9780674434929"},{"key":"S135132491900010X_ref67","first-page":"735","article-title":"The influence of phonological similarity neighborhoods on speech production","volume":"28","author":"Vitevitch","year":"2002","journal-title":"Journal of Experimental Psychology: Learning, Memory, and Cognition"},{"key":"S135132491900010X_ref70","first-page":"4","article-title":"Research on authorship identification based on sentence rhythm feature","volume":"37","author":"Wang","year":"2011","journal-title":"Computer Engineering"},{"key":"S135132491900010X_ref72","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/62.1.207"},{"key":"S135132491900010X_ref74","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009982220290"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S135132491900010X","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,4,21]],"date-time":"2020-04-21T03:58:18Z","timestamp":1587441498000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S135132491900010X\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,10]]},"references-count":82,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1]]}},"alternative-id":["S135132491900010X"],"URL":"https:\/\/doi.org\/10.1017\/s135132491900010x","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,4,10]]}}}