{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,2]],"date-time":"2025-04-02T21:03:56Z","timestamp":1743627836980,"version":"3.37.3"},"reference-count":63,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T00:00:00Z","timestamp":1686268800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"name":"Universities in China","award":["22JJD740018"],"award-info":[{"award-number":["22JJD740018"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,8,31]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>This study proposes a linguistic classification method based on quantitative typology, which leverages a large-scale multilingual parallel corpus to obtain valid language classification result by excluding the influence of covariates such as text genre and semantic content in cross-language comparison. To achieve this, we model the type\u2013token relationships of each Slavic parallel text and calculate the lexical diversity to approximate the morphological complexity of the language. We perform automatic clustering of languages based on these lexical diversity metrics. Our findings show that (1) the lexical diversity metrics can well reflect that the language is located somewhere on the continuum of \u2018analytism-synthetism\u2019; (2) the automatic clustering based on these metrics effectively reflects the genealogical classification of Slavic languages; and (3) the geographical distribution of lexical diversity in the region where Slavic languages are spoken shows a monotonic increasing trend from southwest to northeast, which is consistent with the pattern found by previous authors on a global scale. 
The methodological approach taken in this study is data-driven, with the benefit of being independent of theoretical assumptions and easy for computer processing. This approach can offer a better insight into corpus-based typology and may shed light on the understanding of language as a human-driven complex adaptive system.<\/jats:p>","DOI":"10.1093\/llc\/fqad042","type":"journal-article","created":{"date-parts":[[2023,6,10]],"date-time":"2023-06-10T01:37:40Z","timestamp":1686361060000},"page":"1359-1371","source":"Crossref","is-referenced-by-count":2,"title":["Lexical diversity as a lens into the classification of Slavic languages: A quantitative typology perspective"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6574-1454","authenticated-orcid":false,"given":"Chenliang","family":"Zhou","sequence":"first","affiliation":[{"name":"Department of Linguistics, Zhejiang University , Hangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haitao","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Linguistics, Zhejiang University , Hangzhou, China"},{"name":"Institute of Quantitative Linguistics, Beijing Language and Culture University , Beijing, China"},{"name":"Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies , Guangdong, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2023,6,9]]},"reference":[{"issue":"4","key":"2023083111394779800_fqad042-B1","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1080\/09296174.2011.608602","article-title":"Automatic language classification by means of syntactic dependency networks","volume":"18","author":"Abramov","year":"2011","journal-title":"Journal of Quantitative Linguistics"},{"volume-title":"Allgemeine Sprachtypologie: Prinzipien und 
Me\u00dfverfahren","year":"1973","author":"Altmann","key":"2023083111394779800_fqad042-B2"},{"key":"2023083111394779800_fqad042-B3","doi-asserted-by":"publisher","first-page":"3070","DOI":"10.1515\/9783110379082-035","volume-title":"Word-Formation: An International Handbook of the Languages of Europe. Handbooks of Linguistics and Communication Science [HSK]","author":"Arizankovska","year":"2016"},{"key":"2023083111394779800_fqad042-B4","doi-asserted-by":"publisher","first-page":"3049","DOI":"10.1515\/9783110379082-034","volume-title":"Word-Formation: An International Handbook of the Languages of Europe. Handbooks of Linguistics and Communication Science [HSK]","author":"Avramova","year":"2016"},{"key":"2023083111394779800_fqad042-B5","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-010-0844-0","volume-title":"Word Frequency Distributions","author":"Baayen","year":"2001"},{"key":"2023083111394779800_fqad042-B6","doi-asserted-by":"publisher","DOI":"10.1515\/9783110560107","volume-title":"Adaptive Languages: An Information-Theoretic Account of Linguistic Diversity","author":"Bentz","year":"2018"},{"issue":"6","key":"2023083111394779800_fqad042-B7","doi-asserted-by":"publisher","first-page":"275","DOI":"10.3390\/e19060275","article-title":"The entropy of words\u2014learnability and expressivity across more than 1000 languages","volume":"19","author":"Bentz","year":"2017","journal-title":"Entropy"},{"key":"2023083111394779800_fqad042-B8","first-page":"142","volume-title":"Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)","author":"Bentz","year":"2016"},{"issue":"6","key":"2023083111394779800_fqad042-B9","doi-asserted-by":"publisher","first-page":"e0128254","DOI":"10.1371\/journal.pone.0128254","article-title":"Adaptive communication: Languages with more non-native speakers tend to have fewer word forms",
"volume":"10","author":"Bentz","year":"2015","journal-title":"PLoS ONE"},{"year":"2017","author":"Bickel","key":"2023083111394779800_fqad042-B10"},{"key":"2023083111394779800_fqad042-B11","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511486906","volume-title":"Language Classification: History and Method","author":"Campbell","year":"2008"},{"first-page":"54","year":"2017","author":"Chen","key":"2023083111394779800_fqad042-B12"},{"issue":"4","key":"2023083111394779800_fqad042-B13","doi-asserted-by":"publisher","first-page":"598","DOI":"10.1016\/j.plrev.2014.04.004","article-title":"Approaching human language with complex networks","volume":"11","author":"Cong","year":"2014","journal-title":"Physics of Life Reviews"},{"key":"2023083111394779800_fqad042-B14","doi-asserted-by":"publisher","DOI":"10.4324\/9781136861376","volume-title":"The Slavonic Languages","author":"Corbett","year":"1993","edition":"1st edn"},{"issue":"2","key":"2023083111394779800_fqad042-B15","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1524\/stuf.2007.60.2.95","article-title":"Parallel texts: using translational equivalents in linguistic typology","volume":"60","author":"Cysouw","year":"2007","journal-title":"Language Typology and Universals"},{"issue":"5","key":"2023083111394779800_fqad042-B16","doi-asserted-by":"publisher","first-page":"iii","DOI":"10.2307\/1006517","article-title":"An Indoeuropean classification: a lexicostatistical experiment","volume":"82","author":"Dyen","year":"1992","journal-title":"Transactions of the American Philosophical Society"},{"year":"2022","author":"Eberhard","key":"2023083111394779800_fqad042-B17"},{"issue":"3","key":"2023083111394779800_fqad042-B18","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1086\/464575","article-title":"A quantitative approach to the morphological typology of language","volume":"26","author":"Greenberg","year":"1960","journal-title":"International Journal of American 
Linguistics"},{"key":"2023083111394779800_fqad042-B19","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1002\/9781444318159.ch28","volume-title":"The Handbook of Language Contact.","author":"Grenoble","year":"2010"},{"key":"2023083111394779800_fqad042-B20","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1075\/slcs.94.13gro","volume-title":"Language Complexity: Typology, Contact, Change","author":"de Groot","year":"2008"},{"year":"2022","author":"Hammarstr\u00f6m","key":"2023083111394779800_fqad042-B21"},{"key":"2023083111394779800_fqad042-B22","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1075\/silv.19.01has","volume-title":"Language Variation\u2014European Perspectives VI. Studies in Language Variation","author":"Haspelmath","year":"2017"},{"key":"2023083111394779800_fqad042-B23","doi-asserted-by":"publisher","first-page":"5555","DOI":"10.18653\/v1\/2021.emnlp-main.451","author":"He","year":"2021"},{"issue":"4","key":"2023083111394779800_fqad042-B24","doi-asserted-by":"publisher","first-page":"332","DOI":"10.1007\/BF01587632","article-title":"A new derivation and interpretation of Yule\u2019s \u2018characteristic\u2019 K","volume":"6","author":"Herdan","year":"1955","journal-title":"Journal of Applied Mathematics and Physics (ZAMP)"},{"issue":"1","key":"2023083111394779800_fqad042-B25","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1007\/BF01596857","article-title":"An inequality relation between Yule\u2019s characteristic K and Shannon\u2019s entropy H","volume":"9","author":"Herdan","year":"1958","journal-title":"Journal of Applied Mathematics and Physics (ZAMP)"},{"issue":"1\u20132","key":"2023083111394779800_fqad042-B26","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1093\/biomet\/45.1-2.268","article-title":"The mathematical relation between Greenberg\u2019s index of linguistic diversity and Yule\u2019s 
characteristic","volume":"45","author":"Herdan","year":"1958","journal-title":"Biometrika"},{"key":"2023083111394779800_fqad042-B27","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-88388-0","volume-title":"The Advanced Theory of Language as Choice and Chance","author":"Herdan","year":"1966","edition":"1st edn"},{"issue":"1","key":"2023083111394779800_fqad042-B28","doi-asserted-by":"publisher","first-page":"19","DOI":"10.21248\/jlcl.20.2005.68","article-title":"A brief survey of text mining","volume":"20","author":"Hotho","year":"2005","journal-title":"Journal for Language Technology and Computational Linguistics"},{"issue":"1","key":"2023083111394779800_fqad042-B29","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1080\/00806767908600762","article-title":"Some notes on the development of the active present participle in Bulgarian","volume":"25","author":"Hult","year":"1979","journal-title":"Scando-Slavica"},{"issue":"1\u20134","key":"2023083111394779800_fqad042-B30","first-page":"21","article-title":"Genetic classification: retrospect and prospect","volume":"35","author":"Hymes","year":"1993","journal-title":"Anthropological Linguistics"},{"key":"2023083111394779800_fqad042-B31","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1515\/9783110214475.1.3.129","volume-title":"Die slavischen Sprachen\/The Slavic Languages: Halbband 1. 
Handbooks of Linguistics and Communication Science [HSK]","author":"Jadranka","year":"2009"},{"issue":"1","key":"2023083111394779800_fqad042-B32","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1111\/j.1467-9922.2012.00739.x","article-title":"Capturing the diversity in lexical diversity","volume":"63","author":"Jarvis","year":"2013","journal-title":"Language Learning"},{"key":"2023083111394779800_fqad042-B33","first-page":"141","volume-title":"New Methods in Language Processing and Computational Natural Language Learning","author":"Juola","year":"1998"},{"volume-title":"NLP, Corpus Linguistics, Corpus Based Grammar Research","year":"2009","author":"Kelih","key":"2023083111394779800_fqad042-B34"},{"key":"2023083111394779800_fqad042-B35","first-page":"11","article-title":"The type-token relationship in Slavic parallel texts","volume":"20","author":"Kelih","year":"2010","journal-title":"Glottometrics"},{"issue":"3","key":"2023083111394779800_fqad042-B36","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1080\/09296174.2014.911506","article-title":"Can type\u2013token ratio be used to show morphological complexity of languages?","volume":"21","author":"Kettunen","year":"2014","journal-title":"Journal of Quantitative Linguistics"},{"issue":"2","key":"2023083111394779800_fqad042-B37","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1080\/09296174.2016.1142327","article-title":"A data-based classification of Slavic languages: Indices of qualitative variation applied to grapheme frequencies","volume":"23","author":"Ko\u0161\u010dov\u00e1","year":"2016","journal-title":"Journal of Quantitative Linguistics"},{"key":"2023083111394779800_fqad042-B38","doi-asserted-by":"publisher","DOI":"10.3726\/b12665","volume-title":"Analytic Modality in Macedonian","author":"Kramer","year":"1986"},{"key":"2023083111394779800_fqad042-B39","doi-asserted-by":"publisher","DOI":"10.1163\/2589-6229_ESLO_COM_032367","article-title":"Genetics and Slavic 
languages","volume-title":"Encyclopedia of Slavic Languages and Linguistics Online.","author":"Kushniarevich","year":"2020"},{"issue":"9","key":"2023083111394779800_fqad042-B40","doi-asserted-by":"publisher","first-page":"e0135820","DOI":"10.1371\/journal.pone.0135820","article-title":"Genetic heritage of the Balto-Slavic speaking populations: a synthesis of autosomal, mitochondrial and Y-chromosomal data","volume":"10","author":"Kushniarevich","year":"2015","journal-title":"PLoS ONE"},{"key":"2023083111394779800_fqad042-B41","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1515\/lingty-2020-0118","article-title":"Corpus-based typology: applications, challenges and some solutions","volume":"26","author":"Levshina","year":"2021","journal-title":"Linguistic Typology"},{"issue":"4","key":"2023083111394779800_fqad042-B42","doi-asserted-by":"publisher","first-page":"505","DOI":"10.1162\/coli.2009.35.4.35403","article-title":"Punctuation as implicit annotations for Chinese word segmentation","volume":"35","author":"Li","year":"2009","journal-title":"Computational Linguistics"},{"issue":"6","key":"2023083111394779800_fqad042-B43","doi-asserted-by":"publisher","first-page":"1567","DOI":"10.1016\/j.lingua.2009.10.001","article-title":"Dependency direction as a means of word-order typology: a method based on dependency treebanks","volume":"120","author":"Liu","year":"2010","journal-title":"Lingua"},{"key":"2023083111394779800_fqad042-B44","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1016\/j.plrev.2018.06.006","article-title":"Language as a human-driven complex adaptive system","volume":"26\u201327","author":"Liu","year":"2018","journal-title":"Physics of Life Reviews"},{"issue":"10","key":"2023083111394779800_fqad042-B45","doi-asserted-by":"publisher","first-page":"1139","DOI":"10.1007\/s11434-013-5711-8","article-title":"Language clustering with word co-occurrence networks based on parallel 
texts","volume":"58","author":"Liu","year":"2013","journal-title":"Chinese Science Bulletin"},{"issue":"30","key":"2023083111394779800_fqad042-B46","doi-asserted-by":"publisher","first-page":"3458","DOI":"10.1007\/s11434-010-4114-3","article-title":"Language clusters based on linguistic complex networks","volume":"55","author":"Liu","year":"2010","journal-title":"Chinese Science Bulletin"},{"issue":"4","key":"2023083111394779800_fqad042-B47","doi-asserted-by":"publisher","first-page":"597","DOI":"10.1515\/psicl-2012-002","article-title":"Quantitative typological analysis of Romance languages","volume":"48","author":"Liu","year":"2012","journal-title":"Pozna\u0144 Studies in Contemporary Linguistics"},{"issue":"12","key":"2023083111394779800_fqad042-B48","doi-asserted-by":"publisher","first-page":"e12358","DOI":"10.1111\/lnc3.12358","article-title":"Computational phylogenetics and the classification of South American Languages","volume":"13","author":"Michael","year":"2019","journal-title":"Language and Linguistics Compass"},{"issue":"4","key":"2023083111394779800_fqad042-B49","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1007\/s10579-005-8622-8","article-title":"Yule\u2019s characteristic K revisited","volume":"39","author":"Miranda-Garc\u00eda","year":"2005","journal-title":"Language Resources and Evaluation"},{"key":"2023083111394779800_fqad042-B50","first-page":"13","volume-title":"New Perspectives on the Peopling of the Americas","author":"Nichols","year":"2018"},{"issue":"3","key":"2023083111394779800_fqad042-B51","doi-asserted-by":"publisher","first-page":"643","DOI":"10.3406\/rbph.2010.7798","article-title":"Bulgarian","volume":"88","author":"Osenova","year":"2010","journal-title":"Revue Belge de Philologie et d\u2019Histoire"},{"issue":"4","key":"2023083111394779800_fqad042-B52","doi-asserted-by":"publisher","first-page":"894","DOI":"10.1093\/llc\/fqy014","article-title":"A novel approach towards deriving vocabulary 
quotient","volume":"33","author":"Rajput","year":"2018","journal-title":"Digital Scholarship in the Humanities"},{"key":"2023083111394779800_fqad042-B53","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1515\/9783111418797-003","volume-title":"Diachronic, Areal, and Typological Linguistics. Current Trends in Linguistics.","author":"Robins","year":"1973"},{"key":"2023083111394779800_fqad042-B54","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1007\/978-1-4020-4068-9_11","volume-title":"Contributions to the Science of Text and Language: Word Length Studies and Related Issues. Text, Speech and Language Technology.","author":"Rottmann","year":"2007"},{"issue":"3","key":"2023083111394779800_fqad042-B55","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1093\/llc\/fqu006","article-title":"Multivariate modeling of the collaboration between Luigi Illica and Giuseppe Giacosa for the librettos of three operas by Giacomo Puccini","volume":"30","author":"Saccenti","year":"2015","journal-title":"Digital Scholarship in the Humanities"},{"issue":"2","key":"2023083111394779800_fqad042-B56","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1080\/09296170801961843","article-title":"Complexity of European Union languages: a comparative approach","volume":"15","author":"Sadeniemi","year":"2008","journal-title":"Journal of Quantitative Linguistics"},{"issue":"3","key":"2023083111394779800_fqad042-B57","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell System Technical Journal"},{"issue":"1","key":"2023083111394779800_fqad042-B58","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1002\/j.1538-7305.1951.tb01366.x","article-title":"Prediction and entropy of printed English","volume":"30","author":"Shannon","year":"1951","journal-title":"Bell System Technical 
Journal"},{"issue":"2","key":"2023083111394779800_fqad042-B59","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1524\/stuf.2007.60.2.100","article-title":"Harry Potter meets Le Petit Prince\u2014on the usefulness of parallel corpora in crosslinguistic investigations","volume":"60","author":"Stolz","year":"2007","journal-title":"Language Typology and Universals"},{"issue":"3","key":"2023083111394779800_fqad042-B60","doi-asserted-by":"publisher","first-page":"481","DOI":"10.1162\/COLI_a_00228","article-title":"Computational constancy measures of texts\u2014Yule\u2019s K and R\u00e9nyi\u2019s entropy","volume":"41","author":"Tanaka-Ishii","year":"2015","journal-title":"Computational Linguistics"},{"issue":"2","key":"2023083111394779800_fqad042-B61","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1080\/09296174.2018.1560122","article-title":"Using rank-frequency and type\u2013token statistics to compare morphological typology in the Celtic languages","volume":"27","author":"Wilson","year":"2020","journal-title":"Journal of Quantitative Linguistics"},{"key":"2023083111394779800_fqad042-B62","doi-asserted-by":"publisher","first-page":"31","DOI":"10.31857\/0373-658X.2021.4.131-159","article-title":"Morphology and word order in Slavic languages: insights from annotated corpora","volume":"4","author":"Yan","year":"2021","journal-title":"Voprosy Jazykoznanija"},{"volume-title":"The Statistical Study of Literary Vocabulary","year":"1944","author":"Yule","key":"2023083111394779800_fqad042-B63"}],"container-title":["Digital Scholarship in the 
Humanities"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/dsh\/article-pdf\/38\/3\/1359\/51309621\/fqad042.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/dsh\/article-pdf\/38\/3\/1359\/51309621\/fqad042.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,31]],"date-time":"2023-08-31T11:44:40Z","timestamp":1693482280000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/dsh\/article\/38\/3\/1359\/7193351"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,9]]},"references-count":63,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,6,9]]},"published-print":{"date-parts":[[2023,8,31]]}},"URL":"https:\/\/doi.org\/10.1093\/llc\/fqad042","relation":{},"ISSN":["2055-7671","2055-768X"],"issn-type":[{"type":"print","value":"2055-7671"},{"type":"electronic","value":"2055-768X"}],"subject":[],"published-other":{"date-parts":[[2023,9,1]]},"published":{"date-parts":[[2023,6,9]]}}}