{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T16:42:18Z","timestamp":1757608938750,"version":"3.44.0"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2025,5,17]],"date-time":"2025-05-17T00:00:00Z","timestamp":1747440000000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100008530","name":"European Regional Development Fund","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100008530","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Beyond Security: Role of Conflict in Resilience-Building","award":["CZ.02.01.01\/00\/22_008\/0004595"],"award-info":[{"award-number":["CZ.02.01.01\/00\/22_008\/0004595"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>This article introduces a novel method for refining approaches to distant reading by proposing a procedure that categorizes prominent units (keywords) into two types: those pertinent to the topic and those associated with the genre\/register of a text. This differentiation holds significant potential for more accurate modeling of topics and further applications in various domains of digital humanities. For example, register-related keywords may assist in various stylometric tasks, such as functional understanding of text groups (formed through clustering) that exhibit common stylistic traits, thus differentiating themselves from other groups or clusters within the discourse under scrutiny. 
In the initial step of the procedure, a set of texts with similar genre and register characteristics is identified; we term them \u2018sibling texts\u2019, and their similarity is determined using a model derived from multidimensional analysis. The next step involves conducting multiple parallel keyword analyses on these sibling texts and comparing them for keyword overlaps which mark the register relatedness. The test study on a corpus of Czech parliamentary speeches (Parlcorp) underscores the method\u2019s ability to distinguish register-related and topic-related keywords, even in highly homogeneous corpora. The register-related keywords proved to be analytically very useful in substantiating the interpretation of parliamentary subregisters and advancing the understanding of the types of activities (administrative and procedural, deliberating and debating, and government policies) involved in parliamentary discourse.<\/jats:p>","DOI":"10.1093\/llc\/fqaf037","type":"journal-article","created":{"date-parts":[[2025,5,17]],"date-time":"2025-05-17T18:06:21Z","timestamp":1747505181000},"page":"762-778","source":"Crossref","is-referenced-by-count":0,"title":["Sibling-texts keyword analysis: exploring topic and register keywords"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3977-2393","authenticated-orcid":false,"given":"V\u00e1clav","family":"Cvr\u010dek","sequence":"first","affiliation":[{"name":"Charles University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martina","family":"Berrocal","sequence":"additional","affiliation":[{"name":"Friedrich Schiller University Jena"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,5,16]]},"reference":[{"key":"2025090303530034700_fqaf037-B1","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1177\/0075424204269894","article-title":"Querying Keywords: Questions of Difference, Frequency, and Sense in 
Keywords Analysis","volume":"32","author":"Baker","year":"2004","journal-title":"Journal of English Linguistics"},{"volume-title":"Key Terms in Discourse Analysis","year":"2011","author":"Baker","key":"2025090303530034700_fqaf037-B2"},{"key":"2025090303530034700_fqaf037-B3","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511920103","volume-title":"Discourse Analysis and Media Attitudes: The Representation of Islam in the British Press","author":"Baker","year":"2013"},{"key":"2025090303530034700_fqaf037-B4","doi-asserted-by":"crossref","DOI":"10.1515\/9780748626908","volume-title":"A Glossary of Corpus Linguistics","author":"Baker","year":"2006"},{"key":"2025090303530034700_fqaf037-B5","doi-asserted-by":"crossref","DOI":"10.4324\/9781315724812","volume-title":"Triangulating Methodological Approaches in Corpus Linguistic Research","author":"Baker","year":"2016"},{"key":"2025090303530034700_fqaf037-B6","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1075\/scl.60.02ber","volume-title":"Multi-Dimensional Analysis, 25 Years On","author":"Berber Sardinha","year":"2014"},{"year":"2021","author":"Berrocal","key":"2025090303530034700_fqaf037-B7"},{"key":"2025090303530034700_fqaf037-B8","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511621024","volume-title":"Variation Across Speech and Writing","author":"Biber","year":"1988"},{"key":"2025090303530034700_fqaf037-B9","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511519871","volume-title":"Dimensions of Register Variation: A Cross-Linguistic Comparison","author":"Biber","year":"1995"},{"key":"2025090303530034700_fqaf037-B10","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1075\/rs.18007.bib","article-title":"Text-Linguistic Approaches to Register Variation","volume":"1","author":"Biber","year":"2019","journal-title":"Register Studies"},{"key":"2025090303530034700_fqaf037-B11","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511814358","volume-title":"Register, Genre, and 
Style","author":"Biber","year":"2009"},{"key":"2025090303530034700_fqaf037-B12","first-page":"993","article-title":"Latent Dirichlet Allocation","volume":"3","author":"Blei","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"2025090303530034700_fqaf037-B13","first-page":"10","volume-title":"Begriffsgeschichte und Diskursgeschichte","author":"Busse","year":"1994"},{"key":"2025090303530034700_fqaf037-B14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v061.i06","article-title":"NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set","volume":"61","author":"Charrad","year":"2014","journal-title":"Journal of Statistical Software"},{"key":"2025090303530034700_fqaf037-B15","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1075\/rs.20024.cla","article-title":"Multiple Correspondence Analysis, Newspaper Discourse and Subregister: A Case Study of Discourses of Islam in the British Press","volume":"3","author":"Clarke","year":"2021","journal-title":"Register Studies"},{"key":"2025090303530034700_fqaf037-B16","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1017\/CBO9781139764377","volume-title":"The Cambridge Handbook of English Corpus Linguistics","author":"Conrad","year":"2015"},{"key":"2025090303530034700_fqaf037-B17","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1017\/CBO9781139764377.006","volume-title":"The Cambridge Handbook of English Corpus Linguistics","author":"Culpeper","year":"2015"},{"key":"2025090303530034700_fqaf037-B18","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1515\/cllt-2018-0020","article-title":"From Extra- to Intratextual Characteristics: Charting the Space of Variation in Czech through MDA","volume":"17","author":"Cvr\u010dek","year":"2021","journal-title":"Corpus Linguistics and Linguistic 
Theory"},{"key":"2025090303530034700_fqaf037-B19","doi-asserted-by":"publisher","first-page":"713","DOI":"10.1007\/s10579-020-09487-4","article-title":"Comparing Web-Crawled and Traditional Corpora","volume":"54","author":"Cvr\u010dek","year":"2020","journal-title":"Language Resources and Evaluation"},{"key":"2025090303530034700_fqaf037-B20","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1075\/ijcl.19020.cvr","article-title":"Author and Register as Sources of Variation: A Corpus-Based Study Using Elicited Texts","volume":"25","author":"Cvr\u010dek","year":"2020","journal-title":"International Journal of Corpus Linguistics"},{"volume-title":"Registry v \u010ce\u0161tin\u011b","year":"2020","author":"Cvr\u010dek","key":"2025090303530034700_fqaf037-B21"},{"key":"2025090303530034700_fqaf037-B22","doi-asserted-by":"crossref","first-page":"77","DOI":"10.3366\/cor.2019.0162","article-title":"Incorporating Text Dispersion into Keyword Analyses","volume":"14","author":"Egbert","year":"2019","journal-title":"Corpora"},{"key":"2025090303530034700_fqaf037-B23","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1353\/jsl.2015.0018","article-title":"A Data-Driven Analysis of Reader Viewpoints: Reconstructing the Historical Reader Using Keyword Analysis","volume":"23","author":"Fidler","year":"2015","journal-title":"Journal of Slavic Linguistics"},{"key":"2025090303530034700_fqaf037-B24","doi-asserted-by":"publisher","first-page":"57","DOI":"10.3366\/cor.2014.0051","article-title":"Using Lexical Variables to Identify Language Ideologies in a Policy Corpus","volume":"9","author":"Fitzsimmons-Doolan","year":"2014","journal-title":"Corpora"},{"key":"2025090303530034700_fqaf037-B25","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1075\/rs.18001.gel","article-title":"The Reference Corpus Matters: Comparing the Effect of Different Reference Corpora on Keyword Analysis","volume":"1","author":"Geluso","year":"2019","journal-title":"Register 
Studies"},{"key":"2025090303530034700_fqaf037-B26","first-page":"100","article-title":"Algorithm AS 136: A K-Means Clustering Algorithm","volume":"28","author":"Hartigan","year":"1979","journal-title":"Journal of the Royal Statistical Society"},{"key":"2025090303530034700_fqaf037-B27","first-page":"309","volume-title":"The Routledge Handbook of Language and Politics","author":"Ilie","year":"2018"},{"key":"2025090303530034700_fqaf037-B28","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1515\/9783110763560-009","volume-title":"Quantitative Approaches to Universality and Individuality in Language","author":"Mili\u010dka","year":"2022"},{"volume-title":"Distant Reading","year":"2013","author":"Moretti","key":"2025090303530034700_fqaf037-B29"},{"key":"2025090303530034700_fqaf037-B30","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1075\/rs.19017.poj","article-title":"The Influence of the Benchmark Corpus on Keyword Analysis","volume":"3","author":"Pojanapunya","year":"2021","journal-title":"Register Studies"},{"key":"2025090303530034700_fqaf037-B31","first-page":"557","volume-title":"Exact Methods in the Study of Language and Text","author":"Popescu","year":"2007"},{"year":"1987","author":"Salton","key":"2025090303530034700_fqaf037-B32"},{"key":"2025090303530034700_fqaf037-B33","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1075\/scl.41.04sco","volume-title":"Keyness in Texts","author":"Scott","year":"2010"},{"key":"2025090303530034700_fqaf037-B34","doi-asserted-by":"crossref","DOI":"10.1075\/scl.22","volume-title":"Textual Patterns: Keywords and Corpus Analysis in Language Education","author":"Scott","year":"2006"},{"volume-title":"Pragmatics of Discourse. 
Volume 3 of Handbooks of Pragmatics","year":"2014","author":"Schneider","key":"2025090303530034700_fqaf037-B35"},{"key":"2025090303530034700_fqaf037-B36","doi-asserted-by":"publisher","first-page":"310","DOI":"10.1111\/modl.12465","article-title":"Using Corpus-Based Register Analysis to Explore the Authenticity of High-Stakes Language Exams: A Register Comparison of TOEFL iBT and Disciplinary Writing Tasks","volume":"102","author":"Staples","year":"2018","journal-title":"The Modern Language Journal"},{"volume-title":"Corpus Linguistics: A Guide to the Methodology","year":"2020","author":"Stefanowitsch","key":"2025090303530034700_fqaf037-B37"},{"key":"2025090303530034700_fqaf037-B38","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-29807-3","volume-title":"Advances in K-Means Clustering","author":"Wu","year":"2012"},{"volume-title":"Koditex: A Corpus of Diversified Texts","year":"2018","author":"Zasina","key":"2025090303530034700_fqaf037-B39"}],"container-title":["Digital Scholarship in the 
Humanities"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/dsh\/article-pdf\/40\/3\/762\/63217235\/fqaf037.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/dsh\/article-pdf\/40\/3\/762\/63217235\/fqaf037.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T07:53:12Z","timestamp":1756885992000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/dsh\/article\/40\/3\/762\/8134286"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,16]]},"references-count":39,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,5,16]]},"published-print":{"date-parts":[[2025,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/llc\/fqaf037","relation":{},"ISSN":["2055-7671","2055-768X"],"issn-type":[{"type":"print","value":"2055-7671"},{"type":"electronic","value":"2055-768X"}],"subject":[],"published-other":{"date-parts":[[2025,9]]},"published":{"date-parts":[[2025,5,16]]}}}