{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T19:30:22Z","timestamp":1781379022574,"version":"3.54.1"},"reference-count":69,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2023,5,18]],"date-time":"2023-05-18T00:00:00Z","timestamp":1684368000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Latent semantic analysis (LSA) and correspondence analysis (CA) are two techniques that use a singular value decomposition for dimensionality reduction. LSA has been extensively used to obtain low-dimensional representations that capture relationships among documents and terms. In this article, we present a theoretical analysis and comparison of the two techniques in the context of document-term matrices. We show that CA has some attractive properties as compared to LSA, for instance that effects of margins, that is, sums of row elements and column elements, arising from differing document lengths and term frequencies are effectively eliminated so that the CA solution is optimally suited to focus on relationships among documents and terms. A unifying framework is proposed that includes both CA and LSA as special cases. We empirically compare CA to various LSA-based methods on text categorization in English and authorship attribution on historical Dutch texts and find that CA performs significantly better. We also apply CA to a long-standing question regarding the authorship of the Dutch national anthem <jats:italic>Wilhelmus<\/jats:italic> and provide further support that it can be attributed to the author Datheen, among several contenders.<\/jats:p>","DOI":"10.1017\/s1351324923000244","type":"journal-article","created":{"date-parts":[[2023,5,18]],"date-time":"2023-05-18T06:05:16Z","timestamp":1684389916000},"page":"722-752","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":14,"title":["A comparison of latent semantic analysis and correspondence analysis of document-term matrices"],"prefix":"10.1017","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1058-476X","authenticated-orcid":false,"given":"Qianqian","family":"Qi","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David J.","family":"Hessen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tejaswini","family":"Deoskar","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Peter G. M.","family":"van der Heijden","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"56","published-online":{"date-parts":[[2023,5,18]]},"reference":[{"key":"S1351324923000244_ref16","volume-title":"An Introduction to Statistical Learning: With Applications in R","author":"Gareth","year":"2021"},{"key":"S1351324923000244_ref40","unstructured":"Koppel, M. and Seidman, S. (2013). Automatically identifying pseudepigraphic texts. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, pp. 1449\u20131454."},{"key":"S1351324923000244_ref64","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-021-87971-9"},{"key":"S1351324923000244_ref61","doi-asserted-by":"publisher","DOI":"10.1145\/275519.275529"},{"key":"S1351324923000244_ref42","volume-title":"Advances in Neural Information Processing Systems","volume":"27","author":"Levy","year":"2014"},{"key":"S1351324923000244_ref11","doi-asserted-by":"publisher","DOI":"10.3758\/BF03203370"},{"key":"S1351324923000244_ref37","unstructured":"Kestemont, M. , Stronks, E. , De Bruin, M. and Winkel, T.D (2017a). Did a poet with donkey ears write the oldest anthem in the world? Ideological implications of the computational attribution of the Dutch national anthem to Petrus Dathenus. Digital Humanities 2017, Conference Abstracts, Montreal, Canada."},{"key":"S1351324923000244_ref48","doi-asserted-by":"publisher","DOI":"10.1214\/ss\/1028905828"},{"key":"S1351324923000244_ref18","volume-title":"Theory and Applications of Correspondence Analysis","author":"Greenacre","year":"1984"},{"key":"S1351324923000244_ref38","volume-title":"Van wie is het Wilhelmus? De auteur van het Nederlandse volkslied met de computer onderzocht","author":"Kestemont","year":"2017"},{"key":"S1351324923000244_ref51","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-16-2126-0_29"},{"key":"S1351324923000244_ref36","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2016.06.029"},{"key":"S1351324923000244_ref10","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324919000019"},{"key":"S1351324923000244_ref49","unstructured":"Morin, A. (1999). Knowledge extraction in texts: A comparison of two methods. Retrieved July 17, 2021, from https:\/\/www.stat.fi\/isi99\/proceedings\/arkisto\/varasto\/mori0673.pdf."},{"key":"S1351324923000244_ref25","first-page":"19","article-title":"Theory and example of quantification (II)","volume":"4","author":"Hayashi","year":"1956","journal-title":"Proceedings of the Institute of Statistical Mathematics"},{"key":"S1351324923000244_ref4","doi-asserted-by":"publisher","DOI":"10.1007\/s41870-018-0137-9"},{"key":"S1351324923000244_ref6","doi-asserted-by":"publisher","DOI":"10.1137\/1037127"},{"key":"S1351324923000244_ref26","doi-asserted-by":"publisher","DOI":"10.4993\/acrt1992.1.17"},{"key":"S1351324923000244_ref55","unstructured":"Rennie, J. (2005). 20 newsgroups data set. Retrieved April 21, 2022, from http:\/\/qwone.com\/jason\/20Newsgroups\/."},{"key":"S1351324923000244_ref68","unstructured":"Winkel, T.d (2015). Of Deutsches blood, Master\u2019s Thesis. Utrecht University"},{"key":"S1351324923000244_ref14","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2016.03.041"},{"key":"S1351324923000244_ref27","doi-asserted-by":"publisher","DOI":"10.2307\/2258931"},{"key":"S1351324923000244_ref20","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1987.10478446"},{"key":"S1351324923000244_ref15","unstructured":"Frobenius, G. (1912). \u00dcber matrizen aus nicht negativen elementen."},{"key":"S1351324923000244_ref2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-73531-3"},{"key":"S1351324923000244_ref29","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324920000121"},{"key":"S1351324923000244_ref12","doi-asserted-by":"publisher","DOI":"10.1145\/57167.57214"},{"key":"S1351324923000244_ref63","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324919000093"},{"key":"S1351324923000244_ref67","unstructured":"Vargas Quiros, J. (2017). Information-theoretic anomaly detection and authorship attribution in literature, Master\u2019s Thesis. Department of Information and Computing Sciences, Utrecht University"},{"key":"S1351324923000244_ref46","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/10.3.171"},{"key":"S1351324923000244_ref8","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"S1351324923000244_ref33","unstructured":"Jurafsky, D. and Martin, J. H. (2021). Speech and language processing, (3rd ed. draft), chapter 6. Retrieved October 20, 2022, from https:\/\/web.stanford.edu\/jurafsky\/slp3\/."},{"key":"S1351324923000244_ref50","unstructured":"Nakov, P. , Popova, A. and Mateev, P. (2001). Weight functions impact on LSA performance. EuroConference Recent Advances in Natural Language Processing, Bulgaria: Tzigov Chark, pp. 187\u2013193."},{"key":"S1351324923000244_ref22","doi-asserted-by":"publisher","DOI":"10.1109\/ICAIS50930.2021.9395976"},{"key":"S1351324923000244_ref5","unstructured":"Benz\u00e9cri, J.-P. (1973). L\u2019analyse des donn\u00e9es, 1 and 2, Dunod, Paris."},{"key":"S1351324923000244_ref17","volume-title":"Nonlinear Multivariate Analysis","author":"Gifi","year":"1990"},{"key":"S1351324923000244_ref7","doi-asserted-by":"publisher","DOI":"10.1109\/ISCIS.2007.4456854"},{"key":"S1351324923000244_ref56","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"key":"S1351324923000244_ref69","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2010.08.066"},{"key":"S1351324923000244_ref35","unstructured":"Kestemont, M. (2017). Who wrote the Wilhelmus? Retrieved July 17, 2021, from https:\/\/github.com\/mikekestemont\/anthem."},{"key":"S1351324923000244_ref13","doi-asserted-by":"publisher","DOI":"10.1109\/eStream.2019.8732167"},{"key":"S1351324923000244_ref60","volume-title":"Text Mining with R: A Tidy Approach","author":"Silge","year":"2017"},{"key":"S1351324923000244_ref1","first-page":"357","article-title":"Term weighting schemes experiment based on SVD for Malay text retrieval","volume":"8","author":"Ab Samat","year":"2008","journal-title":"International Journal of Computer Science and Network Security (IJCSNS)"},{"key":"S1351324923000244_ref34","doi-asserted-by":"publisher","DOI":"10.14569\/IJACSA.2022.0130209"},{"key":"S1351324923000244_ref23","unstructured":"Guthrie, D. (2008). Unsupervised Detection of Anomalous Text, PhD Thesis, Department of Computer Science, University of Sheffield"},{"key":"S1351324923000244_ref3","volume-title":"Taming Text with the SVD","author":"Albright","year":"2004"},{"key":"S1351324923000244_ref19","doi-asserted-by":"publisher","DOI":"10.1201\/9781315369983"},{"key":"S1351324923000244_ref9","doi-asserted-by":"publisher","DOI":"10.1017\/9781108679930"},{"key":"S1351324923000244_ref39","doi-asserted-by":"publisher","DOI":"10.1145\/291128.291131"},{"key":"S1351324923000244_ref58","unstructured":"S\u00e9gu\u00e9la, J. and Saporta, G. (2011). A comparison between latent semantic analysis and correspondence analysis. CARME 2011 International Conference on Correspondence Analysis and Related Methods, Rennes, France."},{"key":"S1351324923000244_ref31","volume-title":"Hierarchical Cluster Analysis: Comparison of Single Linkage, Complete Linkage, Average Linkage and Centroid Linkage Method","author":"Jarman","year":"2020"},{"key":"S1351324923000244_ref21","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143892"},{"key":"S1351324923000244_ref59","first-page":"177","article-title":"A hybrid recommender system to predict online job offer performance","volume":"25","author":"S\u00e9gu\u00e9la","year":"2013","journal-title":"Revue des Nouvelles Technologies de l\u2019Information"},{"key":"S1351324923000244_ref24","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-021-06014-6"},{"key":"S1351324923000244_ref57","unstructured":"Satyam, A. , Dawn, A. K. and Saha, S. K. (2014). A statistical analysis approach to author identification using latent semantic analysis: Notebook for pan at clef 2014. 2014 Working Notes for CLEF Conference, Sheffield, UK."},{"key":"S1351324923000244_ref65","first-page":"249","article-title":"A combined approach to contingency table analysis using correspondence analysis and loglinear analysis","volume":"38","author":"Van der Heijden","year":"1989","journal-title":"Journal of the Royal Statistical Society: Series C (Applied Statistics)"},{"key":"S1351324923000244_ref44","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/19.4.497"},{"key":"S1351324923000244_ref30","unstructured":"Hu, X. Cai, Z. Franceschetti, D. Penumatsa, P. Graesser, A. Louwerse, M. McNamara, D. S. Tutoring Research Group 2003, LSA: First dimension and dimensional weighting, Proceedings of the Annual Meeting of the Cognitive Science Society, Boston, USA, 25."},{"key":"S1351324923000244_ref41","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.104.2.211"},{"key":"S1351324923000244_ref62","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21001"},{"key":"S1351324923000244_ref43","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00134"},{"key":"S1351324923000244_ref53","unstructured":"Phillips, T. , Saleh, A. , Glazewski, K. D. , Hmelosilver, C. E. , Lee, S. , Mott, B. and Lester, J. C. (2021). Comparing natural language processing methods for text classification of small educational data. Companion Proceedings 11th International Conference on Learning Analytics & Knowledge, Irvine, CA, USA."},{"key":"S1351324923000244_ref52","doi-asserted-by":"publisher","DOI":"10.1007\/BF01449896"},{"key":"S1351324923000244_ref45","unstructured":"McCarthy, P. M. , Lewis, G. A. , Dufty, D. F. and McNamara, D. S. (2006). Analyzing writing styles with Coh-Metrix. Proceedings of the Nineteenth International Florida Artificial Intelligence Research Society Conference, Melbourne Beach, FL, USA, pp. 764\u2013769."},{"key":"S1351324923000244_ref28","first-page":"340","article-title":"Correspondence analysis: A neglected multivariate method","volume":"23","author":"Hill","year":"1974","journal-title":"Journal of the Royal Statistical Society. Series C (Applied Statistics)"},{"key":"S1351324923000244_ref47","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/12.4.227"},{"key":"S1351324923000244_ref66","doi-asserted-by":"publisher","DOI":"10.1007\/11871637_59"},{"key":"S1351324923000244_ref32","doi-asserted-by":"publisher","DOI":"10.1109\/IAEAC50856.2021.9390956"},{"key":"S1351324923000244_ref54","doi-asserted-by":"publisher","DOI":"10.3758\/s13423-021-01919-8"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324923000244","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T08:32:33Z","timestamp":1726821153000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324923000244\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,18]]},"references-count":69,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["S1351324923000244"],"URL":"https:\/\/doi.org\/10.1017\/s1351324923000244","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,18]]},"assertion":[{"value":"\u00a9 The Author(s), 2023. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}