{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T10:06:10Z","timestamp":1764842770249,"version":"3.37.3"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,9,9]],"date-time":"2023-09-09T00:00:00Z","timestamp":1694217600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,9,9]],"date-time":"2023-09-09T00:00:00Z","timestamp":1694217600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004543","name":"China Scholarship Council","doi-asserted-by":"publisher","award":["No. 202007720017"],"award-info":[{"award-number":["No. 202007720017"]}],"id":[{"id":"10.13039\/501100004543","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Intell Inf Syst"],"published-print":{"date-parts":[[2024,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. 
Weighting the elements of the raw data matrix can improve CA; however, it is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.<\/jats:p>","DOI":"10.1007\/s10844-023-00815-y","type":"journal-article","created":{"date-parts":[[2023,9,9]],"date-time":"2023-09-09T10:02:14Z","timestamp":1694253734000},"page":"209-230","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Improving information retrieval through correspondence analysis instead of latent semantic analysis"],"prefix":"10.1007","volume":"62","author":[{"given":"Qianqian","family":"Qi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David J.","family":"Hessen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter G. M.","family":"van der Heijden","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,9,9]]},"reference":[{"key":"815_CR1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-73531-3","author":"CC Aggarwal","year":"2018","unstructured":"Aggarwal, C. C. (2018). Machine learning for text. Springer. https:\/\/doi.org\/10.1007\/978-3-319-73531-3","journal-title":"Springer"},{"key":"815_CR2","doi-asserted-by":"publisher","unstructured":"Al-Qahtani, M., Amira, A., Ramzan, N. (2015). An efficient information retrieval technique for e-health systems. 
In: 2015 International Conference on Systems, Signals and Image Processing (IWSSIP), 257\u2013260, https:\/\/doi.org\/10.1109\/IWSSIP.2015.7314225","DOI":"10.1109\/IWSSIP.2015.7314225"},{"key":"815_CR3","unstructured":"Altszyler, E., Sigman, M., Ribeiro, S., et\u00a0al. (2016). Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database. Preprint at arXiv:1610.01520"},{"key":"815_CR4","doi-asserted-by":"publisher","unstructured":"Arenas-M\u00e1rquez, F. J., Martinez-Torres, R., & Toral, S. (2021). Convolutional neural encoding of online reviews for the identification of travel group type topics on TripAdvisor. Information Processing & Management, 58(5), 102645. https:\/\/doi.org\/10.1016\/j.ipm.2021.102645","DOI":"10.1016\/j.ipm.2021.102645"},{"issue":"5","key":"815_CR5","doi-asserted-by":"publisher","first-page":"1736","DOI":"10.1016\/j.ipm.2019.05.008","volume":"56","author":"AM Azmi","year":"2019","unstructured":"Azmi, A. M., Al-Jouie, M. F., & Hussain, M. (2019). AAEE-Automated evaluation of students\u2019 essays in Arabic language. Information Processing & Management, 56(5), 1736\u20131752. https:\/\/doi.org\/10.1016\/j.ipm.2019.05.008","journal-title":"Information Processing & Management"},{"key":"815_CR6","unstructured":"Bacciu, A., La Morgia, M., Mei, A., et al. (2019). Bot and Gender Detection of Twitter Accounts Using Distortion and LSA. In: CLEF"},{"issue":"3","key":"815_CR7","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1007\/s10579-009-9081-4","volume":"43","author":"M Baroni","year":"2009","unstructured":"Baroni, M., Bernardini, S., Ferraresi, A., et al. (2009). The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3), 209\u2013226. 
https:\/\/doi.org\/10.1007\/s10579-009-9081-4","journal-title":"Language Resources and Evaluation"},{"key":"815_CR8","doi-asserted-by":"publisher","DOI":"10.1002\/9781119044482","volume-title":"An introduction to correspondence analysis","author":"EJ Beh","year":"2021","unstructured":"Beh, E. J., & Lombardo, R. (2021). An introduction to correspondence analysis. John Wiley & Sons."},{"issue":"4","key":"815_CR9","doi-asserted-by":"publisher","first-page":"573","DOI":"10.1137\/1037127","volume":"37","author":"MW Berry","year":"1995","unstructured":"Berry, M. W., Dumais, S. T., & O\u2019Brien, G. W. (1995). Using linear algebra for intelligent information retrieval. SIAM Review, 37(4), 573\u2013595. https:\/\/doi.org\/10.1137\/1037127","journal-title":"SIAM Review"},{"key":"815_CR10","doi-asserted-by":"publisher","DOI":"10.1007\/s10844-022-00772-y","author":"GD Bianco","year":"2023","unstructured":"Bianco, G. D., Duarte, D., & Gon\u00e7alves, M. A. (2023). Reducing the user labeling effort in effective high recall tasks by fine-tuning active learning. Journal of Intelligent Information Systems. https:\/\/doi.org\/10.1007\/s10844-022-00772-y","journal-title":"Journal of Intelligent Information Systems"},{"issue":"4","key":"815_CR11","doi-asserted-by":"publisher","first-page":"298","DOI":"10.1504\/IJCAT.2019.101171","volume":"60","author":"M Bounabi","year":"2019","unstructured":"Bounabi, M., Moutaouakil, K. E., & Satori, K. (2019). A comparison of text classification methods using different stemming techniques. International Journal of Computer Applications in Technology, 60(4), 298\u2013306. https:\/\/doi.org\/10.1504\/IJCAT.2019.101171","journal-title":"International Journal of Computer Applications in Technology"},{"issue":"3","key":"815_CR12","doi-asserted-by":"publisher","first-page":"890","DOI":"10.3758\/s13428-011-0183-8","volume":"44","author":"JA Bullinaria","year":"2012","unstructured":"Bullinaria, J. A., & Levy, J. P. (2012). 
Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD. Behavior Research Methods, 44(3), 890\u2013907. https:\/\/doi.org\/10.3758\/s13428-011-0183-8","journal-title":"Behavior Research Methods"},{"key":"815_CR13","unstructured":"Caron, J. (2001). Experiments with LSA scoring: Optimal rank and basis. In: Proceedings of the SIAM Computational Information Retrieval Workshop, 157\u2013169"},{"key":"815_CR14","doi-asserted-by":"publisher","first-page":"298","DOI":"10.1007\/s10791-021-09394-4","volume":"24","author":"CY Chang","year":"2021","unstructured":"Chang, C. Y., Lee, S. J., Wu, C. H., et al. (2021). Using word semantic concepts for plagiarism detection in text documents. Information Retrieval Journal, 24, 298\u2013321. https:\/\/doi.org\/10.1007\/s10791-021-09394-4","journal-title":"Information Retrieval Journal"},{"issue":"6","key":"815_CR15","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9","volume":"41","author":"S Deerwester","year":"1990","unstructured":"Deerwester, S., Dumais, S. T., Furnas, G. W., et al. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391\u2013407. https:\/\/doi.org\/10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9","journal-title":"Journal of the American Society for Information Science"},{"key":"815_CR16","unstructured":"Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: Beyond king - man + woman = queen. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, 3519\u20133530, https:\/\/aclanthology.org\/C16-1332"},{"issue":"12","key":"815_CR17","doi-asserted-by":"publisher","first-page":"7867","DOI":"10.1002\/int.22612","volume":"36","author":"L Duan","year":"2021","unstructured":"Duan, L., Gao, T., Ni, W., et al. (2021). 
A hybrid intelligent service recommendation by latent semantics and explicit ratings. International Journal of Intelligent Systems, 36(12), 7867\u20137894. https:\/\/doi.org\/10.1002\/int.22612","journal-title":"International Journal of Intelligent Systems"},{"issue":"2","key":"815_CR18","doi-asserted-by":"publisher","first-page":"229","DOI":"10.3758\/BF03203370","volume":"23","author":"ST Dumais","year":"1991","unstructured":"Dumais, S. T. (1991). Improving the retrieval of information from external sources. Behavior Research Methods, Instruments, & Computers, 23(2), 229\u2013236. https:\/\/doi.org\/10.3758\/BF03203370","journal-title":"Behavior Research Methods, Instruments, & Computers"},{"key":"815_CR19","doi-asserted-by":"publisher","unstructured":"Dumais, S. T., Furnas, G. W., Landauer, T. K., et al. (1988). Using latent semantic analysis to improve access to textual information. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 281\u2013285, https:\/\/doi.org\/10.1145\/57167.57214","DOI":"10.1145\/57167.57214"},{"issue":"3","key":"815_CR20","doi-asserted-by":"publisher","first-page":"453","DOI":"10.2307\/2334381","volume":"58","author":"KR Gabriel","year":"1971","unstructured":"Gabriel, K. R. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika, 58(3), 453\u2013467. https:\/\/doi.org\/10.2307\/2334381","journal-title":"Biometrika"},{"key":"815_CR21","volume-title":"An introduction to statistical learning: with applications in R","author":"G James","year":"2021","unstructured":"James, G., Witten, D., Hastie, T., et al. (2021). An introduction to statistical learning: with applications in R. Springer."},{"key":"815_CR22","volume-title":"Theory and applications of correspondence analysis","author":"MJ Greenacre","year":"1984","unstructured":"Greenacre, M. J. (1984). Theory and applications of correspondence analysis. 
Academic Press."},{"key":"815_CR23","doi-asserted-by":"publisher","DOI":"10.1201\/9781315369983","volume-title":"Correspondence analysis in practice","author":"MJ Greenacre","year":"2017","unstructured":"Greenacre, M. J. (2017). Correspondence analysis in practice. CRC Press."},{"key":"815_CR24","doi-asserted-by":"publisher","unstructured":"Greene, D., & Cunningham, P. (2006). Practical solutions to the problem of diagonal dominance in kernel document clustering. In: Proceedings of the 23rd International Conference on Machine Learning, 377\u2013384, https:\/\/doi.org\/10.1145\/1143844.1143892","DOI":"10.1145\/1143844.1143892"},{"issue":"4","key":"815_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3486250","volume":"40","author":"J Guo","year":"2022","unstructured":"Guo, J., Cai, Y., Fan, Y., et al. (2022). Semantic models for the first-stage retrieval: A comprehensive review. ACM Transactions on Information Systems (TOIS), 40(4), 1\u201342. https:\/\/doi.org\/10.1145\/3486250","journal-title":"ACM Transactions on Information Systems (TOIS)"},{"key":"815_CR26","doi-asserted-by":"publisher","unstructured":"Gupta, H., & Patel, M. (2021). Method of Text Summarization Using LSA and Sentence Based Topic Modelling with BERT. In: 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), 511\u2013517, https:\/\/doi.org\/10.1109\/ICAIS50930.2021.9395976","DOI":"10.1109\/ICAIS50930.2021.9395976"},{"key":"815_CR27","doi-asserted-by":"publisher","first-page":"13745","DOI":"10.1007\/s00521-021-06014-6","volume":"33","author":"A Hassani","year":"2021","unstructured":"Hassani, A., Iranmanesh, A., & Mansouri, N. (2021). Text mining using nonnegative matrix factorization and latent semantic analysis. Neural Computing and Applications, 33, 13745\u201313766. 
https:\/\/doi.org\/10.1007\/s00521-021-06014-6","journal-title":"Neural Computing and Applications"},{"key":"815_CR28","doi-asserted-by":"publisher","first-page":"10639","DOI":"10.1007\/s13369-022-06704-w","volume":"47","author":"F Horasan","year":"2022","unstructured":"Horasan, F. (2022). Latent Semantic Indexing-Based Hybrid Collaborative Filtering for Recommender Systems. Arabian Journal for Science and Engineering, 47, 10639\u201310653. https:\/\/doi.org\/10.1007\/s13369-022-06704-w","journal-title":"Arabian Journal for Science and Engineering"},{"key":"815_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2019\/1095643","volume":"2019","author":"F Horasan","year":"2019","unstructured":"Horasan, F., Erbay, H., Var\u00e7in, F., et al. (2019). Alternate Low-Rank Matrix Approximation in Latent Semantic Analysis. Scientific Programming, 2019, 1\u201312. https:\/\/doi.org\/10.1155\/2019\/1095643","journal-title":"Scientific Programming"},{"issue":"6","key":"815_CR30","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1017\/S1351324920000121","volume":"26","author":"R Hou","year":"2020","unstructured":"Hou, R., & Huang, C. R. (2020). Classification of regional and genre varieties of Chinese: A correspondence analysis approach based on comparable balanced corpora. Natural Language Engineering, 26(6), 613\u2013640. https:\/\/doi.org\/10.1017\/S1351324920000121","journal-title":"Natural Language Engineering"},{"key":"815_CR31","unstructured":"Hu, X., Cai, Z., Franceschetti, D., et al. (2003). LSA: First dimension and dimensional weighting. In: Proceedings of the Annual Meeting of the Cognitive Science Society"},{"key":"815_CR32","unstructured":"Kestemont, M., Stronks, E., De Bruin, M., et\u00a0al. (2017). 
Retrieved July 17, 2021, from https:\/\/github.com\/mikekestemont\/anthem"},{"issue":"4","key":"815_CR33","doi-asserted-by":"publisher","first-page":"322","DOI":"10.1145\/291128.291131","volume":"16","author":"TG Kolda","year":"1998","unstructured":"Kolda, T. G., & O\u2019Leary, D. P. (1998). A semidiscrete matrix decomposition for latent semantic indexing information retrieval. ACM Transactions on Information Systems (TOIS), 16(4), 322\u2013346. https:\/\/doi.org\/10.1145\/291128.291131","journal-title":"ACM Transactions on Information Systems (TOIS)"},{"key":"815_CR34","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1162\/tacl_a_00134","volume":"3","author":"O Levy","year":"2015","unstructured":"Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3, 211\u2013225. https:\/\/doi.org\/10.1162\/tacl_a_00134","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"815_CR35","doi-asserted-by":"publisher","unstructured":"Liu, T., Ungar, L., & Sedoc, J. (2019). Unsupervised post-processing of word vectors via conceptor negation. In: Proceedings of the AAAI Conference on Artificial Intelligence, 6778\u20136785, https:\/\/doi.org\/10.1609\/aaai.v33i01.33016778","DOI":"10.1609\/aaai.v33i01.33016778"},{"key":"815_CR36","unstructured":"Morin, A. (2004). Intensive use of correspondence analysis for information retrieval. In: 26th International Conference on Information Technology Interfaces, 2004, 255\u2013258"},{"key":"815_CR37","unstructured":"Mu, J., & Viswanath, P. (2018). All-but-the-top: Simple and effective postprocessing for word representations. 6th International Conference on Learning Representations, ICLR 2018"},{"key":"815_CR38","doi-asserted-by":"publisher","unstructured":"\u00d6sterlund, A., \u00d6dling, D., & Sahlgren, M. (2015). 
Factorization of latent variables in distributional semantic models. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 227\u2013231, https:\/\/doi.org\/10.18653\/v1\/D15-1024","DOI":"10.18653\/v1\/D15-1024"},{"key":"815_CR39","doi-asserted-by":"publisher","DOI":"10.3966\/160792642019072004004","author":"U Parali","year":"2019","unstructured":"Parali, U., Zontul, M., & Ertu\u011frul, D. C. (2019). Information retrieval using the reduced row echelon form of a term-document matrix. Journal of Internet Technology. https:\/\/doi.org\/10.3966\/160792642019072004004","journal-title":"Journal of Internet Technology"},{"key":"815_CR40","doi-asserted-by":"publisher","unstructured":"Patil, A. (2022). Word Significance Analysis in Documents for Information Retrieval by LSA and TF-IDF using Kubeflow. In: Expert Clouds and Applications. Springer Singapore, Singapore, 335\u2013348, https:\/\/doi.org\/10.1007\/978-981-16-2126-0_29","DOI":"10.1007\/978-981-16-2126-0_29"},{"key":"815_CR41","unstructured":"Phillips, T., Saleh, A., Glazewski, K. D., et al. (2021). Comparing Natural Language Processing Methods for Text Classification of Small Educational Data. In: Companion Proceedings 11th International Conference on Learning Analytics & Knowledge"},{"key":"815_CR42","doi-asserted-by":"publisher","unstructured":"Qi, Q., Hessen, D. J., Deoskar, T., et al. (2023). A comparison of latent semantic analysis and correspondence analysis of document-term matrices. Natural Language Engineering, 1\u201331. https:\/\/doi.org\/10.1017\/S1351324923000244","DOI":"10.1017\/S1351324923000244"},{"key":"815_CR43","unstructured":"Rennie, J. (2005). 20 newsgroups data set. Retrieved April 21, 2022, from http:\/\/qwone.com\/~jason\/20Newsgroups\/"},{"key":"815_CR44","unstructured":"S\u00e9gu\u00e9la, J., & Saporta, G. (2011). A comparison between latent semantic analysis and correspondence analysis. 
In: CARME 2011 International Conference on Correspondence Analysis and Related Methods"},{"key":"815_CR45","doi-asserted-by":"publisher","first-page":"114130","DOI":"10.1016\/j.eswa.2020.114130","volume":"165","author":"RM Suleman","year":"2021","unstructured":"Suleman, R. M., & Korkontzelos, I. (2021). Extending latent semantic analysis to manage its syntactic blindness. Expert Systems with Applications, 165, 114130. https:\/\/doi.org\/10.1016\/j.eswa.2020.114130","journal-title":"Expert Systems with Applications"},{"issue":"1","key":"815_CR46","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-021-87971-9","volume":"11","author":"A Van Dam","year":"2021","unstructured":"Van Dam, A., Dekker, M., Morales-Castilla, I., et al. (2021). Correspondence analysis, spectral clustering and graph embedding: applications to ecology and economic complexity. Scientific Reports, 11(1), 1\u201314. https:\/\/doi.org\/10.1038\/s41598-021-87971-9","journal-title":"Scientific Reports"},{"key":"815_CR47","unstructured":"Yin, Z., & Shen, Y. (2018). On the dimensionality of word embedding. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, NIPS\u201918, 895\u2013906"},{"issue":"3","key":"815_CR48","doi-asserted-by":"publisher","first-page":"2758","DOI":"10.1016\/j.eswa.2010.08.066","volume":"38","author":"W Zhang","year":"2011","unstructured":"Zhang, W., Yoshida, T., & Tang, X. (2011). A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Systems with Applications, 38(3), 2758\u20132765. 
https:\/\/doi.org\/10.1016\/j.eswa.2010.08.066","journal-title":"Expert Systems with Applications"}],"container-title":["Journal of Intelligent Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10844-023-00815-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10844-023-00815-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10844-023-00815-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,10]],"date-time":"2024-03-10T18:06:07Z","timestamp":1710093967000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10844-023-00815-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,9]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,2]]}},"alternative-id":["815"],"URL":"https:\/\/doi.org\/10.1007\/s10844-023-00815-y","relation":{},"ISSN":["0925-9902","1573-7675"],"issn-type":[{"type":"print","value":"0925-9902"},{"type":"electronic","value":"1573-7675"}],"subject":[],"published":{"date-parts":[[2023,9,9]]},"assertion":[{"value":"30 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 August 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 August 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 September 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Author Qianqian Qi is supported by the China Scholarship Council (CSC202007720017). Author David J. Hessen and Author Peter G. M. van der Heijden have no competing interests to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}}]}}