{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T14:46:39Z","timestamp":1773153999149,"version":"3.50.1"},"reference-count":46,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2011,1,5]],"date-time":"2011-01-05T00:00:00Z","timestamp":1294185600000},"content-version":"unspecified","delay-in-days":4,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2011,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this article, we demonstrate several novel ways in which insights from information theory (IT) and computational linguistics (CL) can be woven into a vector-space-model (VSM) approach to information retrieval (IR). Our proposals focus, essentially, on three areas: pre-processing (morphological analysis), term weighting, and alternative geometrical models to the widely used term-by-document matrix. The latter include (1) PARAFAC2 decomposition of a term-by-document-by-language tensor, and (2) eigenvalue decomposition of a term-by-term matrix (inspired by Statistical Machine Translation). We evaluate all proposals, comparing them to a \u2018standard\u2019 approach based on Latent Semantic Analysis, on a multilingual document clustering task. The evidence suggests that proper consideration of IT within IR is indeed called for: in all cases, our best results are achieved using the information-theoretic variations upon the standard approach. Furthermore, we show that different information-theoretic options can be combined for still better results. A key function of language is to encode and convey information, and contributions of IT to the field of CL can be traced back a number of decades. We think that our proposals help bring IR and CL more into line with one another. In our conclusion, we suggest that the fact that our proposals yield empirical improvements is not coincidental given that they increase the theoretical transparency of VSM approaches to IR; on the contrary, they help shed light on why aspects of these approaches work as they do.<\/jats:p>","DOI":"10.1017\/s1351324910000185","type":"journal-article","created":{"date-parts":[[2011,1,5]],"date-time":"2011-01-05T10:44:27Z","timestamp":1294224267000},"page":"37-70","source":"Crossref","is-referenced-by-count":16,"title":["An information-theoretic, vector-space-model approach to cross-language information retrieval"],"prefix":"10.1017","volume":"17","author":[{"given":"PETER A.","family":"CHEW","sequence":"first","affiliation":[]},{"given":"BRETT W.","family":"BADER","sequence":"additional","affiliation":[]},{"given":"STEPHEN","family":"HELMREICH","sequence":"additional","affiliation":[]},{"given":"AHMED","family":"ABDELALI","sequence":"additional","affiliation":[]},{"given":"STEPHEN J.","family":"VERZI","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2011,1,5]]},"reference":[{"key":"S1351324910000185_ref14","first-page":"37","article-title":"The knowledge of good and evil: multilingual ideology classification with PARAFAC2 and machine learning","volume":"34","author":"Chew","year":"2008","journal-title":"Language Forum"},{"key":"S1351324910000185_ref13","doi-asserted-by":"publisher","DOI":"10.1145\/1281192.1281211"},{"key":"S1351324910000185_ref12","first-page":"872","volume-title":"Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics","author":"Chew","year":"2007"},{"key":"S1351324910000185_ref10","doi-asserted-by":"publisher","DOI":"10.3758\/BF03193020"},{"key":"S1351324910000185_ref7","first-page":"17","volume-title":"Proceedings of the Second Meeting of the ACL Special Interest Group in Computational Phonology","author":"Broe","year":"1996"},{"key":"S1351324910000185_ref8","first-page":"263","article-title":"The mathematics of Statistical Machine Translation: parameter estimation","volume":"19","author":"Brown","year":"1994","journal-title":"Computational Linguistics"},{"key":"S1351324910000185_ref30","doi-asserted-by":"publisher","DOI":"10.1080\/01638539809545028"},{"key":"S1351324910000185_ref6","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-005-0255-6"},{"key":"S1351324910000185_ref2","doi-asserted-by":"publisher","DOI":"10.1201\/9781420059458.ch5"},{"key":"S1351324910000185_ref1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-84800-046-9_8"},{"key":"S1351324910000185_ref5","unstructured":"Biola University 2005\u20132006. The Unbound Bible. Retrieved on January 29, 2008, from http:\/\/www.unboundbible.org\/"},{"key":"S1351324910000185_ref11","doi-asserted-by":"publisher","DOI":"10.2307\/410451"},{"key":"S1351324910000185_ref3","doi-asserted-by":"publisher","DOI":"10.3115\/1599081.1599088"},{"key":"S1351324910000185_ref4","volume-title":"Modern Information Retrieval","author":"Baeza-Yates","year":"1999"},{"key":"S1351324910000185_ref32","first-page":"725","volume-title":"Proceedings of the 5th IEEE International Conference on Data Mining","author":"Liu","year":"2005"},{"key":"S1351324910000185_ref9","first-page":"467","article-title":"Class-based n-gram models of natural language","volume":"18","author":"Brown","year":"1992","journal-title":"Computational Linguistics"},{"key":"S1351324910000185_ref33","first-page":"22","article-title":"Development of a stemming algorithm","volume":"11","author":"Lovins","year":"1968","journal-title":"Mechanical Translation and Computational Linguistics"},{"key":"S1351324910000185_ref37","volume-title":"Stochastic Complexity in Statistical Inquiry","author":"Rissanen","year":"1989"},{"key":"S1351324910000185_ref45","volume-title":"Translation (1949)","author":"Weaver","year":"1955"},{"key":"S1351324910000185_ref34","unstructured":"Matveeva I. , Levow G.-A. , Farahat A. , and Royer C. 2005. Term representation with generalized Latent Semantic Analysis. Paper presented at the International Conference on Recent Advances in Natural Language Processing (RANLP-05), September 2005, Borovets, Bulgaria. http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.110.2216&rep=rep1&type=pdf."},{"key":"S1351324910000185_ref17","volume-title":"The Sound Pattern of English","author":"Chomsky","year":"1968"},{"key":"S1351324910000185_ref15","doi-asserted-by":"crossref","unstructured":"Chisholm E. , and Kolda T. G. 1999. New term weighting formulas for the vector space method in information retrieval. Technical Report ORNL-TM-13756, Oak Ridge National Laboratory, Oak Ridge, TN.","DOI":"10.2172\/5698"},{"key":"S1351324910000185_ref16","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1956.1056813"},{"key":"S1351324910000185_ref20","doi-asserted-by":"publisher","DOI":"10.3758\/BF03203370"},{"key":"S1351324910000185_ref18","first-page":"3","volume-title":"Proceedings of SIGIR","author":"Cleverdon","year":"1991"},{"key":"S1351324910000185_ref46","unstructured":"Young P. 1994. Cross Language Information Retrieval Using Latent Semantic Indexing. Master's thesis, University of Knoxville, Knoxville, TN."},{"key":"S1351324910000185_ref19","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"S1351324910000185_ref21","doi-asserted-by":"publisher","DOI":"10.1007\/BF02288367"},{"key":"S1351324910000185_ref23","volume-title":"Matrix Computations","author":"Golub","year":"1996"},{"key":"S1351324910000185_ref24","volume-title":"The Sound Pattern of Russian","author":"Halle","year":"1959"},{"key":"S1351324910000185_ref38","doi-asserted-by":"publisher","DOI":"10.1126\/science.253.5023.974"},{"key":"S1351324910000185_ref25","doi-asserted-by":"publisher","DOI":"10.1016\/j.laa.2006.09.026"},{"key":"S1351324910000185_ref26","doi-asserted-by":"crossref","DOI":"10.1111\/j.1467-1770.1958.tb00870.x","volume-title":"A Course in Modern Linguistics","author":"Hockett","year":"1958"},{"key":"S1351324910000185_ref27","doi-asserted-by":"publisher","DOI":"10.3115\/980845.980955"},{"key":"S1351324910000185_ref28","doi-asserted-by":"publisher","DOI":"10.1137\/07070111X"},{"key":"S1351324910000185_ref29","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.104.2.211"},{"key":"S1351324910000185_ref31","first-page":"317","volume-title":"Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics","author":"Lin","year":"1999"},{"key":"S1351324910000185_ref44","volume-title":"TREC: Experiment and Evaluation in Information Retrieval","author":"Voorhees","year":"2005"},{"key":"S1351324910000185_ref35","doi-asserted-by":"publisher","DOI":"10.3758\/BRM.41.3.647"},{"key":"S1351324910000185_ref36","doi-asserted-by":"publisher","DOI":"10.1023\/A:1001798929185"},{"key":"S1351324910000185_ref39","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"key":"S1351324910000185_ref41","doi-asserted-by":"publisher","DOI":"10.1108\/eb026526"},{"key":"S1351324910000185_ref42","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(198803)39:2<92::AID-ASI4>3.0.CO;2-P"},{"key":"S1351324910000185_ref43","volume-title":"Working Notes for the Cross-Language Evaluation Forum (CLEF) 2004 Workshop","author":"Tomlinson","year":"2004"},{"key":"S1351324910000185_ref22","doi-asserted-by":"publisher","DOI":"10.1162\/089120101750300490"},{"key":"S1351324910000185_ref40","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1948.tb01338.x"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324910000185","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,27]],"date-time":"2019-04-27T15:53:05Z","timestamp":1556380385000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324910000185\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,1]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,1]]}},"alternative-id":["S1351324910000185"],"URL":"https:\/\/doi.org\/10.1017\/s1351324910000185","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,1]]}}}