{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,15]],"date-time":"2024-09-15T18:41:25Z","timestamp":1726425685942},"reference-count":0,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[1999,12,1]],"date-time":"1999-12-01T00:00:00Z","timestamp":944006400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[1999,12]]},"abstract":"<jats:p>This paper describes an approach for constructing a mixture of language models based on \nsimple statistical notions of semantics using probabilistic models developed for information \nretrieval. The approach encapsulates corpus-derived semantic information and is able to model \nvarying styles of text. Using such information, the corpus texts are clustered in an unsupervised \nmanner and a mixture of topic-specific language models is automatically created. The principal \ncontribution of this work is to characterise the <jats:italic>document space<\/jats:italic> resulting from information \nretrieval techniques and to demonstrate the approach for mixture language modelling. A \ncomparison is made between manual and automatic clustering in order to elucidate how the \nglobal content information is expressed in the space. We also compare (in terms of association \nwith manual clustering and language modelling accuracy) alternative term-weighting schemes \nand the effect of singular value decomposition dimension reduction (latent semantic analysis). \nTest set perplexity results using the British National Corpus indicate that the approach can \nimprove the potential of statistical language modelling. Using an adaptive procedure, the \nconventional model may be tuned to track text data with a slight increase in computational \ncost.<\/jats:p>","DOI":"10.1017\/s1351324900002278","type":"journal-article","created":{"date-parts":[[2002,7,27]],"date-time":"2002-07-27T09:30:22Z","timestamp":1027762222000},"page":"355-375","source":"Crossref","is-referenced-by-count":4,"title":["Topic-based mixture language modelling"],"prefix":"10.1017","volume":"5","author":[{"given":"YOSHIHIKO","family":"GOTOH","sequence":"first","affiliation":[]},{"given":"STEVE","family":"RENALS","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[1999,12,1]]},"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324900002278","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,5,9]],"date-time":"2019-05-09T15:41:07Z","timestamp":1557416467000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324900002278\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[1999,12]]},"references-count":0,"journal-issue":{"issue":"4","published-print":{"date-parts":[[1999,12]]}},"alternative-id":["S1351324900002278"],"URL":"https:\/\/doi.org\/10.1017\/s1351324900002278","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"type":"print","value":"1351-3249"},{"type":"electronic","value":"1469-8110"}],"subject":[],"published":{"date-parts":[[1999,12]]}}}