{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,3]],"date-time":"2022-04-03T07:48:32Z","timestamp":1648972112754},"reference-count":18,"publisher":"World Scientific Pub Co Pte Lt","issue":"02","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Info. Tech. Dec. Mak."],"published-print":{"date-parts":[[2009,6]]},"abstract":"<jats:p> As a hybrid of N-gram in natural language processing and collocation in statistical linguistics, multi-word is becoming a hot topic in area of text mining and information retrieval. In this paper, a study concerning distribution of multi-words is carried out to explore a theoretical basis for probabilistic term-weighting scheme. Specifically, the Poisson distribution, zero-inflated binomial distribution, and G-distribution are comparatively studied on a task of predicting probabilities of multi-words' occurrences using these distributions, for both technical multi-words and nontechnical multi-words. In addition, a rule-based multi-word extraction algorithm is proposed to extract multi-words from texts based on words' occurring patterns and syntactical structures. Our experimental results demonstrate that G-distribution has the best capability to predict probabilities of frequency of multi-words' occurrence and the Poisson distribution is comparable to zero-inflated binomial distribution in estimation of multi-word distribution. The outcome of this study validates that burstiness is a universal phenomenon in linguistic count data, which is applicable not only for individual content words but also for multi-words. 
<\/jats:p>","DOI":"10.1142\/s0219622009003399","type":"journal-article","created":{"date-parts":[[2009,7,2]],"date-time":"2009-07-02T11:53:30Z","timestamp":1246535610000},"page":"249-265","source":"Crossref","is-referenced-by-count":4,"title":["DISTRIBUTION OF MULTI-WORDS IN CHINESE AND ENGLISH DOCUMENTS"],"prefix":"10.1142","volume":"08","author":[{"given":"WEN","family":"ZHANG","sequence":"first","affiliation":[{"name":"School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Tatsunokuchi, Ishikawa 923-1292, Japan"},{"name":"Laboratory of Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing 100190, P. R. China"}]},{"given":"TAKETOSHI","family":"YOSHIDA","sequence":"additional","affiliation":[{"name":"School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Tatsunokuchi, Ishikawa 923-1292, Japan"}]},{"given":"XIJIN","family":"TANG","sequence":"additional","affiliation":[{"name":"Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P. R. China"}]}],"member":"219","published-online":{"date-parts":[[2011,11,20]]},"reference":[{"key":"rf4","volume":"7","author":"Peng Y.","journal-title":"Int. J. Inform. Technol. Decision Making"},{"key":"rf5","doi-asserted-by":"publisher","DOI":"10.1142\/S0219622005001428"},{"key":"rf6","doi-asserted-by":"publisher","DOI":"10.1142\/S0219622007002563"},{"key":"rf7","unstructured":"C. D.\u00a0Manning and H.\u00a0Sch\u00fctze, Foundations of Statistical Natural Language Processing (MIT Press, Cambridge, Massachusetts, 1999)\u00a0pp. 
548\u2013561."},{"key":"rf9","first-page":"545","volume":"358","author":"Wang D.","journal-title":"Physica A"},{"key":"rf10","doi-asserted-by":"publisher","DOI":"10.1142\/S0219525902000468"},{"key":"rf11","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324996001246"},{"key":"rf12","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1017\/S1351324900000139","volume":"1","author":"Church K. W.","journal-title":"Nat. Language Eng."},{"key":"rf15","unstructured":"C. M.\u00a0Bishop, Neural Networks for Pattern Recognition (Oxford University Press, New York, 2003)\u00a0pp. 65\u201373."},{"key":"rf16","first-page":"51","volume":"3","author":"Zhang W.","journal-title":"Int. J. Knowl. Syst. Sci."},{"key":"rf17","doi-asserted-by":"publisher","DOI":"10.1007\/s11518-007-5050-x"},{"key":"rf19","author":"Zhang W.","journal-title":"Int. J. Innovative Comput. Inform. Contr."},{"key":"rf20","first-page":"22","volume":"16","author":"Church K. W.","journal-title":"Comput. Linguist."},{"key":"rf21","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1017\/S1351324900000048","volume":"1","author":"Justeson J. S.","journal-title":"Nat. Language Eng."},{"key":"rf22","volume-title":"Human Behaviour and the Principle of Least Effort","author":"Zipf G. 
K.","year":"1949"},{"key":"rf23","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-5256-6"},{"key":"rf24","doi-asserted-by":"publisher","DOI":"10.1142\/S0219622005001477"},{"key":"rf25","doi-asserted-by":"publisher","DOI":"10.1142\/S0219622008003034"}],"container-title":["International Journal of Information Technology &amp; Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0219622009003399","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T14:22:15Z","timestamp":1565187735000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0219622009003399"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,6]]},"references-count":18,"journal-issue":{"issue":"02","published-online":{"date-parts":[[2011,11,20]]},"published-print":{"date-parts":[[2009,6]]}},"alternative-id":["10.1142\/S0219622009003399"],"URL":"https:\/\/doi.org\/10.1142\/s0219622009003399","relation":{},"ISSN":["0219-6220","1793-6845"],"issn-type":[{"value":"0219-6220","type":"print"},{"value":"1793-6845","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,6]]}}}