{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T08:40:55Z","timestamp":1758271255466},"reference-count":5,"publisher":"MIT Press - Journals","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["TACL"],"published-print":{"date-parts":[[2017,12]]},"abstract":"<jats:p> This paper presents a novel hybrid generative\/discriminative model of word segmentation based on nonparametric Bayesian methods. Unlike ordinary discriminative word segmentation, which relies only on labeled data, our semi-supervised model also leverages huge amounts of unlabeled text to automatically learn new \u201cwords\u201d, and further constrains them by using labeled data to segment non-standard texts such as those found in social networking services. <\/jats:p><jats:p> Specifically, our hybrid model combines a discriminative classifier (CRF; Lafferty et al. (2001)) and unsupervised word segmentation (NPYLM; Mochihashi et al. (2009)), with a transparent exchange of information between these two model structures within the semi-supervised framework (JESS-CM; Suzuki and Isozaki (2008)). We confirmed that it can appropriately segment non-standard texts like those in Twitter and Weibo and has nearly state-of-the-art accuracy on standard datasets in Japanese, Chinese, and Thai. <\/jats:p>","DOI":"10.1162\/tacl_a_00054","type":"journal-article","created":{"date-parts":[[2018,12,28]],"date-time":"2018-12-28T15:42:50Z","timestamp":1546011770000},"page":"179-189","source":"Crossref","is-referenced-by-count":8,"title":["Nonparametric Bayesian Semi-supervised Word Segmentation"],"prefix":"10.1162","volume":"5","author":[{"given":"Ryo","family":"Fujii","sequence":"first","affiliation":[{"name":"Hakuhodo Inc. R&D Division, 5-3-1 Akasaka, Minato-ku, Tokyo,"}]},{"given":"Ryo","family":"Domoto","sequence":"additional","affiliation":[{"name":"Hakuhodo Inc. 
R&D Division, 5-3-1 Akasaka, Minato-ku, Tokyo,"}]},{"given":"Daichi","family":"Mochihashi","sequence":"additional","affiliation":[{"name":"The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa city, Tokyo,"}]}],"member":"281","reference":[{"issue":"3","key":"p_13","first-page":"1","volume":"1","author":"MacKay David J. C.","year":"1994","journal-title":"Natural Language Engineering"},{"key":"p_20","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1198\/016214502753479464","volume":"97","author":"Scott Steven L.","year":"2002","journal-title":"Journal of the American Statistical Association"},{"key":"p_25","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00193"},{"issue":"2","key":"p_28","first-page":"529","volume":"4","author":"Tsuboi Yuta","year":"2009","journal-title":"Information and Media Technologies"},{"issue":"411","key":"p_30","doi-asserted-by":"crossref","first-page":"699","DOI":"10.1080\/01621459.1990.10474930","volume":"85","author":"Wei Greg C.G.","year":"1990","journal-title":"Journal of the American Statistical Association"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/tacl_a_00054","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:38:08Z","timestamp":1615585088000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/43396"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12]]},"references-count":5,"alternative-id":["10.1162\/tacl_a_00054"],"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00054","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,12]]}}}