{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,4,5]],"date-time":"2024-04-05T07:18:43Z","timestamp":1712301523062},"reference-count":7,"publisher":"Association for Computing Machinery (ACM)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2014,8]]},"abstract":"<jats:p>In this demonstration, we introduce MLJ (MultiLingual Journalism, http:\/\/mljournalism.com), a first Web-based system that enables users to search any topic of latest tweets posted by media outlets and journalists beyond languages. Handling multilingual tweets in real time involves many technical challenges: language barrier, sparsity of words, and real-time data stream. To overcome the language barrier and the sparsity of words, MLJ harnesses CL-ESA, a Wikipedia-based language-independent method to generate a vector of Wikipedia pages (entities) from an input text. To continuously deal with tweet stream, we propose one-pass DP-means, an online clustering method based on DP-means. Given a new tweet as an input, MLJ generates a vector using CL-ESA and classifies it into one of clusters using one-pass DP-means. By interpreting a search query as a vector, users can instantly search clusters containing latest related tweets from the query without being aware of language differences. MLJ as of March 2014 supports nine languages including English, Japanese, Korean, Spanish, Portuguese, German, French, Italian, and Arabic covering 24 countries.<\/jats:p>","DOI":"10.14778\/2733004.2733041","type":"journal-article","created":{"date-parts":[[2015,5,12]],"date-time":"2015-05-12T15:37:52Z","timestamp":1431445072000},"page":"1605-1608","source":"Crossref","is-referenced-by-count":4,"title":["MLJ"],"prefix":"10.14778","volume":"7","author":[{"given":"Masumi","family":"Shirakawa","sequence":"first","affiliation":[{"name":"Osaka University, Japan"}]},{"given":"Takahiro","family":"Hara","sequence":"additional","affiliation":[{"name":"Osaka University, Japan"}]},{"given":"Shojiro","family":"Nishio","sequence":"additional","affiliation":[{"name":"Osaka University, Japan"}]}],"member":"320","published-online":{"date-parts":[[2014,8]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"1344","volume":"5","author":"Aouragh M.","year":"2011","unstructured":"M. Aouragh and A. Alexander . The Egyptian Experience : Sense and Nonsense of the Internet Revolution. International Journal of Communication , 5 : 1344 -- 1358 , 2011 . M. Aouragh and A. Alexander. The Egyptian Experience: Sense and Nonsense of the Internet Revolution. International Journal of Communication, 5: 1344--1358, 2011.","journal-title":"Sense and Nonsense of the Internet Revolution. International Journal of Communication"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1614025.1614026"},{"key":"e_1_2_1_3_1","first-page":"1606","volume-title":"IJCAI","author":"Gabrilovich E.","year":"2007","unstructured":"E. Gabrilovich and S. Markovitch . Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis . In IJCAI , pages 1606 -- 1611 , Jan. 2007 . E. Gabrilovich and S. Markovitch. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. In IJCAI, pages 1606--1611, Jan. 2007."},{"key":"e_1_2_1_4_1","first-page":"513","volume-title":"ICML","author":"Kulis B.","year":"2012","unstructured":"B. Kulis and M. I. Jordan . Revisiting k-means: New Algorithms via Bayesian Nonparametrics . In ICML , pages 513 -- 520 , July 2012 . B. Kulis and M. I. Jordan. Revisiting k-means: New Algorithms via Bayesian Nonparametrics. In ICML, pages 513--520, July 2012."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505623"},{"key":"e_1_2_1_6_1","volume-title":"Cross-lingual Information Retrieval with Explicit Semantic Analysis. In Working Notes for the CLEF 2008 Workshop","author":"Sorg P.","year":"2008","unstructured":"P. Sorg and P. Cimiano . Cross-lingual Information Retrieval with Explicit Semantic Analysis. In Working Notes for the CLEF 2008 Workshop , Sept. 2008 . P. Sorg and P. Cimiano. Cross-lingual Information Retrieval with Explicit Semantic Analysis. In Working Notes for the CLEF 2008 Workshop, Sept. 2008."},{"key":"e_1_2_1_7_1","first-page":"1","volume-title":"SIGIR Workshop on Information Access in a Multilingual World","author":"Steinberger R.","year":"2009","unstructured":"R. Steinberger , B. Pouliquen , and E. van der Goot. An Introduction to the Europe Media Monitor Family of Applications . In SIGIR Workshop on Information Access in a Multilingual World , pages 1 -- 8 , July 2009 . R. Steinberger, B. Pouliquen, and E. van der Goot. An Introduction to the Europe Media Monitor Family of Applications. In SIGIR Workshop on Information Access in a Multilingual World, pages 1--8, July 2009."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2733004.2733041","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:42:46Z","timestamp":1672220566000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2733004.2733041"}},"subtitle":["language-independent real-time search of tweets reported by media outlets and journalists"],"short-title":[],"issued":{"date-parts":[[2014,8]]},"references-count":7,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2014,8]]}},"alternative-id":["10.14778\/2733004.2733041"],"URL":"https:\/\/doi.org\/10.14778\/2733004.2733041","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2014,8]]}}}