{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T01:13:24Z","timestamp":1768353204476,"version":"3.49.0"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,7,15]],"date-time":"2015-07-15T00:00:00Z","timestamp":1436918400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Innovative Research Team in Soochow University","award":["SDT2012B02"],"award-info":[{"award-number":["SDT2012B02"]}]},{"name":"Natural Science Foundation of the Jiangsu Higher Education Institutions of China","award":["12KJA520004"],"award-info":[{"award-number":["12KJA520004"]}]},{"name":"National Grant Fundamental Research (973 Program) of China","award":["2014CB340304"],"award-info":[{"award-number":["2014CB340304"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61373092 and 61033013"],"award-info":[{"award-number":["61373092 and 61033013"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2015,8,13]]},"abstract":"<jats:p>\n            Latent Dirichlet allocation (LDA) is a popular topic modeling technique in academia but less so in industry, especially in large-scale applications involving search engine and online advertising systems. A main underlying reason is that the topic models used have been too small in scale to be useful; for example, some of the largest LDA models reported in literature have up to 10\n            <jats:sup>3<\/jats:sup>\n            topics, which difficultly cover the long-tail semantic word sets. 
In this article, we show that the number of topics is a key factor that can significantly boost the utility of topic-modeling systems. In particular, we show that a \u201cbig\u201d LDA model with at least 10\n            <jats:sup>5<\/jats:sup>\n            topics inferred from 10\n            <jats:sup>9<\/jats:sup>\n            search queries can achieve a significant improvement on industrial search engine and online advertising systems, both of which serve hundreds of millions of users. We develop a novel distributed system called\n            <jats:italic>Peacock<\/jats:italic>\n            to learn big LDA models from big data. The main features of Peacock include hierarchical distributed architecture, real-time prediction, and topic de-duplication. We empirically demonstrate that the Peacock system is capable of providing significant benefits via highly scalable LDA topic models for several industrial applications.\n          <\/jats:p>","DOI":"10.1145\/2700497","type":"journal-article","created":{"date-parts":[[2015,7,17]],"date-time":"2015-07-17T13:21:25Z","timestamp":1437139285000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["Peacock"],"prefix":"10.1145","volume":"6","author":[{"given":"Yi","family":"Wang","sequence":"first","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuemin","family":"Zhao","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhenlong","family":"Sun","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hao","family":"Yan","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lifeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"cros
sref"}]},{"given":"Zhihui","family":"Jin","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liubin","family":"Wang","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Soochow University, Shuzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ching","family":"Law","sequence":"additional","affiliation":[{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jia","family":"Zeng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Soochow University &amp; Huawei Noah's Ark Lab, Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,7,15]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of ICML. 1044--1052","author":"Ahn Sungjin","year":"2014"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2124295.2124312"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273501"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of UAI. 27--34","author":"Asuncion Arthur","year":"2009"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of NIPS. 
81--88","author":"Asuncion Arthur U.","year":"2008"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2008.09.023"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937"},{"key":"e_1_2_1_8_1","volume-title":"Lecture Introduction to Computational Advertising","author":"Broder Andrei"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956944"},{"key":"e_1_2_1_10_1","volume-title":"Jordan","author":"Broderick Tamara","year":"2013"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/11752790_1"},{"key":"e_1_2_1_12_1","unstructured":"N. de Freitas and K. Barnard. 2001. Bayesian Latent Semantic Analysis of Multimedia Databases. Technical Report. University of British Columbia."},{"key":"e_1_2_1_13_1","volume-title":"Ng","author":"Dean Jeffrey","year":"2012"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1257\/aer.97.1.242"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487697"},{"key":"e_1_2_1_17_1","volume-title":"Foulds and Padhraic Smyth","author":"James","year":"2014"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020426"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of ICML. 13--20","author":"Graepel Thore","year":"2010"},{"key":"e_1_2_1_20_1","unstructured":"David Graff and Christopher Cieri. 2003. English Gigaword. 
Linguistic Data Consortium."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0307752101"},{"key":"e_1_2_1_22_1","volume-title":"Bach","author":"Hoffman Matthew D.","year":"2010"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/2567709.2502622"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of NIPS.","author":"Johnson Matthew","year":"2013"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454008.1454027"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1961189.1961198"},{"key":"e_1_2_1_27_1","volume-title":"Blei","author":"Mimno David M.","year":"2012"},{"key":"e_1_2_1_28_1","volume-title":"Lafferty","author":"Minka Thomas P.","year":"2002"},{"key":"e_1_2_1_29_1","volume-title":"Machine Learning: A Probabilistic Perspective","author":"Murphy Kevin P.","year":"2012"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of NIPS.","author":"Newman David","year":"2007"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of HLT-NAACL. 100--108","author":"Newman David","year":"2010"},{"key":"e_1_2_1_32_1","volume-title":"Wright","author":"Niu Feng","year":"2011"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of NIPS. 
3102--3110","author":"Patterson Sam","year":"2013"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401960"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242643"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177729586"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of ICML.","author":"Sato Issei","year":"2012"},{"key":"e_1_2_1_38_1","volume-title":"Nicolas Le Roux, and Francis Bach","author":"Schmidt Mark W.","year":"2013"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920931"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014087"},{"key":"e_1_2_1_41_1","article-title":"Hierarchical Dirichlet processes","volume":"101","author":"Teh Yee Whye","year":"2004","journal-title":"Journal of the American Statistical Association"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of NIPS. 1353--1360","author":"Teh Yee Whye","year":"2006"},{"key":"e_1_2_1_43_1","volume-title":"TREC: Experiment and Evaluation in Information Retrieval","author":"Voorhees Ellen M.","year":"2005"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of NIPS. 1973--1981","author":"Wallach Hanna M.","year":"2009"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553515"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-02158-9_26"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of NIPS. 2134--2142","author":"Yan Feng","year":"2009"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-014-1376-8"},{"key":"e_1_2_1_49_1","unstructured":"Jian-Feng Yan Jia Zeng Zhi-Qiang Liu and Yang Gao. 2013. Towards big topic modeling. 
arXiv:1311.4150."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1557019.1557121"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5555\/1756006.1953034"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.5555\/2503308.2503314"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.185"},{"key":"e_1_2_1_54_1","unstructured":"Jia Zeng Zhi-Qiang Liu and Xiao-Qin Cao. 2012a. A new approach to speeding up topic modeling. arXiv:1204.0170 {cs.LG}."},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Jia Zeng Zhi-Qiang Liu and Xiao-Qin Cao. 2012b. Online belief propagation for topic modeling. arXiv:1210.2179 {cs.LG}.","DOI":"10.1007\/978-3-642-35527-1_61"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2187836.2187955"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2507157.2507164"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2700497","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2700497","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:34Z","timestamp":1750227214000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2700497"}},"subtitle":["Learning Long-Tail Topic Features for Industrial 
Applications"],"short-title":[],"issued":{"date-parts":[[2015,7,15]]},"references-count":57,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,8,13]]}},"alternative-id":["10.1145\/2700497"],"URL":"https:\/\/doi.org\/10.1145\/2700497","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,7,15]]},"assertion":[{"value":"2014-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-07-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}