{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,8]],"date-time":"2026-07-08T22:43:43Z","timestamp":1783550623231,"version":"3.55.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2020,5,13]],"date-time":"2020-05-13T00:00:00Z","timestamp":1589328000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2020,6,30]]},"abstract":"<jats:p>\n            Extracting events accurately from vast news corpora and organizing events logically is critical for news apps and search engines, which aim to organize news information collected from the Internet and present it to users in the most sensible forms. Intuitively speaking, an event is a group of news documents that report the same news incident possibly in different ways. In this article, we describe our experience of implementing a news content organization system at Tencent to discover events from vast streams of breaking news and to evolve news story structures in an online fashion. Our real-world system faces unique challenges in contrast to previous studies on topic detection and tracking (TDT) and event timeline or graph generation, in that we (1) need to accurately and quickly extract distinguishable events from massive streams of long text documents, and (2) must develop the structures of event stories in an online manner, in order to guarantee a consistent user viewing experience. In solving these challenges, we propose\n            <jats:italic>Story Forest<\/jats:italic>\n            , a set of online schemes that automatically clusters streaming documents into events, while connecting related events in growing trees to tell evolving stories. 
A core novelty of our\n            <jats:italic>Story Forest<\/jats:italic>\n            system is\n            <jats:italic>EventX<\/jats:italic>\n            , a semi-supervised scheme to extract events from massive Internet news corpora.\n            <jats:italic>EventX<\/jats:italic>\n            relies on a two-layered, graph-based clustering procedure to group documents into fine-grained events. We conducted extensive evaluations based on (1) 60 GB of real-world Chinese news data, (2) a large Chinese Internet news dataset that contains 11,748 news articles with truth event labels, and (3) the 20 News Groups English dataset, through detailed pilot user experience studies. The results demonstrate the superior capabilities of\n            <jats:italic>Story Forest<\/jats:italic>\n            to accurately identify events and organize news text into a logical structure that is appealing to human readers.\n          <\/jats:p>","DOI":"10.1145\/3377939","type":"journal-article","created":{"date-parts":[[2020,5,19]],"date-time":"2020-05-19T10:42:16Z","timestamp":1589884936000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":58,"title":["Story Forest"],"prefix":"10.1145","volume":"14","author":[{"given":"Bang","family":"Liu","sequence":"first","affiliation":[{"name":"University of Alberta, Alberta, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fred X.","family":"Han","sequence":"additional","affiliation":[{"name":"University of Alberta, Alberta, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Di","family":"Niu","sequence":"additional","affiliation":[{"name":"University of Alberta, Alberta, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Linglong","family":"Kong","sequence":"additional","affiliation":[{"name":"University of Alberta, Alberta, 
Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kunfeng","family":"Lai","sequence":"additional","affiliation":[{"name":"Tencent, Nanshan, Shenzhen, Guangdong Province, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yu","family":"Xu","sequence":"additional","affiliation":[{"name":"Tencent, Nanshan, Shenzhen, Guangdong Province, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,5,13]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Charu C. Aggarwal and ChengXiang Zhai. 2012. A survey of text clustering algorithms. In Mining Text Data. Springer 77--128.","DOI":"10.1007\/978-1-4614-3223-4_4"},{"key":"e_1_2_1_2_1","unstructured":"James Allan. 2012. Topic Detection and Tracking: Event-based Information Organization. Vol. 12. Springer Science & Business Media."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290954"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775110"},{"key":"e_1_2_1_5_1","first-page":"993","article-title":"Latent dirichlet allocation","author":"Blei David M.","year":"2003","journal-title":"Journal of Machine Learning Research 3"},{"key":"e_1_2_1_6_1","first-page":"1","article-title":"Spherical k-means clustering","volume":"50","author":"Buchta Christian","year":"2012","journal-title":"Journal of Statistical Software"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148285"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 3rd Workshop on Statistical Machine Translation. 
224--232","author":"Chang Pi-Chuan"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 89--98","author":"Dhillon Inderjit S."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the SIAM International Conference on Data Mining. 606--610","author":"Ding Chris"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining.","volume":"96","author":"Ester Martin","year":"1996"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2008.2005601"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972733.6"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.259"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/312624.312649"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983698"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 726--735","author":"Huang Lifu","year":"2013"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2009.09.011"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/2390948.2391006"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1632"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering. IEEE, 597--601","author":"Liu Luying","year":"2005"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1081870.1081895"},{"key":"e_1_2_1_23_1","unstructured":"Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into texts. 
ACL."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1031171.1031258"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ADL.1998.670375"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2008.01.039"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0400054101"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2339530.2339704"},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Stuart Rose Dave Engel Nick Cramer and Wendy Cowley. 2010. Automatic keyword extraction from individual documents. Text Mining (2010) 1--20. https:\/\/www.osti.gov\/biblio\/978967-automatic-keyword-extraction-from-individual-documents.","DOI":"10.1002\/9780470689646.ch1"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning","volume":"7","author":"Rosenberg Andrew","year":"2007"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 3rd International AAAI Conference on Weblogs and Social Media.","author":"Sayyadi Hassan","year":"2009"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2542214.2542215"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2187836.2187957"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487690"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/345508.345578"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69858-6_21"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 26th AAAI Conference on Artificial Intelligence.","author":"Wang 
Dingding","year":"2012"},{"key":"e_1_2_1_38_1","first-page":"15","volume-title":"Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 1055--1065","author":"Wang Lu","year":"2016"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1281--1291","author":"Xu Shize","year":"2013"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860485"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2010016"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCA.2009.2015885"},{"key":"e_1_2_1_43_1","volume-title":"Topic Detection and Tracking","author":"Yang Yiming"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1225"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3377939","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3377939","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:00Z","timestamp":1750200060000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3377939"}},"subtitle":["Extracting Events and Telling Stories from Breaking 
News"],"short-title":[],"issued":{"date-parts":[[2020,5,13]]},"references-count":44,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,6,30]]}},"alternative-id":["10.1145\/3377939"],"URL":"https:\/\/doi.org\/10.1145\/3377939","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,13]]},"assertion":[{"value":"2018-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}