{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T14:15:18Z","timestamp":1761401718496},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2013,4]]},"abstract":"<jats:p>Social content, such as Twitter updates, often has the quickest first-hand reports of news events, as well as numerous commentaries that are indicative of the public view of such events. As such, social updates provide a good complement to professionally written news articles. In this paper we consider the problem of automatically annotating news stories with social updates (tweets), at a news website serving a high volume of pageviews. The high rate of both the pageviews (millions to billions a day) and of the incoming tweets (more than 100 million a day) makes real-time indexing of tweets ineffective, as this requires an index that is both queried and updated extremely frequently. The rate of tweet updates makes caching techniques almost unusable since the cache would become stale very quickly.<\/jats:p><jats:p>We propose a novel architecture where each story is treated as a subscription for tweets relevant to the story's content, and new algorithms that efficiently match tweets to stories, proactively maintaining the top-k tweets for each story. Such top-k pub-sub consumes only a small fraction of the resource cost of alternative solutions, and can be applied to other large-scale content-based publish-subscribe problems. We demonstrate the effectiveness of our approach on real-world data: a corpus of news stories from Yahoo! 
News and a log of Twitter updates.<\/jats:p>","DOI":"10.14778\/2536336.2536340","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"385-396","source":"Crossref","is-referenced-by-count":32,"title":["Top-k publish-subscribe for social annotation of news"],"prefix":"10.14778","volume":"6","author":[{"given":"Alexander","family":"Shraer","sequence":"first","affiliation":[{"name":"Google, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maxim","family":"Gurevich","sequence":"additional","affiliation":[{"name":"Google, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marcus","family":"Fontoura","sequence":"additional","affiliation":[{"name":"Google, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vanja","family":"Josifovski","sequence":"additional","affiliation":[{"name":"Google, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2013,4]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1145\/301308.301326","volume-title":"PODC","author":"Aguilera M. K.","year":"1999","unstructured":"M. K. Aguilera , R. E. Strom , D. C. Sturman , M. Astley , and T. D. Chandra . Matching events in a content-based subscription system . In PODC , pages 53 - 61 , 1999 . M. K. Aguilera, R. E. Strom, D. C. Sturman, M. Astley, and T. D. Chandra. Matching events in a content-based subscription system. In PODC, pages 53-61, 1999."},{"key":"e_1_2_1_2_1","first-page":"1","volume-title":"PODS","author":"Babcock B.","year":"2002","unstructured":"B. Babcock , S. Babu , M. Datar , R. Motwani , and J. Widom . Models and issues in data stream systems . In PODS , pages 1 - 16 , 2002 . B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. 
In PODS, pages 1-16, 2002."},{"key":"e_1_2_1_3_1","volume-title":"Modern Information Retrieval","author":"Baeza-Yates R.","year":"1999","unstructured":"R. Baeza-Yates and B. Ribeiro-Neto . Modern Information Retrieval . Addison Wesley , 1999 . R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, 1999."},{"key":"e_1_2_1_4_1","first-page":"262","volume-title":"ICDCS","author":"Banavar G.","year":"1999","unstructured":"G. Banavar , T. Chanra , B. Mukherjee , J. Nagarajarao , R. E. Strom , and D. C. Sturman . An efficient multicast protocol for content-based publish-subscribe systems . In ICDCS , pages 262 - 272 , 1999 . G. Banavar, T. Chanra, B. Mukherjee, J. Nagarajarao, R. E. Strom, and D. C. Sturman. An efficient multicast protocol for content-based publish-subscribe systems. In ICDCS, pages 262-272, 1999."},{"key":"e_1_2_1_5_1","first-page":"82","volume-title":"WWW","author":"Blanco R.","year":"2010","unstructured":"R. Blanco , E. Bortnikov , F. Junqueira , R. Lempel , L. Telloli , and H. Zaragoza . Caching search engine results over incremental indices . In WWW , pages 82 - 89 , 2010 . R. Blanco, E. Bortnikov, F. Junqueira, R. Lempel, L. Telloli, and H. Zaragoza. Caching search engine results over incremental indices. In WWW, pages 82-89, 2010."},{"key":"e_1_2_1_6_1","first-page":"156","volume-title":"ICDE","author":"Bohm C.","year":"2007","unstructured":"C. Bohm , B. C. Ooi , C. Plant , and Y. Yan . Efficiently processing continuous k-nn queries on data streams . In ICDE , pages 156 - 165 , 2007 . C. Bohm, B. C. Ooi, C. Plant, and Y. Yan. Efficiently processing continuous k-nn queries on data streams. In ICDE, pages 156-165, 2007."},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1145\/956863.956944","volume-title":"CIKM","author":"Broder A. Z.","year":"2003","unstructured":"A. Z. Broder , D. Carmel , M. Herscovici , A. Soffer , and J. Zien . 
Efficient query evaluation using a two-level retrieval process . In CIKM , pages 426 - 434 , 2003 . A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Zien. Efficient query evaluation using a two-level retrieval process. In CIKM, pages 426-434, 2003."},{"key":"e_1_2_1_8_1","first-page":"97","volume-title":"SIGIR","author":"Buckley C.","year":"1985","unstructured":"C. Buckley and A. F. Lewit . Optimization of inverted vector searches . In SIGIR , pages 97 - 110 , 1985 . C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In SIGIR, pages 97-110, 1985."},{"key":"e_1_2_1_9_1","volume-title":"TREC","author":"Carmel D.","year":"2006","unstructured":"D. Carmel and E. Amitay . Juru at 2006: Taat versus daat in the terabyte track . In TREC , 2006 . D. Carmel and E. Amitay. Juru at 2006: Taat versus daat in the terabyte track. In TREC, 2006."},{"key":"e_1_2_1_10_1","first-page":"163","volume-title":"SIGCOMM","author":"Carzaniga A.","year":"2003","unstructured":"A. Carzaniga and A. L. Wolf . Forwarding in a content-based network . In SIGCOMM , pages 163 - 174 , 2003 . A. Carzaniga and A. L. Wolf. Forwarding in a content-based network. In SIGCOMM, pages 163-174, 2003."},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1145\/958942.958947","article-title":"Path sharing and predicate evaluation for high-performance xml filtering","volume":"28","author":"Diao Y.","year":"2003","unstructured":"Y. Diao , M. Altinel , M. J. Franklin , H. Zhang , and P. Fischer . Path sharing and predicate evaluation for high-performance xml filtering . TODS , 28 : 467 - 516 , 2003 . Y. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. Fischer. Path sharing and predicate evaluation for high-performance xml filtering. TODS, 28:467-516, 2003.","journal-title":"TODS"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1145\/375663.375677","volume-title":"SIGMOD","author":"Fabret F.","year":"2001","unstructured":"F. 
Fabret , H. A. Jacobsen , F. Llirbat , J. Pereira , K. A. Ross , and D. Shasha . Filtering algorithms and implementation for very fast publish\/subscribe systems . In SIGMOD , pages 115 - 126 , 2001 . F. Fabret, H. A. Jacobsen, F. Llirbat, J. Pereira, K. A. Ross, and D. Shasha. Filtering algorithms and implementation for very fast publish\/subscribe systems. In SIGMOD, pages 115-126, 2001."},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1145\/375551.375567","volume-title":"PODS","author":"Fagin R.","year":"2001","unstructured":"R. Fagin , A. Lotem , and M. Naor . Optimal aggregation algorithms for middleware . In PODS , pages 102 - 113 , 2001 . R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, pages 102-113, 2001."},{"key":"e_1_2_1_14_1","unstructured":"Google Alerts. http:\/\/alerts.google.com\/. Google Alerts. http:\/\/alerts.google.com\/."},{"key":"e_1_2_1_15_1","first-page":"877","volume-title":"CIKM","author":"Haghani P.","year":"2009","unstructured":"P. Haghani , S. Michel , and K. Aberer . Evaluating top-k queries over incomplete data streams . In CIKM , pages 877 - 886 , 2009 . P. Haghani, S. Michel, and K. Aberer. Evaluating top-k queries over incomplete data streams. In CIKM, pages 877-886, 2009."},{"key":"e_1_2_1_16_1","first-page":"489","volume-title":"CIKM","author":"Haghani P.","year":"2010","unstructured":"P. Haghani , S. Michel , and K. Aberer . The gist of everything new: personalized top-k processing over web 2.0 streams . In CIKM , pages 489 - 498 , 2010 . P. Haghani, S. Michel, and K. Aberer. The gist of everything new: personalized top-k processing over web 2.0 streams. In CIKM, pages 489-498, 2010."},{"key":"e_1_2_1_17_1","unstructured":"B. Keane. Twitter v the msm: covering gaddafi's war against reality. http:\/\/www.crikey.com.au\/2011\/03\/21\/ twitter-v-the-msm-covering-gaddafis-war -against-reality\/ 2011. B. Keane. 
Twitter v the msm: covering gaddafi's war against reality. http:\/\/www.crikey.com.au\/2011\/03\/21\/ twitter-v-the-msm-covering-gaddafis-war -against-reality\/ 2011."},{"key":"e_1_2_1_18_1","volume-title":"Efficiency Issues in Information Retrieval Workshop; European Conference for Information Retrieval","author":"Lacour P.","year":"2008","unstructured":"P. Lacour , C. Macdonald , and I. Ounis . Efficiency comparison of document matching techniques . In Efficiency Issues in Information Retrieval Workshop; European Conference for Information Retrieval , 2008 . P. Lacour, C. Macdonald, and I. Ounis. Efficiency comparison of document matching techniques. In Efficiency Issues in Information Retrieval Workshop; European Conference for Information Retrieval, 2008."},{"issue":"1","key":"e_1_2_1_19_1","first-page":"451","article-title":"Scalable ranked publish\/subscribe","volume":"1","author":"Machanavajjhala A.","year":"2008","unstructured":"A. Machanavajjhala , E. Vee , M. N. Garofalakis , and J. Shanmugasundaram . Scalable ranked publish\/subscribe . PVLDB , 1 ( 1 ): 451 - 462 , 2008 . A. Machanavajjhala, E. Vee, M. N. Garofalakis, and J. Shanmugasundaram. Scalable ranked publish\/subscribe. PVLDB, 1(1):451-462, 2008.","journal-title":"PVLDB"},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1145\/1385989.1386006","volume-title":"DEBS","author":"Pripuzic K.","year":"2008","unstructured":"K. Pripuzic , I. P. Zarko , and K. Aberer . Top-k\/w publish\/subscribe: finding k most relevant publications in sliding time window w . In DEBS , pages 127 - 138 , 2008 . K. Pripuzic, I. P. Zarko, and K. Aberer. Top-k\/w publish\/subscribe: finding k most relevant publications in sliding time window w. In DEBS, pages 127-138, 2008."},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1145\/502034.502050","volume-title":"SOSP","author":"Snoeren A. C.","year":"2001","unstructured":"A. C. Snoeren , K. Conley , and D. K. 
Gifford . Mesh-based content routing using xml . In SOSP , pages 160 - 173 , 2001 . A. C. Snoeren, K. Conley, and D. K. Gifford. Mesh-based content routing using xml. In SOSP, pages 160-173, 2001."},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1145\/1076034.1076074","volume-title":"SIGIR","author":"Strohman T.","year":"2005","unstructured":"T. Strohman , H. R. Turtle , and W. B. Croft . Optimization strategies for complex queries . In SIGIR , pages 219 - 225 , 2005 . T. Strohman, H. R. Turtle, and W. B. Croft. Optimization strategies for complex queries. In SIGIR, pages 219-225, 2005."},{"issue":"6","key":"e_1_2_1_23_1","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1016\/0306-4573(95)00020-H","article-title":"Query evaluation: Strategies and optimizations","volume":"31","author":"Turtle H. R.","year":"1995","unstructured":"H. R. Turtle and J. Flood . Query evaluation: Strategies and optimizations . Information Processing and Management , 31 ( 6 ): 831 - 850 , 1995 . H. R. Turtle and J. Flood. Query evaluation: Strategies and optimizations. Information Processing and Management, 31(6):831-850, 1995.","journal-title":"Information Processing and Management"},{"key":"e_1_2_1_24_1","first-page":"194","volume-title":"VLDB","author":"Weber R.","year":"1998","unstructured":"R. Weber , H.-J. Schek , and S. Blott . A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces . In VLDB , pages 194 - 205 , 1998 . R. Weber, H.-J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, pages 194-205, 1998."},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1145\/1135777.1135845","volume-title":"WWW","author":"Yang B.","year":"2006","unstructured":"B. Yang and G. Jeh . Retroactive answering of search queries . In WWW , pages 457 - 466 , 2006 . B. Yang and G. Jeh. 
Retroactive answering of search queries. In WWW, pages 457-466, 2006."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2536336.2536340","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,14]],"date-time":"2023-07-14T14:45:08Z","timestamp":1689345908000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2536336.2536340"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,4]]},"references-count":25,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2013,4]]}},"alternative-id":["10.14778\/2536336.2536340"],"URL":"https:\/\/doi.org\/10.14778\/2536336.2536340","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2013,4]]}}}