{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:34:54Z","timestamp":1760243694571,"version":"build-2065373602"},"reference-count":26,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2012,8,24]],"date-time":"2012-08-24T00:00:00Z","timestamp":1345766400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>When an event occurs in the real world, numerous news reports describing this event start to appear on different news sites within a few minutes of the event occurrence. This may result in a huge amount of information for users, and automated processes may be required to help manage this information. In this paper, we describe a clustering system that can cluster news reports from disparate sources into event-centric clusters\u2014i.e., clusters of news reports describing the same event. A user can identify any RSS feed as a source of news he\/she would like to receive and our clustering system can cluster reports received from the separate RSS feeds as they arrive without knowing the number of clusters in advance. Our clustering system was designed to function well in an online incremental environment. In evaluating our system, we found that our system is very good in performing fine-grained clustering, but performs rather poorly when performing coarser-grained clustering.<\/jats:p>","DOI":"10.3390\/a5030364","type":"journal-article","created":{"date-parts":[[2012,8,24]],"date-time":"2012-08-24T10:13:36Z","timestamp":1345803216000},"page":"364-378","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Incremental Clustering of News Reports"],"prefix":"10.3390","volume":"5","author":[{"given":"Joel","family":"Azzopardi","sequence":"first","affiliation":[{"name":"Faculty of ICT, University of Malta, Msida, MSD2080, Malta"}]},{"given":"Christopher","family":"Staff","sequence":"additional","affiliation":[{"name":"Faculty of ICT, University of Malta, Msida, MSD2080, Malta"}]}],"member":"1968","published-online":{"date-parts":[[2012,8,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Azzopardi, J., and Staff, C. (2012, January 26\u201329). Fusion of News Reports Using Surface-Based Methods. WAINA\u201912: Proceedings of the 2012 26th International Conference on Advanced Information Networking and Applications Workshops, Fukuoka, Japan.","DOI":"10.1109\/WAINA.2012.113"},{"key":"ref_2","unstructured":"Azzopardi, J., and Staff, C. (2012, January 28\u201330). Automatic Adaptation and Recommendation of News Reports using Surface-Based Methods. PAAMS\u2019 12 (Special Sessions): Proceedings of the 10th International Conference on Practical Applications of Agents and Multi-Agent Systems, Salamanca, Spain."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ji, X., and Xu, W. (2006, January 6\u201311). Document Clustering with Prior Knowledge. SIGIR\u2019 06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, USA.","DOI":"10.1145\/1148170.1148241"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Surdeanu, M., Turmo, J., and Ageno, A. (2005, January 21\u201324). A Hybrid Unsupervised Approach for Document Clustering. KDD\u2019 05: Proceedings of the Eleventh ACM SIGKDD International Conference On Knowledge Discovery in Data Mining, Chicago, IL, USA.","DOI":"10.1145\/1081870.1081957"},{"key":"ref_5","unstructured":"Kang, B.H., Kim, Y.S., and Choi, Y.J. (2007, January 2\u20136). Does Multi-User Document Classification Really Help Knowledge Management?. AI\u2019 07: Proceedings of the 20th Australian Joint Conference on Advances in Artificial Intelligence, Gold Coast, Australia."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1145\/321160.321165","article-title":"Automatic document classification","volume":"10","author":"Borko","year":"1963","journal-title":"J. ACM"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Larsen, B., and Aone, C. (1999, January 15\u201318). Fast and Effective Text Mining Using Linear-Time Document Clustering. KDD\u2019 99: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA.","DOI":"10.1145\/312129.312186"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1145\/1324185.1324190","article-title":"Overview and semantic issues of text mining","volume":"36","author":"Stavrianou","year":"2007","journal-title":"SIGMOD Rec."},{"key":"ref_9","unstructured":"Viles, C.L., and French, J.C. (December, January 29). On the Update of Term Weights in Dynamic Information Retrieval Systems. CIKM\u2019 95: Proceedings of the Fourth International Conference on Information and Knowledge Management, Baltimore, MD, USA."},{"key":"ref_10","unstructured":"Aslam, J., Pelekhov, K., and Rus, D. (1999, January 17\u201319). A Practical Clustering Algorithm for Static and Dynamic Information Organization. SODA\u2019 99: Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms, Baltimore, MD, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Toda, H., and Kataoka, R. (2005, January 10\u201314). A Clustering Method for News Articles Retrieval System. WWW\u2019 05: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, Chiba, Japan.","DOI":"10.1145\/1062745.1062832"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Gulli, A. (2005, January 10\u201314). The Anatomy of a News Search Engine. WWW\u2019 05: Proceedings of the Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, Chiba, Japan.","DOI":"10.1145\/1062745.1062778"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Sahoo, N., Callan, J., Krishnan, R., Duncan, G., and Padman, R. (2006, January 5\u201311). Incremental Hierarchical Clustering of Text Documents. CIKM\u2019 06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, Arlington, VA, USA.","DOI":"10.1145\/1183614.1183667"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Luo, G., Tang, C., and Yu, P.S. (2007, January 11\u201314). Resource-Adaptive Real-Time New Event Detection. SIGMOD\u2019 07: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, Beijing, China.","DOI":"10.1145\/1247480.1247536"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Stokes, N., and Carthy, J. (2001, January 18\u201321). First Story Detection Using a Composite Document Representation. HLT\u2019 01: Proceedings of the First International Conference on Human Language Technology Research, San Diego, CA, USA.","DOI":"10.3115\/1072133.1072182"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"658","DOI":"10.1145\/361454.361509","article-title":"Dynamic document processing","volume":"15","author":"Salton","year":"1972","journal-title":"Commun. ACM"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Cardoso-Cachopo, A., and Oliveira, A.L. (2007, January 11\u201315). Semi-Supervised Single-Label Text Categorization Using Centroid-Based Classifiers. SAC\u2019 07: Proceedings of the 2007 ACM Symposium on Applied Computing, Seoul, Korea.","DOI":"10.1145\/1244002.1244189"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1145\/263868.263871","article-title":"A blueprint for automatic indexing","volume":"31","author":"Salton","year":"1997","journal-title":"SIGIR Forum"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, C., Zhang, M., Ma, S., and Ru, L. (2008, January 21\u201325). Automatic Online News Issue Construction in Web Environment. WWW\u2019 08: Proceeding of the 17th International Conference on World Wide Web, Beijing, China.","DOI":"10.1145\/1367497.1367560"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Braun, R.K., and Kaneshiro, R. (2004). Exploiting Topic Pragmatics for New Event Detection in tdt-2004, Proc. of Topic Detection and Tracking Workshop, ACM Press.","DOI":"10.21236\/ADA439316"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"McKeown, K.R., Barzilay, R., Evans, D., Hatzivassiloglou, V., Klavans, J.L., Nenkova, A., Sable, C., Schiffman, B., and Sigelman, S. (2002, January 24\u201327). Tracking and Summarizing News on a Daily Basis with Columbia\u2019s Newsblaster. HLT\u2019 02: Proceedings of the Human Language Technology Conference, San Diego, CA, USA.","DOI":"10.3115\/1289189.1289212"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Arora, R., and Bangalore, P. (2005, January 18\u201320). Text Mining: Classification & Clustering of Articles Related to Sports. ACM-SE 43: Proceedings of the 43rd Annual Southeast Regional Conference, Kennesaw, GA, USA.","DOI":"10.1145\/1167350.1167397"},{"key":"ref_23","unstructured":"Steinbach, M., Karypis, G., and Kumar, V. (2000, January 20\u201323). A Comparison of Document Clustering Techniques. Proceedings of the KDD Workshop on Text Mining, Boston, MA, USA."},{"key":"ref_24","unstructured":"Porter, M.F. (1997). Readings in Information Retrieval, Morgan Kaufmann Publishers Inc."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Deng, S., and Peng, H. (2006, January 18\u201322). Document Classification Based on Support Vector Machine Using a Concept Vector Model. WI\u2019 06: Proceedings of the 2006 IEEE\/WIC\/ACM International Conference on Web Intelligence, Hong Kong, China.","DOI":"10.1109\/WI.2006.65"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Hearst, M.A., and Pedersen, J.O. (1996, January 18\u201322). Reexamining the Cluster Hypothesis: Scatter\/gather on Retrieval Results. SIGIR\u2019 96: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland.","DOI":"10.1145\/243199.243216"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/5\/3\/364\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T21:52:00Z","timestamp":1760219520000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/5\/3\/364"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,8,24]]},"references-count":26,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2012,9]]}},"alternative-id":["a5030364"],"URL":"https:\/\/doi.org\/10.3390\/a5030364","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2012,8,24]]}}}