{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T19:04:04Z","timestamp":1754161444220,"version":"3.41.2"},"reference-count":43,"publisher":"Emerald","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,11,23]]},"abstract":"<jats:sec>\n                  <jats:title>Purpose<\/jats:title>\n                  <jats:p>Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic-based framework to address this problem.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Design\/methodology\/approach<\/jats:title>\n                  <jats:p>A two-stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single-topic summarization approach.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Findings<\/jats:title>\n                  <jats:p>The user study demonstrates that the clustering-summarization approach statistically significantly outperforms the plain summarization approach in the multi-topic web site summarization task. Text-based clustering based on selecting features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Research limitations\/implications<\/jats:title>\n                  <jats:p>More sophisticated clustering methods than those used in this study are worth investigating. 
The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Practical implications<\/jats:title>\n                  <jats:p>The proposed summarization framework can be applied to the effective organization of search engine results and faceted or topical browsing of large web sites.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Originality\/value<\/jats:title>\n                  <jats:p>Several key components are integrated for web site summarization for the first time, including feature selection and link analysis, key phrase and key sentence extraction. Insight into the contributions of links and content to topic-based summarization was gained. A classification approach is used to minimize the number of parameters.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1108\/17440081011090220","type":"journal-article","created":{"date-parts":[[2010,11,20]],"date-time":"2010-11-20T07:12:27Z","timestamp":1290237147000},"page":"266-303","source":"Crossref","is-referenced-by-count":2,"title":["Topic-based web site summarization"],"prefix":"10.1108","volume":"6","author":[{"given":"Yongzheng","family":"Zhang","sequence":"first","affiliation":[{"name":"Faculty of Computer Science, Dalhousie University, Halifax, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Evangelos","family":"Milios","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science, Dalhousie University, Halifax, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nur","family":"Zincir-Heywood","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science, Dalhousie University, Halifax, 
Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","reference":[{"key":"2025072819304662200_b1","doi-asserted-by":"crossref","unstructured":"Allan, J.\n           and Raghavan, H. (2002), \u201cUsing part-of-speech patterns to reduce query ambiguity\u201d, Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, pp. 307-14.","DOI":"10.1145\/564376.564430"},{"key":"2025072819304662200_b2","doi-asserted-by":"crossref","unstructured":"Amitay, E.\n           and Paris, C. (2000), \u201cAutomatically summarising web sites: is there a way around it?\u201d, Proceedings of the Ninth ACM International Conference on Information and Knowledge Management, McLean, VA, USA, pp. 173-9.","DOI":"10.1145\/354756.354816"},{"key":"2025072819304662200_b3","unstructured":"Avanzo, E.D.\n           and Magnini, B. (2005), \u201cA keyphrase-based approach to summarization: the LAKE system at DUC-2005\u201d, paper presented at the Document Understanding Conference (DUC), NIST, Vancouver."},{"key":"2025072819304662200_b4","doi-asserted-by":"crossref","unstructured":"Berger, A.\n           and Mittal, V. (2000), \u201cOCELOT: a system for summarizing web pages\u201d, Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, pp. 144-51.","DOI":"10.1145\/345508.345565"},{"key":"2025072819304662200_b5","doi-asserted-by":"crossref","unstructured":"Brill, E.\n           (1992), \u201cA simple rule-based part of speech tagger\u201d, Proceedings of the Third Conference on Applied Natural Language Processing, Trento, Italy, pp. 
152-5.","DOI":"10.3115\/974499.974526"},{"issue":"1","key":"2025072819304662200_b6","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1145\/503104.503109","article-title":"Efficient web browsing on handheld devices using page and form summarization","volume":"20","author":"Buyukkokten","year":"2002","journal-title":"ACM Transactions on Information Systems"},{"key":"2025072819304662200_b7","doi-asserted-by":"crossref","unstructured":"Candan, K.S.\n           and Li, W.-S. (2001), \u201cDiscovering web document associations for web site summarization\u201d, in Kambayashi, Y., Winiwarter, W. and Arikawa, M. (Eds), Third International Conference on Data Warehousing and Knowledge Discovery (DaWaK), LNCS, Vol. 2114, Springer, Munich, pp. 152-61.","DOI":"10.1007\/3-540-44801-2_16"},{"key":"2025072819304662200_b8","doi-asserted-by":"crossref","unstructured":"Chuang, W.\n           and Yang, J. (2000), \u201cExtracting sentence segments for text summarization: a machine learning approach\u201d, Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, pp. 152-9.","DOI":"10.1145\/345508.345566"},{"key":"2025072819304662200_b9","doi-asserted-by":"crossref","unstructured":"Delort, J.\n          , Bouchon-Meunier, B. and Rifqi, M. (2003), \u201cEnhanced web document summarization using hyperlinks\u201d, Proceedings of the 14th ACM Conference on Hypertext and Hypermedia, Nottingham, UK, pp. 
208-15.","DOI":"10.1145\/900051.900097"},{"issue":"2","key":"2025072819304662200_b10","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/s007999900023","article-title":"Automatic recognition of multi-word terms: the C-value\/NC-value method","volume":"3","author":"Frantzi","year":"2000","journal-title":"International Journal on Digital Libraries"},{"key":"2025072819304662200_b11","doi-asserted-by":"crossref","unstructured":"Hardy, H.\n          , Shimizu, N., Strzalkowski, T., Ting, L., Wise, G. and Zhang, X. (2002), \u201cSummarizing large document sets using concept-based clustering\u201d, Proceedings of the Second International Conference on Human Language Technology Research, Morgan Kaufmann Publishers Inc., San Francisco, CA, pp. 235-40.","DOI":"10.3115\/1289189.1289204"},{"key":"2025072819304662200_b12","unstructured":"Hatzivassiloglou, V.\n          , Klavans, J., Holcombe, M., Barzilay, R., Kan, M. and McKeown, K. (2001), \u201cSimfinder: a flexible clustering tool for summarization\u201d, Proceedings of the NAACL'01 Workshop on Automatic Summarization, Pittsburgh, PA, USA, pp. 41-9."},{"key":"2025072819304662200_b13","doi-asserted-by":"crossref","unstructured":"Ji, H.\n          , Luo, Z., Wan, M. and Gao, X. (2002), \u201cSummarizing based on concept counting and hierarchy analysis\u201d, IEEE International Conference on Systems, Man and Cybernetics, Piscataway, NJ, Vol. 3, p. 
6.","DOI":"10.1109\/ICSMC.2002.1176050"},{"issue":"5","key":"2025072819304662200_b14","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1145\/324133.324140","article-title":"Authoritative sources in a hyperlinked environment","volume":"46","author":"Kleinberg","year":"1999","journal-title":"Journal of the ACM"},{"issue":"6","key":"2025072819304662200_b15","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1109\/MCISE.2003.1238704","article-title":"Text mining with information-theoretic clustering","volume":"5","author":"Kogan","year":"2003","journal-title":"Computing in Science and Engineering"},{"key":"2025072819304662200_b16","unstructured":"Li, S.\n          , Ouyang, Y., Wang, W. and Sun, B. (2007), \u201cMulti-document summarization using support vector regression\u201d, paper presented at the Document Understanding Conference (DUC), NIST, Rochester, NY."},{"key":"2025072819304662200_b17","doi-asserted-by":"crossref","unstructured":"Li, W.\n          , Ayan, N., Kolak, O., Vu, Q., Takano, H. and Shimamura, H. (2001), \u201cConstructing multi-granular and topic-focused web site maps\u201d, Proceedings of the Tenth International World Wide Web Conference, Hong Kong, China, pp. 
343-54.","DOI":"10.1145\/371920.372086"},{"issue":"4","key":"2025072819304662200_b18","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1109\/TKDE.2005.66","article-title":"Toward integrating feature selection algorithms for classification and clustering","volume":"17","author":"Liu","year":"2005","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"1","key":"2025072819304662200_b19","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1007\/s10115-006-0023-9","article-title":"Node similarity in the citation graph","volume":"11","author":"Lu","year":"2007","journal-title":"Knowledge and Information Systems: An International Journal"},{"issue":"1","key":"2025072819304662200_b20","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1145\/1125857.1125861","article-title":"Summary in context: searching versus browsing","volume":"24","author":"McDonald","year":"2006","journal-title":"ACM Transactions on Information Systems"},{"key":"2025072819304662200_b21","doi-asserted-by":"crossref","unstructured":"McKeown, K.\n          , Passonneau, R., Elson, D., Nenkova, A. and Hirschberg, J. (2001), \u201cDo summaries help? A task-based evaluation of multi-document summarization\u201d, Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Salvador, Brazil, pp. 210-17.","DOI":"10.1145\/1076034.1076072"},{"key":"2025072819304662200_b22","doi-asserted-by":"crossref","unstructured":"Otterbacher, J.\n          , Radev, D. and Kareem, O. (2006), \u201cNews to go: hierarchical text summarization for mobile devices\u201d, Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, Seattle, WA, USA, pp. 
589-96.","DOI":"10.1145\/1148170.1148271"},{"issue":"3","key":"2025072819304662200_b23","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1177\/0165551507084630","article-title":"Design and development of a concept-based multi-document summarization system for research abstracts","volume":"34","author":"Ou","year":"2008","journal-title":"Journal of Information Science"},{"key":"2025072819304662200_b24","unstructured":"Pelleg, D.\n           and Moore, A. (2000), \u201cX-means: extending K-means with efficient estimation of the number of clusters\u201d, Proceedings of the 17th International Conference on Machine Learning, Stanford, CA, USA, pp. 727-34."},{"issue":"3","key":"2025072819304662200_b25","doi-asserted-by":"crossref","first-page":"752","DOI":"10.1016\/j.ipm.2006.06.001","article-title":"Topic discovery based on text mining techniques","volume":"43","author":"Pons-Porrata","year":"2007","journal-title":"Information Processing & Management"},{"key":"2025072819304662200_b27","doi-asserted-by":"crossref","unstructured":"Radev, D.R.\n          , Jing, H., Stys, M. and Tam, D. (2004), \u201cCentroid-based summarization of multiple documents\u201d, Information Processing & Management, Vol. 40, pp. 919-38.","DOI":"10.1016\/j.ipm.2003.10.006"},{"key":"2025072819304662200_b26","doi-asserted-by":"crossref","unstructured":"Radev, D.R.\n          , Fan, W., Qi, H., Wu, H. and Grewal, A. (2002), \u201cProbabilistic question answering on the web\u201d, Proceedings of the 11th International World Wide Web Conference, Honolulu, HI, USA, pp. 408-19.","DOI":"10.1145\/511446.511500"},{"key":"2025072819304662200_b28","doi-asserted-by":"crossref","unstructured":"Spertus, E.\n           (1997), \u201cParaSite: mining structural information on the web\u201d, Proceedings of the Sixth International World Wide Web Conference, Santa Clara, CA, USA, pp. 
1205-15.","DOI":"10.1016\/S0169-7552(97)00033-0"},{"key":"2025072819304662200_b29","unstructured":"Stein, G.C.\n          , Bagga, A. and Wise, G.B. (2000a), \u201cMulti-document summarization: methodologies and evaluations\u201d, Proceedings of the Seventh Conference on Automatic Natural Language Processing, Lausanne, Switzerland, pp. 337-46."},{"issue":"4","key":"2025072819304662200_b30","doi-asserted-by":"crossref","first-page":"606","DOI":"10.1111\/0824-7935.00131","article-title":"Interactive, text-based summarization of multiple documents","volume":"16","author":"Stein","year":"2000","journal-title":"Computational Intelligence"},{"key":"2025072819304662200_b31","unstructured":"Steinbach, M.\n          , Karypis, G. and Kumar, V. (2000), \u201cA comparison of document clustering techniques\u201d, Proceedings of SIGKDD'00 Workshop on Text Mining, Boston, MA, USA."},{"key":"2025072819304662200_b33","unstructured":"Turney, P.\n           (2003), \u201cCoherent keyphrase extraction via web mining\u201d, Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, pp. 434-9."},{"issue":"1","key":"2025072819304662200_b34","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1108\/17440080580000083","article-title":"Towards the identification of keywords in the web site text content: a methodological approach","volume":"1","author":"Velasquez","year":"2005","journal-title":"International Journal of Web Information Systems"},{"key":"2025072819304662200_b35","unstructured":"Verma, R.\n          , Chen, P. and Lu, W. (2007), \u201cA semantic free-text summarization system using ontology knowledge\u201d, paper presented at the Document Understanding Conference (DUC), NIST, Rochester, NY."},{"key":"2025072819304662200_b36","doi-asserted-by":"crossref","unstructured":"Wang, Y.\n           and Kitsuregawa, M. 
(2002), \u201cEvaluating contents-link coupled web page clustering for web search results\u201d, Proceedings of the 11th ACM International Conference on Information and Knowledge Management, McLean, VA, USA, pp. 499-506.","DOI":"10.1145\/584792.584875"},{"issue":"2","key":"2025072819304662200_b37","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1108\/17440080680000102","article-title":"Clustering web documents using co-citation, coupling, incoming, and outgoing hyperlinks: a comparative performance analysis of algorithms","volume":"2","author":"Wijaya","year":"2006","journal-title":"International Journal of Web Information Systems"},{"key":"2025072819304662200_b38","doi-asserted-by":"crossref","unstructured":"Witten, I.\n          , Paynter, G., Frank, E., Gutwin, C. and Nevill-Manning, C. (1999), \u201cKEA: practical automatic keyphrase extraction\u201d, Proceedings of the Fourth ACM Conference on Digital Libraries, Berkeley, CA, USA, pp. 254-5.","DOI":"10.1145\/313238.313437"},{"key":"2025072819304662200_b39","unstructured":"Yang, Y.\n           and Pedersen, J. (1997), \u201cA comparative study on feature selection in text categorization\u201d, Proceedings of the 14th International Conference on Machine Learning, Nashville, TN, USA, pp. 412-20."},{"key":"2025072819304662200_b40","unstructured":"Ye, S.\n           and Chua, T.-S. (2006), \u201cNUS at DUC 2006: document concept lattice for summarization\u201d, paper presented at the Document Understanding Conference (DUC), NIST, Brooklyn, NY."},{"key":"2025072819304662200_b32","unstructured":"Yih, W.-T.\n          , Goodman, J., Vanderwende, L. and Suzuki, H. (2007), \u201cMulti-document summarization by maximizing informative content-words\u201d, IJCAI, Hyderabad, India, pp. 
1776-82."},{"issue":"5","key":"2025072819304662200_b41","first-page":"323","article-title":"A comparative study on key phrase extraction methods in automatic web site summarization","volume":"5","author":"Zhang","year":"2007","journal-title":"Journal of Digital Information Management"},{"issue":"1","key":"2025072819304662200_b42","first-page":"39","article-title":"World wide web site summarization","volume":"2","author":"Zhang","year":"2004","journal-title":"Web Intelligence and Agent Systems: An International Journal"},{"key":"2025072819304662200_b43","doi-asserted-by":"crossref","unstructured":"Zhang, Y.\n          , Zincir-Heywood, N. and Milios, E. (2005), \u201cNarrative text classification for automatic key phrase extraction in web document corpora\u201d, Proceedings of the Seventh ACM International Workshop on Web Information and Data Management, Bremen, Germany, pp. 51-8.","DOI":"10.1145\/1097047.1097059"}],"container-title":["International Journal of Web Information Systems"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.emeraldinsight.com\/doi\/full-xml\/10.1108\/17440081011090220","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/17440081011090220\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/ijwis\/article-pdf\/6\/4\/266\/1115149\/17440081011090220.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ijwis\/article-pdf\/6\/4\/266\/1115149\/17440081011090220.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,28]],"date-time":"2025-07-28T23:31:05Z","timestamp":1753745465000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ijwis\/article\/6\/4\/266\/
163791\/Topic-based-web-site-summarization"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,11,23]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2010,11,23]]}},"URL":"https:\/\/doi.org\/10.1108\/17440081011090220","relation":{},"ISSN":["1744-0084","1744-0092"],"issn-type":[{"type":"print","value":"1744-0084"},{"type":"electronic","value":"1744-0092"}],"subject":[],"published":{"date-parts":[[2010,11,23]]}}}