{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:27:38Z","timestamp":1777854458889,"version":"3.51.4"},"reference-count":45,"publisher":"SAGE Publications","issue":"5","license":[{"start":{"date-parts":[[2016,9,1]],"date-time":"2016-09-01T00:00:00Z","timestamp":1472688000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2017,10]]},"abstract":"<jats:p>Extracting the user reviews in websites such as forums, blogs, newspapers, commerce, trips, etc. is crucial for text processing applications (e.g. sentiment analysis, trend detection\/monitoring and recommendation systems) which are needed to deal with structured data. Traditional algorithms have three processes consisting of Document Object Model (DOM) tree creation, extraction of features obtained from this tree and machine learning. However, these algorithms increase time complexity of extraction process. This study proposes a novel algorithm that involves two complementary stages. The first stage determines which HTML tags correspond to review layout for a web domain by using the DOM tree as well as its features and decision tree learning. The second stage extracts review layout for web pages in a web domain using the found tags obtained from the first stage. This stage is more time-efficient, being approximately 21 times faster compared to the first stage. Moreover, it achieves a relatively high accuracy of 96.67% in our experiments of review block extraction.<\/jats:p>","DOI":"10.1177\/0165551516666446","type":"journal-article","created":{"date-parts":[[2016,9,2]],"date-time":"2016-09-02T21:48:20Z","timestamp":1472852900000},"page":"696-712","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":8,"title":["A novel algorithm for extracting the user reviews from web pages"],"prefix":"10.1177","volume":"43","author":[{"given":"Erdem","family":"U\u00e7ar","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Faculty of Engineering, Trakya University, Turkey"}]},{"given":"Erdin\u00e7","family":"Uzun","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, \u00c7orlu Faculty of Engineering, Nam\u0131k Kemal University, Turkey"}]},{"given":"P\u0131nar","family":"T\u00fcfekci","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, \u00c7orlu Faculty of Engineering, Nam\u0131k Kemal University, Turkey"}]}],"member":"179","published-online":{"date-parts":[[2016,9,1]]},"reference":[{"key":"bibr1-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1177\/0165551515595742"},{"key":"bibr2-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775134"},{"key":"bibr3-0165551516666446","unstructured":"Cai D, Yu S, Wen JR, Ma WY. VIPS: a vision based page segmentation algorithm. Microsoft Technical Report MSR-TR-2003-79, 2003. Available at: https:\/\/www.microsoft.com\/en-us\/research\/publication\/vips-a-vision-based-page-segmentation-algorithm\/."},{"key":"bibr4-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/775152.775184"},{"key":"bibr5-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956785"},{"key":"bibr6-0165551516666446","first-page":"43","volume-title":"Proceedings of Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03)","author":"Yi L","year":"2003"},{"key":"bibr7-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2011.07.044"},{"key":"bibr8-0165551516666446","first-page":"379","volume":"7","author":"Da\u015f R","year":"2007","journal-title":"Istanbul University: Journal of Electrical & Electronics Engineering (IU-JEEE)"},{"key":"bibr9-0165551516666446","first-page":"310","volume":"3","author":"Da\u015f R","year":"2008","journal-title":"e-Journal of New World Sciences Academy (NWSA), Natural and Applied Sciences"},{"key":"bibr10-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2008.08.067"},{"key":"bibr11-0165551516666446","first-page":"1037","volume":"9","author":"Da\u015f R","year":"2009","journal-title":"Istanbul University: Journal of Electrical & Electronics Engineering (IU-JEEE)"},{"key":"bibr12-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-48298-9_32"},{"key":"bibr13-0165551516666446","first-page":"759","volume":"2","author":"Inamdar SA","year":"2010","journal-title":"International Journal on Computer Science and Engineering"},{"key":"bibr14-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/565117.565137"},{"key":"bibr15-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/1183614.1183654"},{"key":"bibr16-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/511446.511522"},{"key":"bibr17-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135788"},{"key":"bibr18-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/1458082.1458237"},{"key":"bibr19-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988700"},{"key":"bibr20-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.82"},{"key":"bibr21-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.109"},{"key":"bibr22-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.140"},{"key":"bibr23-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2012.135"},{"key":"bibr24-0165551516666446","first-page":"250","volume-title":"Proceedings of 2002 IEEE International Conference on Data Mining","author":"Kovacevic M"},{"key":"bibr25-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-36901-5_42"},{"key":"bibr26-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/775152.775155"},{"key":"bibr27-0165551516666446","first-page":"587","volume-title":"Proceedings of the 10th World Wide Web conference, Hong Kong, Hong Kong","author":"Chen J"},{"key":"bibr28-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367549"},{"key":"bibr29-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/1718487.1718542"},{"key":"bibr30-0165551516666446","doi-asserted-by":"crossref","unstructured":"Chakrabarti D, Kumar R, Punera K. Page-level template detection via isotonic smoothing. In: WWW 2007: Proceeding of the 16th international conference on World Wide Web, 2007, pp. 61\u201370.","DOI":"10.1145\/1242572.1242582"},{"key":"bibr31-0165551516666446","first-page":"638","volume-title":"Proceedings of the Sixth International Language Resources and Evaluation (LREC\u201908)","author":"Baroni M"},{"key":"bibr32-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1002\/spe.2195"},{"key":"bibr33-0165551516666446","doi-asserted-by":"crossref","unstructured":"Cai R, Yang JM, Lai W. iRobot: an intelligent crawler for web forums. In: WWW 2008: Proceedings of the 17th international conference on World Wide Web, 2008, pp. 447\u2013456.","DOI":"10.1145\/1367497.1367558"},{"key":"bibr34-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1109\/WI.2006.52"},{"key":"bibr35-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390413"},{"key":"bibr36-0165551516666446","first-page":"110","volume-title":"Proceedings of the 5th Hellenic Conference on Artificial Intelligence","author":"Kokkoras F"},{"key":"bibr37-0165551516666446","doi-asserted-by":"crossref","unstructured":"Kushal D, Steve L, Pennock DM. Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th international conference on World Wide Web (WWW \u201803), 2003, pp. 519\u2013528.","DOI":"10.1145\/775152.775226"},{"key":"bibr38-0165551516666446","unstructured":"Mueller C, Liggett MM. For B659: Web Mining. 2005. Available at: http:\/\/osl.iu.edu\/~chemuell\/classes\/b659\/project\/web-evolution.pdf."},{"key":"bibr39-0165551516666446","first-page":"151","volume":"16","author":"Uzun E","year":"2011","journal-title":"Journal of the Technical University Sofia"},{"key":"bibr40-0165551516666446","first-page":"66","volume-title":"2nd International Symposium on Computing in Science & Engineering \u2013 ISCSE","author":"Uzun E","year":"2011"},{"key":"bibr41-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-08-050058-4.50007-3"},{"key":"bibr42-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2015.07.012"},{"key":"bibr43-0165551516666446","doi-asserted-by":"publisher","DOI":"10.1145\/1316902.1316920"},{"key":"bibr44-0165551516666446","doi-asserted-by":"crossref","unstructured":"Uzun E, Agun VH, Yerlikaya T. A hybrid approach for extracting informative content from web pages. Information Processing & Management 2013; 49: 928\u2013944.","DOI":"10.1016\/j.ipm.2013.02.005"},{"key":"bibr45-0165551516666446","volume-title":"Data Mining: Practical Machine Learning Tools and Techniques","author":"Witten IH","year":"2005","edition":"2"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551516666446","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0165551516666446","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551516666446","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T23:08:24Z","timestamp":1777504104000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0165551516666446"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,9,1]]},"references-count":45,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2017,10]]}},"alternative-id":["10.1177\/0165551516666446"],"URL":"https:\/\/doi.org\/10.1177\/0165551516666446","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,9,1]]}}}