{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,6,2]],"date-time":"2022-06-02T00:41:31Z","timestamp":1654130491905},"reference-count":73,"publisher":"IGI Global","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,4,1]]},"abstract":"<p>This paper develops and evaluates a BPMN-based process model which identifies and extracts blog content from the web and stores its textual data in a data warehouse for further analyses. Depending on the characteristics of the technologies used to create the weblogs, the process has to perform specific tasks in order to extract blog content correctly. The paper describes three phases: extraction, transformation and loading of data in a repository specifically adapted for blog content extraction. It highlights the objectives in these phases which must be achieved to ensure the correct extraction. The authors integrate the described process in a previously developed framework for blog mining. The authors' process model closes the conceptual gap in this framework as well as the gap in current research of blog mining process models. 
Furthermore, it can easily be adapted for other web extraction proposals.<\/p>","DOI":"10.4018\/ijiit.2014040102","type":"journal-article","created":{"date-parts":[[2014,9,15]],"date-time":"2014-09-15T13:04:14Z","timestamp":1410786254000},"page":"20-36","source":"Crossref","is-referenced-by-count":0,"title":["Process Model for Content Extraction from Weblogs"],"prefix":"10.4018","volume":"10","author":[{"given":"Andreas","family":"Schieber","sequence":"first","affiliation":[{"name":"University of Technology Dresden, Dresden, Germany"}]},{"given":"Andreas","family":"Hilbert","sequence":"additional","affiliation":[{"name":"University of Technology Dresden, Dresden, Germany"}]}],"member":"2432","reference":[{"key":"ijiit.2014040102-0","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8535.2009.00980.x"},{"key":"ijiit.2014040102-1","doi-asserted-by":"publisher","DOI":"10.1145\/1341531.1341559"},{"key":"ijiit.2014040102-2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-32584-7_1"},{"key":"ijiit.2014040102-3","doi-asserted-by":"publisher","DOI":"10.4018\/jiit.2009070102"},{"key":"ijiit.2014040102-4","unstructured":"Attardi, G., & Simi, M. (2006). Blog mining through opinionated words. Retrieved from http:\/\/trec.nist.gov\/pubs\/trec15\/papers\/upisa.blog.final.pdf"},{"key":"ijiit.2014040102-5","unstructured":"Azevedo, A., & Santos, M. F. (2008). KDD, SEMMA and CRISP-DM: A parallel overview. 
Retrieved from http:\/\/www.iadis.net\/dl\/final_uploads\/200812P033.pdf"},{"key":"ijiit.2014040102-6","doi-asserted-by":"publisher","DOI":"10.1080\/10580530801941058"},{"key":"ijiit.2014040102-7","doi-asserted-by":"publisher","DOI":"10.4018\/jiit.2010100102"},{"key":"ijiit.2014040102-8","first-page":"1410","article-title":"BlogScope - A system for online analysis of high volume text streams.","author":"N.Bansal","year":"2007","journal-title":"Proceedings of the 33rd International Conference on Very Large Data Bases"},{"key":"ijiit.2014040102-9","article-title":"BlogHarvest: Blog mining and search framework.","author":"N.Belsare","year":"2006","journal-title":"Proceedings of the International Conference on Management of Data"},{"key":"ijiit.2014040102-10","doi-asserted-by":"publisher","DOI":"10.1109\/RCIS.2012.6240440"},{"key":"ijiit.2014040102-11","doi-asserted-by":"crossref","unstructured":"Blei, D., & Lafferty, J. (2009). Topic models. Retrieved from http:\/\/www.cs.princeton.edu\/~blei\/papers\/BleiLafferty2009.pdf","DOI":"10.1201\/9781420059458.ch4"},{"key":"ijiit.2014040102-12","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-68636-1_29"},{"key":"ijiit.2014040102-13","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-1286(99)00052-3"},{"key":"ijiit.2014040102-14","unstructured":"Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., & Wirth, R. (2000). CRISP-DM 1.0 - Step-by-step data mining guide. 
Retrieved from http:\/\/www.crisp-dm.org\/CRISPWP-0800.pdf"},{"key":"ijiit.2014040102-15","doi-asserted-by":"publisher","DOI":"10.1109\/MITP.2009.1"},{"key":"ijiit.2014040102-16","doi-asserted-by":"publisher","DOI":"10.1145\/1978542.1978562"},{"key":"ijiit.2014040102-17","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2007.07.015"},{"issue":"9","key":"ijiit.2014040102-18","doi-asserted-by":"crossref","first-page":"491","DOI":"10.17705\/1jais.00236","article-title":"A hybrid attribute selection approach for text classification.","volume":"11","author":"C.-H.Chou","year":"2010","journal-title":"Journal of the Association for Information Systems"},{"key":"ijiit.2014040102-19","doi-asserted-by":"publisher","DOI":"10.1016\/j.compind.2009.05.006"},{"key":"ijiit.2014040102-20","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2011.09.135"},{"key":"ijiit.2014040102-21","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2007.10.010"},{"key":"ijiit.2014040102-22","unstructured":"Davis, H., & Oberholtzer, M. (2008). What are they saying about us? Retrieved from http:\/\/greenfield-ciaosurveys.com\/assets\/pdfs\/Davis_0108-Blogmining.pdf"},{"key":"ijiit.2014040102-23","unstructured":"Dinter, B., & Lorenz, A. (2012). Social business intelligence: A literature review and research agenda. In F. G. Joey (Ed.), Proceedings of the International Conference on Information Systems (ICIS 2012). Orlando, FL: Association for Information Systems; Retrieved from http:\/\/aisel.aisnet.org\/icis2012\/proceedings\/ResearchInProgress\/104\/"},{"key":"ijiit.2014040102-24","first-page":"269","author":"K.Esmaili","year":"2007","journal-title":"BlogDisc: A system for automatic discovery and accumulation of Persian blogs"},{"key":"ijiit.2014040102-25","unstructured":"Fayyad, U. M. (1996). Advances in knowledge discovery and data mining. 
Menlo Park, CA: AAAI Press; Retrieved from http:\/\/www.gbv.de\/dms\/goettingen\/190022256.pdf"},{"key":"ijiit.2014040102-26"},{"key":"ijiit.2014040102-27","unstructured":"Fischer, E. (2007). Weblog & Co: Eine neue Mediengeneration und ihr Einfluss auf Wirtschaft und Journalismus (1. Auflage). Saarbr\u00fccken: VDM M\u00fcller. Retrieved from http:\/\/deposit.d-nb.de\/cgi-bin\/dokserv?id=2907285&prov=M&dok_var=1&dok_ext=htm"},{"key":"ijiit.2014040102-28","author":"N.Glance","year":"2005","journal-title":"Deriving marketing intelligence from online discussion"},{"key":"ijiit.2014040102-29","author":"N.Glance","year":"2004","journal-title":"BlogPulse - Automated trend discovery for weblogs"},{"key":"ijiit.2014040102-30","doi-asserted-by":"crossref","unstructured":"Hevner, A., March, S., Park, J., & Ram, S. (2004). Design science in information systems research. Retrieved from http:\/\/www.hec.unil.ch\/yp\/HCI\/articles\/hevner04.pdf","DOI":"10.2307\/25148625"},{"key":"ijiit.2014040102-31","unstructured":"Heyer, G., Quasthoff, U., & Wittig, T. (2006). Text mining: Wissensrohstoff text: Konzepte, Algorithmen, Ergebnisse (1. Auflage). IT lernen. Herdecke: W3L-Verlag. Retrieved from http:\/\/deposit.ddb.de\/cgi-bin\/dokserv?id=2783785&prov=M&dok_var=1&dok_ext=htm"},{"key":"ijiit.2014040102-32","doi-asserted-by":"publisher","DOI":"10.1007\/s00287-006-0091-y"},{"key":"ijiit.2014040102-33","first-page":"1615","article-title":"Social streams blog crawler.","author":"M.Hurst","year":"2009","journal-title":"Proceedings of the 25th International Conference on Data Engineering"},{"key":"ijiit.2014040102-34","first-page":"379","article-title":"Analyse von Meinungen in sozialen Netzwerken des Web 2.0","author":"C.Kaiser","year":"2009","journal-title":"Business services: Konzepte, Technologien, Anwendungen. 9. Internationale Tagung Wirtschaftsinformatik"},{"key":"ijiit.2014040102-35","doi-asserted-by":"crossref","unstructured":"Kaiser, C. (2009b). 
Opinion mining im Web 2.0 - Konzept und Fallbeispiel. HMD - Praxis der Wirtschaftsinformatik, 268, 90\u201399.","DOI":"10.1007\/BF03340384"},{"key":"ijiit.2014040102-36","author":"H.-G.Kemper","year":"2006","journal-title":"Business Intelligence - Grundlagen und praktische Anwendungen (2. erg\u00e4nzte Auflage)"},{"key":"ijiit.2014040102-37","unstructured":"Kimball, R., & Ross, M. (2002). The data warehouse toolkit: The complete guide to dimensional modeling (2. Auflage). New York, NY: Wiley; Retrieved from http:\/\/www.loc.gov\/catdir\/toc\/wiley021\/2002002284.html"},{"key":"ijiit.2014040102-38","unstructured":"Kolari, P., Finin, T., & Joshi, A. (2006). SVMs for the blogosphere: Blog identification and splog detection weblogs. In AAAI (Ed.), Computational Approaches to Analyzing Weblogs, Papers from the 2006 AAAI Spring Symposium, Stanford, CA (pp. 92\u201399)."},{"key":"ijiit.2014040102-39","doi-asserted-by":"publisher","DOI":"10.1109\/MIC.2010.26"},{"key":"ijiit.2014040102-40","doi-asserted-by":"crossref","unstructured":"Nanno, T., Suzuki, Y., Fujiki, T., & Okumura, M. (2004). Automatic collection and monitoring of Japanese weblogs. WWW2004.","DOI":"10.1145\/1013367.1013455"},{"key":"ijiit.2014040102-41"},{"key":"ijiit.2014040102-42","doi-asserted-by":"crossref","unstructured":"Nottingham, M., & Sayre, R. (2005). The atom syndication format. Retrieved from http:\/\/www.atomenabled.org\/developers\/syndication\/atom-format-spec.php","DOI":"10.17487\/rfc4287"},{"key":"ijiit.2014040102-43","doi-asserted-by":"publisher","DOI":"10.2753\/MIS0742-1222240302"},{"key":"ijiit.2014040102-44","unstructured":"Przepiorka, S. (2006). Weblogs, Wikis und die dritte Dimension. In A. Picot & T. Fischer (Eds.), Weblogs professionell. Grundlagen, Konzepte und Praxis im unternehmerischen Umfeld (1st ed., pp. 13\u201330). 
Heidelberg, Germany: dpunkt-Verlag."},{"key":"ijiit.2014040102-45","doi-asserted-by":"publisher","DOI":"10.4018\/jiit.2010070101"},{"key":"ijiit.2014040102-46","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2009.12.002"},{"key":"ijiit.2014040102-47","unstructured":"RSS Advisory Board. (2009). RSS 2.0 Specification (version 2.0.11). Retrieved from http:\/\/www.rssboard.org\/rss-specification"},{"key":"ijiit.2014040102-48"},{"key":"ijiit.2014040102-49","unstructured":"Schieber, A., & Hilbert, A. (2009). Generierung von neuartigem Wissen durch die Analyse von Weblogs. In H. Baars & B. Rieger (Eds.), Perspektiven der betrieblichen Management-und Entscheidungsunterst\u00fctzung. Tagungsband des Forschungskolloquiums Business Intelligence (FKBI09) (Vol. 542, pp. 117\u2013130). CEUR Workshop Proceedings; Retrieved from http:\/\/sunsite.informatik.rwth-aachen.de\/Publications\/CEUR-WS\/Vol-542\/"},{"key":"ijiit.2014040102-50","first-page":"1157","article-title":"Identifikation und Analyse von ironischen und sarkastischen Kundenrezensionen im Web","author":"A.Schieber","year":"2012","journal-title":"Multikonferenz Wirtschaftsinformatik 2012. Tagungsband der MKWI 2012"},{"key":"ijiit.2014040102-51","unstructured":"Schmidt, J. (2006). Weblogs: Eine kommunikationssoziologische Studie (1. Auflage). Kommunikationswissenschaft. Konstanz: UVK Verlagsgesellschaft. Retrieved from http:\/\/www.loc.gov\/catdir\/toc\/fy0711\/2006421196.html"},{"key":"ijiit.2014040102-52","doi-asserted-by":"publisher","DOI":"10.4018\/jiit.2012010102"},{"key":"ijiit.2014040102-53","first-page":"971","article-title":"Identifying opinion leaders in the blogosphere.","author":"X.Song","year":"2007","journal-title":"Proceedings of the 16th ACM Conference on Information and Knowledge Management"},{"key":"ijiit.2014040102-54","doi-asserted-by":"publisher","DOI":"10.4018\/jiit.2011070105"},{"key":"ijiit.2014040102-55","doi-asserted-by":"crossref","unstructured":"Sriphaew, K., Takamura, H., & Okumura, M. 
(2008). Cool blog identification using topic-based models. In IEEE\/WIC\/ACM (Ed.), International Conference on Web Intelligence and Intelligent Agent Technology (pp. 402\u2013406). IEEE Press.","DOI":"10.1109\/WIIAT.2008.401"},{"key":"ijiit.2014040102-56","doi-asserted-by":"publisher","DOI":"10.1007\/11362197_10"},{"key":"ijiit.2014040102-57","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2006.02.001"},{"key":"ijiit.2014040102-58","doi-asserted-by":"publisher","DOI":"10.4018\/jiit.2013070104"},{"key":"ijiit.2014040102-59","doi-asserted-by":"publisher","DOI":"10.1145\/1183512.1183522"},{"key":"ijiit.2014040102-60","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2005.02.011"},{"key":"ijiit.2014040102-61","first-page":"217","article-title":"Linguistische Annotation","author":"T.Ule","year":"2004","journal-title":"Texttechnologie. Perspektiven und Anwendungen"},{"key":"ijiit.2014040102-62","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-84996-226-1"},{"key":"ijiit.2014040102-63","author":"S. M.Weiss","year":"2005","journal-title":"Text mining - Predictive methods for analyzing unstructured information (1. Auflage)"},{"key":"ijiit.2014040102-64","unstructured":"WordPress. (2013a). Posting activity. Retrieved from http:\/\/en.wordpress.com\/stats\/posting\/"},{"key":"ijiit.2014040102-65","unstructured":"WordPress. (2013b). Stats. 
Retrieved from http:\/\/en.wordpress.com\/stats\/"},{"key":"ijiit.2014040102-66","doi-asserted-by":"publisher","DOI":"10.4018\/jiit.2011010101"},{"key":"ijiit.2014040102-67","doi-asserted-by":"publisher","DOI":"10.1109\/FSKD.2008.371"},{"key":"ijiit.2014040102-68","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2006.10.007"},{"key":"ijiit.2014040102-69","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2010.151"},{"key":"ijiit.2014040102-70","first-page":"1123","article-title":"Topic cube - Topic modeling for OLAP on multidimensional text databases.","author":"D.Zhang","year":"2009","journal-title":"Proceedings of the 2009 SIAM International Conference on Data Mining"},{"key":"ijiit.2014040102-71","doi-asserted-by":"publisher","DOI":"10.4018\/jiit.2013010101"},{"key":"ijiit.2014040102-72","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135993"}],"container-title":["International Journal of Intelligent Information Technologies"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=114957","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,2]],"date-time":"2022-06-02T00:24:50Z","timestamp":1654129490000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/ijiit.2014040102"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2014,4,1]]},"references-count":73,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2014,4]]}},"URL":"https:\/\/doi.org\/10.4018\/ijiit.2014040102","relation":{},"ISSN":["1548-3657","1548-3665"],"issn-type":[{"value":"1548-3657","type":"print"},{"value":"1548-3665","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,4,1]]}}}