{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T22:08:50Z","timestamp":1740175730936,"version":"3.37.3"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,7,5]],"date-time":"2022-07-05T00:00:00Z","timestamp":1656979200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,5]],"date-time":"2022-07-05T00:00:00Z","timestamp":1656979200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61872298"],"award-info":[{"award-number":["61872298"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004829","name":"Science and Technology Department of Sichuan Province","doi-asserted-by":"crossref","award":["2021YFQ0008"],"award-info":[{"award-number":["2021YFQ0008"]}],"id":[{"id":"10.13039\/501100004829","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Education and Teaching Reform Research Project of Xihua University","award":["xjjg2019026"],"award-info":[{"award-number":["xjjg2019026"]}]},{"name":"the College Student Innovation and Entrepreneurship Training Project of Sichuan Province","award":["S202110650044"],"award-info":[{"award-number":["S202110650044"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The focused crawler grabs continuously web pages related to the given topic according to priorities of unvisited hyperlinks. 
In many previous studies, focused crawlers predict the priorities of unvisited hyperlinks using text similarity models. However, the terms used to represent a web page ignore polysemy, and the topic similarity of a text fails to combine cosine similarity and semantic similarity effectively. To address these problems, this paper proposes a focused crawler based on a semantic disambiguation vector space model (SDVSM). The SDVSM method combines a semantic disambiguation graph (SDG) with a semantic vector space model (SVSM). The SDG removes ambiguous terms that are irrelevant to the given topic from the representation terms of retrieved web pages. The SVSM calculates the topic similarity of a text by constructing text and topic semantic vectors from the TF\u2009\u00d7\u2009IDF weights of terms and the semantic similarities between terms. The experimental results indicate that the SDVSM method improves the performance of the focused crawler when four focused crawlers are compared on different evaluation indicators. 
In conclusion, the proposed method can make the focused crawler grab the higher quality and more quantity web pages related to the given topic from the Internet.<\/jats:p>","DOI":"10.1007\/s40747-022-00707-8","type":"journal-article","created":{"date-parts":[[2022,7,5]],"date-time":"2022-07-05T05:02:33Z","timestamp":1656997353000},"page":"345-366","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A focused crawler based on semantic disambiguation vector space model"],"prefix":"10.1007","volume":"9","author":[{"given":"Wenjun","family":"Liu","sequence":"first","affiliation":[]},{"given":"Yu","family":"He","sequence":"additional","affiliation":[]},{"given":"Jing","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Yajun","family":"Du","sequence":"additional","affiliation":[]},{"given":"Xing","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Tiejun","family":"Xi","sequence":"additional","affiliation":[]},{"given":"Zurui","family":"Gan","sequence":"additional","affiliation":[]},{"given":"Pengjun","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Xiaoping","family":"Huang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,7,5]]},"reference":[{"issue":"2","key":"707_CR1","first-page":"461","volume":"21","author":"W Wang","year":"2021","unstructured":"Wang W, Yu LH (2021) UCrawler: a learning-based web crawler using a URL knowledge base. J Comput Methods Sci Eng 21(2):461\u2013474","journal-title":"J Comput Methods Sci Eng"},{"issue":"10","key":"707_CR2","doi-asserted-by":"publisher","first-page":"8175","DOI":"10.1007\/s11227-019-02787-9","volume":"76","author":"JG Lee","year":"2020","unstructured":"Lee JG, Bae D, Kim S et al (2020) An effective approach to enhancing a focused crawler using Google. 
J Supercomputing 76(10):8175\u20138192","journal-title":"J Supercomputing"},{"issue":"2","key":"707_CR3","first-page":"105","volume":"21","author":"KSS Prabha","year":"2021","unstructured":"Prabha KSS, Mahesh C, Raja SP (2021) An enhanced semantic focused web crawler based on hybrid string matching algorithm. Cybern Inf Technol 21(2):105\u2013120","journal-title":"Cybern Inf Technol"},{"issue":"11\u201312","key":"707_CR4","doi-asserted-by":"publisher","first-page":"7577","DOI":"10.1007\/s11042-019-08252-2","volume":"79","author":"A Capuano","year":"2020","unstructured":"Capuano A, Rinaldi AM, Russo C (2020) An ontology-driven multimedia focused crawler based on linked open data and deep learning techniques. Multimed Tools Appl 79(11\u201312):7577\u20137598","journal-title":"Multimed Tools Appl"},{"issue":"3","key":"707_CR5","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1504\/IJBIC.2021.114877","volume":"17","author":"N Kuze","year":"2021","unstructured":"Kuze N, Ishikura S, Yagi T et al (2021) Classification of diversified web crawler accesses inspired by biological adaptation. Int J Bio-Inspir Comput 17(3):165\u2013173","journal-title":"Int J Bio-Inspir Comput"},{"issue":"3","key":"707_CR6","first-page":"23","volume":"9","author":"S Gupta","year":"2019","unstructured":"Gupta S, Duhan N, Bansal P (2019) An approach for focused crawler to harvest digital academic documents in online digital libraries. Int J Inf Retr Res 9(3):23\u201347","journal-title":"Int J Inf Retr Res"},{"key":"707_CR7","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/j.patrec.2020.12.003","volume":"142","author":"S Rajiv","year":"2021","unstructured":"Rajiv S, Navaneethan C (2021) Keyword weight optimization using gradient strategies in event focused web crawling. 
Pattern Recogn Lett 142:3\u201310","journal-title":"Pattern Recogn Lett"},{"issue":"1","key":"707_CR8","first-page":"58","volume":"27","author":"AQ Zhou","year":"2020","unstructured":"Zhou AQ, Zhou YS (2020) Research on the relationship network in customer innovation community based on text mining and social network analysis. Teh Vjesn-Tech Gaz 27(1):58\u201366","journal-title":"Teh Vjesn-Tech Gaz"},{"issue":"11","key":"707_CR9","doi-asserted-by":"publisher","first-page":"3837","DOI":"10.3390\/app10113837","volume":"10","author":"J Hernandez","year":"2020","unstructured":"Hernandez J, Marin-Castro HM, Morales-Sandoval M (2020) A semantic focused web crawler based on a knowledge representation schema. Appl Sci-Basel 10(11):3837","journal-title":"Appl Sci-Basel"},{"issue":"6","key":"707_CR10","first-page":"122","volume":"6","author":"PRJ Dhanith","year":"2021","unstructured":"Dhanith PRJ, Surendiran B, Raja SP (2021) A word embedding based approach for focused web crawling using the recurrent neural network. Int J Interact Multimed Artif Intell 6(6):122\u2013132","journal-title":"Int J Interact Multimed Artif Intell"},{"issue":"1","key":"707_CR11","doi-asserted-by":"publisher","first-page":"1233","DOI":"10.3233\/JIFS-182683","volume":"37","author":"ME ElAraby","year":"2019","unstructured":"ElAraby ME, Abuelenin SM, Moftah HM et al (2019) A new architecture for improving focused crawling using deep neural network. J Intell Fuzzy Syst 37(1):1233\u20131245","journal-title":"J Intell Fuzzy Syst"},{"key":"707_CR12","doi-asserted-by":"publisher","first-page":"115560","DOI":"10.1016\/j.eswa.2021.115560","volume":"184","author":"I Bifulco","year":"2021","unstructured":"Bifulco I, Cirillo S, Esposito C et al (2021) An intelligent system for focused crawling from big data sources. 
Expert Syst Appl 184:115560","journal-title":"Expert Syst Appl"},{"issue":"4","key":"707_CR13","doi-asserted-by":"publisher","first-page":"608","DOI":"10.1109\/TSC.2015.2414931","volume":"9","author":"F Zhao","year":"2016","unstructured":"Zhao F, Zhou JY, Nie C et al (2016) SmartCrawler: a two-stage crawler for efficiently harvesting deep-web interfaces. IEEE Trans Serv Comput 9(4):608\u2013620","journal-title":"IEEE Trans Serv Comput"},{"issue":"11","key":"707_CR14","first-page":"613","volume":"18","author":"G Salton","year":"1975","unstructured":"Salton G, Wong A, Yang CS (1975) A vector space model for automatic indexing. Commun Assoc Comput Mach 18(11):613\u2013620","journal-title":"Commun Assoc Comput Mach"},{"key":"707_CR15","doi-asserted-by":"crossref","unstructured":"Varelas G, Voutsakis E, Raftopoulou P et al (2005) Semantic similarity methods in WordNet and their application to information retrieval on the web. In: Proceedings of the 7th annual ACM international workshop on Web information and data management, Bremen, Germany, p 10\u201316.","DOI":"10.1145\/1097047.1097051"},{"issue":"1\u20137","key":"707_CR16","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1016\/S0169-7552(98)00110-X","volume":"30","author":"S Brin","year":"1998","unstructured":"Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1\u20137):107\u2013117","journal-title":"Comput Netw ISDN Syst"},{"key":"707_CR17","unstructured":"Diligenti M, Coetzee FM, Lawrence S et al (2000) Focused crawling using context graphs. In: Proceedings of the 26th International Conference on Very Large Database (VLDB), Cairo, Egypt, p 527\u2013534."},{"issue":"3","key":"707_CR18","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1109\/TCDS.2019.2937796","volume":"12","author":"S Vashishtha","year":"2020","unstructured":"Vashishtha S, Susan S (2020) Sentiment cognition from words shortlisted by fuzzy entropy. 
IEEE Trans Cogn Dev Syst 12(3):541\u2013550","journal-title":"IEEE Trans Cogn Dev Syst"},{"key":"707_CR19","doi-asserted-by":"publisher","first-page":"140261","DOI":"10.1109\/ACCESS.2020.3007763","volume":"8","author":"Y Du","year":"2020","unstructured":"Du Y, Huo H (2020) News text summarization based on multi-feature and fuzzy logic. IEEE Access 8:140261\u2013140272","journal-title":"IEEE Access"},{"issue":"1","key":"707_CR20","doi-asserted-by":"publisher","first-page":"116","DOI":"10.1109\/TSMC.1985.6313399","volume":"15","author":"T Takagi","year":"1985","unstructured":"Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. IEEE Trans Syst Man Cybern 15(1):116\u2013132","journal-title":"IEEE Trans Syst Man Cybern"},{"key":"707_CR21","doi-asserted-by":"publisher","first-page":"264","DOI":"10.1016\/j.eswa.2018.07.047","volume":"115","author":"FB Goularte","year":"2019","unstructured":"Goularte FB, Nassar SM, Fileto R et al (2019) A text summarization method based on fuzzy rules and applicable to automated assessment. Expert Syst Appl 115:264\u2013275","journal-title":"Expert Syst Appl"},{"issue":"2","key":"707_CR22","doi-asserted-by":"publisher","first-page":"1983","DOI":"10.3233\/JIFS-189201","volume":"40","author":"C Nicolas","year":"2021","unstructured":"Nicolas C, Gil-Lafuente J, Urrutia A et al (2021) Using fuzzy Indicators in customer experience analytics. J Intell Fuzzy Syst 40(2):1983\u20131996","journal-title":"J Intell Fuzzy Syst"},{"key":"707_CR23","doi-asserted-by":"publisher","first-page":"145422","DOI":"10.1109\/ACCESS.2020.3014849","volume":"8","author":"BK Wang","year":"2020","unstructured":"Wang BK, He WN, Yang Z et al (2020) An unsupervised sentiment classification method based on multi-level fuzzy computing and multi-criteria fusion. 
IEEE Access 8:145422\u2013145434","journal-title":"IEEE Access"},{"issue":"11","key":"707_CR24","doi-asserted-by":"publisher","first-page":"1857","DOI":"10.1007\/s13042-018-0857-y","volume":"9","author":"XL He","year":"2018","unstructured":"He XL, Wei L, She YH (2018) L-fuzzy concept analysis for three-way decisions: basic definitions and fuzzy inference mechanisms. Int J Mach Learn Cybern 9(11):1857\u20131867","journal-title":"Int J Mach Learn Cybern"},{"key":"707_CR25","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1016\/j.jal.2016.11.023","volume":"24","author":"D Alvarez","year":"2017","unstructured":"Alvarez D, Fernandez RA, Sanchez L (2017) Fuzzy system for intelligent word recognition using a regular grammar. J Appl Log 24:45\u201353","journal-title":"J Appl Log"},{"issue":"12","key":"707_CR26","doi-asserted-by":"publisher","first-page":"8655","DOI":"10.1007\/s00521-019-04357-9","volume":"32","author":"Y Madani","year":"2020","unstructured":"Madani Y, Erritali M, Bengourram J et al (2020) A multilingual fuzzy approach for classifying Twitter data using fuzzy logic and semantic similarity. Neural Comput Appl 32(12):8655\u20138673","journal-title":"Neural Comput Appl"},{"issue":"5","key":"707_CR27","doi-asserted-by":"publisher","first-page":"9831","DOI":"10.3233\/JIFS-202337","volume":"40","author":"FQ Zhao","year":"2021","unstructured":"Zhao FQ, Zhu ZY, Han P (2021) A novel model for semantic similarity measurement based on wordnet and word embedding. J Intell Fuzzy Syst 40(5):9831\u20139842","journal-title":"J Intell Fuzzy Syst"},{"key":"707_CR28","doi-asserted-by":"crossref","unstructured":"Wu ZB, Palmer M (1994) Verb semantics and lexical selection. In: Proceedings of the 32nd annual meeting on Association for Computational Linguistics, Las Cruces, New Mexico, p 133\u2013138.","DOI":"10.3115\/981732.981751"},{"key":"707_CR29","unstructured":"Lin D (1998) An information-theoretic definition of similarity. 
In: Proceedings of the 15th International Conference on Machine Learning, Madison, USA, p 296\u2013304."},{"issue":"2","key":"707_CR30","first-page":"290","volume":"84","author":"A Tversky","year":"1988","unstructured":"Tversky A (1988) Features of Similarity. Psychol Rev 84(2):290\u2013302","journal-title":"Psychol Rev"},{"key":"707_CR31","doi-asserted-by":"publisher","first-page":"265","DOI":"10.7551\/mitpress\/7287.001.0001","volume-title":"WordNet: an electronic lexical database","author":"C Fellbaum","year":"1998","unstructured":"Fellbaum C, Miller G (1998) Combining local context and wordnet similarity for word sense identification. WordNet: an electronic lexical database. The MIT Press, Cambridge, pp 265\u2013283"},{"key":"707_CR32","doi-asserted-by":"publisher","first-page":"305","DOI":"10.7551\/mitpress\/7287.001.0001","volume-title":"WordNet: an electronic lexical database","author":"C Fellbaum","year":"1998","unstructured":"Fellbaum C, Miller G (1998) Lexical chains as representations of context for the detection and correction of malapropisms. WordNet: an electronic lexical database. The MIT Press, Cambridge, pp 305\u2013332"},{"key":"707_CR33","unstructured":"Resnik (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, Canada."},{"key":"707_CR34","unstructured":"Jiang JJ, Conrath DW (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of the 10th International Conference Research on Computational Linguistics, Taipei, Taiwan, p 1\u201315."},{"issue":"1","key":"707_CR35","first-page":"43","volume":"25","author":"ED Xun","year":"2006","unstructured":"Xun ED, Yan W (2006) English Word Similarity Calculation Based on Semantic Net. 
J China Soc Sci Tech Inf 25(1):43\u201348","journal-title":"J China Soc Sci Tech Inf"},{"key":"707_CR36","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/j.asoc.2016.12.028","volume":"53","author":"AI Saleh","year":"2017","unstructured":"Saleh AI, Abulwafa AE, Al Rahmawy MF (2017) A web page distillation strategy for efficient focused crawling based on optimized Na\u00efve bayes (ONB) classifier. Appl Soft Comput 53:181\u2013204","journal-title":"Appl Soft Comput"},{"issue":"13","key":"707_CR37","doi-asserted-by":"publisher","first-page":"4590","DOI":"10.3390\/app10134590","volume":"10","author":"HJ Kim","year":"2020","unstructured":"Kim HJ, Baek JW, Chung KY (2020) Optimization of associative knowledge graph using TF-IDF based ranking score. Appl Sci-Basel 10(13):4590","journal-title":"Appl Sci-Basel"},{"issue":"11","key":"707_CR38","doi-asserted-by":"publisher","first-page":"392","DOI":"10.1016\/j.asoc.2015.07.026","volume":"36","author":"YJ Du","year":"2015","unstructured":"Du YJ, Liu WJ, Lv XJ et al (2015) An improved focused crawler based on semantic similarity vector space model. Appl Soft Comput 36(11):392\u2013407","journal-title":"Appl Soft Comput"},{"issue":"1","key":"707_CR39","doi-asserted-by":"publisher","first-page":"266","DOI":"10.1016\/j.neucom.2013.06.039","volume":"123","author":"WJ Liu","year":"2014","unstructured":"Liu WJ, Du YJ (2014) A novel focused crawler based on cell-like membrane computing optimization algorithm. 
Neurocomputing 123(1):266\u2013280","journal-title":"Neurocomputing"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00707-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00707-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00707-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T18:50:16Z","timestamp":1677091816000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00707-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,5]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["707"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00707-8","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2022,7,5]]},"assertion":[{"value":"25 August 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 February 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 July 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}