{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:08:14Z","timestamp":1760242094624,"version":"build-2065373602"},"reference-count":24,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2018,12,13]],"date-time":"2018-12-13T00:00:00Z","timestamp":1544659200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"the National Key Technology Support Program of China","award":["2012BAH09B02"],"award-info":[{"award-number":["2012BAH09B02"]}]},{"name":"the Natural Science Foundation of Hunan Province","award":["2017JJ5064"],"award-info":[{"award-number":["2017JJ5064"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>To obtain the target webpages from many webpages, we proposed a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method needs to use one of three same relationships proposed between two nodes, so we give the definition of the three same relationships. The biggest innovation of MFPSDDP is that it does not need to know the structures of webpages in advance. First, we address the design ideas with queue and double threads. Then, a dynamic programming algorithm for calculating the length of the longest common subsequence and a formula for calculating similarity are proposed. Further, for obtaining detailed information webpages from 200,000 webpages downloaded from the famous website \u201cwww.jd.com\u201d, we choose the same relationship Completely Same Relationship (CSR) and set the similarity threshold to 0.2. The Recall Ratio (RR) of MFPSDDP is in the middle in the four filtering methods compared. When the number of webpages filtered is nearly 200,000, the PR of MFPSDDP is highest in the four filtering methods compared, which can reach 85.1%. The PR of MFPSDDP is 13.3 percentage points higher than the PR of a Method for Filtering Pages by Containing Strings (MFPCS).<\/jats:p>","DOI":"10.3390\/fi10120124","type":"journal-article","created":{"date-parts":[[2018,12,14]],"date-time":"2018-12-14T04:44:42Z","timestamp":1544762682000},"page":"124","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Method for Filtering Pages by Similarity Degree based on Dynamic Programming"],"prefix":"10.3390","volume":"10","author":[{"given":"Ziyun","family":"Deng","sequence":"first","affiliation":[{"name":"College of Economics and Trade, Changsha Commerce & Tourism College, Changsha 410116, China"},{"name":"National Supercomputing Center in Changsha, Hunan University, Changsha 410116, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7890-7567","authenticated-orcid":false,"given":"Tingqin","family":"He","sequence":"additional","affiliation":[{"name":"National Supercomputing Center in Changsha, Hunan University, Changsha 410116, China"}]}],"member":"1968","published-online":{"date-parts":[[2018,12,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"21552","DOI":"10.1109\/ACCESS.2018.2815992","article-title":"An Automatic Document Classifier System Based on Genetic Algorithm and Taxonomy","volume":"6","author":"Rios","year":"2018","journal-title":"IEEE Access"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1080\/13614568.2016.1152316","article-title":"An efficient scheme for automatic web pages categorization using the support vector machine","volume":"22","author":"Bhalla","year":"2016","journal-title":"New Rev. Hypermedia Multimedia"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"6649","DOI":"10.1109\/ACCESS.2017.2669263","article-title":"Study on Self-Tuning Tyre Friction Control for Developing Main-Servo Loop Integrated Chassis Control System","volume":"5","author":"Zhang","year":"2017","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1638","DOI":"10.1016\/j.asoc.2010.05.003","article-title":"Intelligent classification of web pages using contextual and visual features","volume":"11","author":"Ahmadi","year":"2011","journal-title":"Appl. Soft Comput."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"939","DOI":"10.1007\/s11280-016-0415-z","article-title":"A semantic based Web page classification strategy using multi-layered domain ontology","volume":"20","author":"Saleh","year":"2017","journal-title":"World Wide Web"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1007\/s10489-017-1008-y","article-title":"A tree-based algorithm for attribute selection","volume":"48","author":"Baranauskas","year":"2018","journal-title":"Appl. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yu, X., Li, M., Kim, K.A., Chung, J., and Ryu, K.H. (2016). Emerging Pattern-Based Clustering of Web Users Utilizing a Simple Page-Linked Graph. Sustainability, 8.","DOI":"10.3390\/su8030239"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1814","DOI":"10.1016\/j.tele.2017.09.004","article-title":"Classification of design parameters for e-commerce websites: A novel fuzzy Kano approach","volume":"38","author":"Ilbahar","year":"2017","journal-title":"Telematics Inform."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Popescu, D.A., and Radulescu, D. (2015, January 12). Approximately similarity measurement of web sites. Proceedings of the 2015 International Conference on Telecommunications & Signal Processing, Istanbul, Turkey.","DOI":"10.1007\/978-3-319-26561-2_73"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"37","DOI":"10.9790\/0661-0463742","article-title":"Clustering algorithm with a novel similarity measure","volume":"4","author":"Reddy","year":"2012","journal-title":"IOSR J. Comput. Eng."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1109\/90.650143","article-title":"Self-Similarity in World Wide Web traffic: Evidence and possible causes","volume":"5","author":"Crovella","year":"1997","journal-title":"IEEE\/ACM Trans. Network."},{"key":"ref_12","first-page":"1","article-title":"Automatic combination technology of fuzzy CPN for OWL-S web services in supercomputing cloud platform","volume":"31","author":"Deng","year":"2017","journal-title":"Int. J. Pattern Recogit. Artif. Intell."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.jss.2012.07.040","article-title":"Semantic ranking of web pages based on formal concept analysis","volume":"86","author":"Du","year":"2013","journal-title":"J. Syst. Softw."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1007\/s00521-016-2444-z","article-title":"Web page recommendation via twofold clustering: Considering user behavior and topic relation","volume":"29","author":"Xie","year":"2018","journal-title":"Neural Comput. Appl."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/s10479-010-0704-3","article-title":"Multiple factor hierarchical clustering algorithm for large scale web page and search engine clickstream data","volume":"197","author":"Kou","year":"2012","journal-title":"Ann. Oper. Res."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2574","DOI":"10.1109\/TKDE.2013.78","article-title":"Web-page recommendation based on web usage and domain knowledge","volume":"26","author":"Nguyen","year":"2014","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.datak.2013.09.003","article-title":"A topic-specific crawling strategy based on semantics similarity","volume":"88","author":"Du","year":"2013","journal-title":"Data Knowled. Eng."},{"key":"ref_18","first-page":"205","article-title":"Comparison of machine learning algorithms to classify web pages","volume":"8","author":"Hussien","year":"2017","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1567","DOI":"10.1007\/s13198-017-0629-1","article-title":"Quantitative evaluation of web metrics for automatic genre classification of web pages","volume":"8","author":"Ruchika","year":"2017","journal-title":"Int. J. Syst. Assurance Eng. Manag."},{"key":"ref_20","first-page":"285","article-title":"Semantic similarity based web document classification using support vector machine","volume":"14","author":"Kavitha","year":"2017","journal-title":"Int. Arab J. Inf. Technol."},{"key":"ref_21","first-page":"126","article-title":"An Automated web page classifier and an algorithm for the extraction of navigational pattern from the web data","volume":"16","author":"Wahab","year":"2017","journal-title":"J. Web Eng."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"25781","DOI":"10.1109\/ACCESS.2017.2768564","article-title":"A fuzzy ontology and SVM-based web content classification system","volume":"5","author":"Farman","year":"2017","journal-title":"IEEE Access"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1016\/j.amc.2015.07.120","article-title":"Web page classification based on a simplified swarm optimization","volume":"270","author":"Lee","year":"2015","journal-title":"Appl. Math. Comput."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1016\/j.future.2017.03.003","article-title":"An optimized approach for massive web page classification using entity similarity based on semantic network","volume":"76","author":"Li","year":"2017","journal-title":"Future Gener. Comput. Syst."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/10\/12\/124\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:33:52Z","timestamp":1760196832000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/10\/12\/124"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,13]]},"references-count":24,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2018,12]]}},"alternative-id":["fi10120124"],"URL":"https:\/\/doi.org\/10.3390\/fi10120124","relation":{},"ISSN":["1999-5903"],"issn-type":[{"type":"electronic","value":"1999-5903"}],"subject":[],"published":{"date-parts":[[2018,12,13]]}}}