{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T17:47:02Z","timestamp":1754156822742,"version":"3.41.2"},"reference-count":16,"publisher":"Emerald","issue":"2","license":[{"start":{"date-parts":[[2018,3,19]],"date-time":"2018-03-19T00:00:00Z","timestamp":1521417600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["DTA"],"published-print":{"date-parts":[[2018,3,22]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title><jats:p>The purpose of this paper is to describe the development of an algorithm for realizing web crawlers that automatically collect dynamically generated webpages from the deep web.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title><jats:p>This study proposes and develops an algorithm to collect web information as if the web crawler gathers static webpages by managing script commands as links. The proposed web crawler actually experiments with the algorithm by collecting deep webpages.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Findings<\/jats:title><jats:p>Among the findings of this study is that if the actual crawling process provides search results as script pages, the outcome only collects the first page. However, the proposed algorithm can collect deep webpages in this case.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Research limitations\/implications<\/jats:title><jats:p>To use a script as a link, a human must first analyze the web document. 
This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script, or if the web document contains script errors.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Practical implications<\/jats:title><jats:p>The research results show deep webs are estimated to have 450 to 550 times more information than surface webpages, and it is difficult to collect web documents. However, this algorithm helps to enable deep web collection through script runs.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title><jats:p>This study presents a new method to be utilized with script links instead of adopting previous keywords. The proposed algorithm is available as an ordinary URL. From the conducted experiment, analysis of scripts on individual websites is needed to employ them as links.<\/jats:p><\/jats:sec>","DOI":"10.1108\/dta-07-2017-0053","type":"journal-article","created":{"date-parts":[[2018,3,19]],"date-time":"2018-03-19T10:24:53Z","timestamp":1521455093000},"page":"266-277","source":"Crossref","is-referenced-by-count":4,"title":["Design and implementation of crawling algorithm to collect deep web information for web archiving"],"prefix":"10.1108","volume":"52","author":[{"given":"Hyo-Jung","family":"Oh","sequence":"first","affiliation":[]},{"given":"Dong-Hyun","family":"Won","sequence":"additional","affiliation":[]},{"given":"Chonghyuck","family":"Kim","sequence":"additional","affiliation":[]},{"given":"Sung-Hee","family":"Park","sequence":"additional","affiliation":[]},{"given":"Yong","family":"Kim","sequence":"additional","affiliation":[]}],"member":"140","published-online":{"date-parts":[[2018,3,19]]},"reference":[{"issue":"15","key":"key2021041413052999200_ref001","doi-asserted-by":"crossref","first-page":"9","DOI":"10.5120\/14238-2377","article-title":"Hidden web 
data extraction tools","volume":"82","year":"2013","journal-title":"International Journal of Computer Applications"},{"key":"key2021041413052999200_ref002","doi-asserted-by":"crossref","unstructured":"\u00c1lvarez, M., Raposo, J., Pan, A., Cacheda, F., Bellas, F. and Carneiro, V. (2007), \u201cDeepBot: a focused crawler for accessing hidden web content\u201d, paper presented at the DEECS \u201807 Proceedings of the 3rd International Workshop on Data Engineering Issues in E-commerce and Services, pp. 18-25.","DOI":"10.1145\/1278380.1278385"},{"issue":"2","key":"key2021041413052999200_ref003","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1016\/j.ipm.2016.11.006","article-title":"Sampling strategies for information extraction over the deep web","volume":"53","year":"2017","journal-title":"Information Processing and Management"},{"issue":"1","key":"key2021041413052999200_ref004","article-title":"The deep web: surfacing hidden value","volume":"7","year":"2001","journal-title":"Journal of Electronic Publishing"},{"issue":"3","key":"key2021041413052999200_ref005","first-page":"1199","article-title":"Web crawler: review of different types of web crawler, its issues","volume":"8","year":"2017","journal-title":"International Journal of Advanced Research in Computer Science"},{"key":"key2021041413052999200_ref006","unstructured":"Goodman, M. (2015), \u201cMost of the web is invisible to Google. Here\u2019s what it contains. A roadmap of the internet\u2019s darkest alleys. 
Popular Science\u201d, available at: www.popsci.com\/Dark-web-revealed (accessed May 31, 2017)."},{"issue":"5","key":"key2021041413052999200_ref015","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1145\/1230819.1241670","article-title":"Accessing the deep web","volume":"50","year":"2007","journal-title":"Magazine Communications of the ACM"},{"issue":"9","key":"key2021041413052999200_ref007","doi-asserted-by":"crossref","first-page":"9","DOI":"10.5392\/JKCA.2011.11.9.009","article-title":"Development of web crawler for archiving web resources","volume":"11","year":"2011","journal-title":"The Journal of the Korea Contents Association"},{"issue":"1","key":"key2021041413052999200_ref016","first-page":"309","article-title":"Survey of web crawling algorithms","volume":"8","year":"2014","journal-title":"Advances in Vision Computing: An International Journal"},{"issue":"3","key":"key2021041413052999200_ref008","first-page":"753","article-title":"A framework for incremental hidden web crawler","volume":"2","year":"2010","journal-title":"International Journal on Computer Science and Engineering"},{"issue":"5","key":"key2021041413052999200_ref009","first-page":"52","article-title":"Deep web data scraper: search engine","volume":"2","year":"2014","journal-title":"International Journal of Computer Sciences and Engineering"},{"key":"key2021041413052999200_ref010","unstructured":"Netcraft (2017), \u201cApril 2017 web server survey. Retrieved May 10, 2017\u201d, available at: https:\/\/news.netcraft.com\/archives\/category\/web-server-survey\/ (accessed May 31, 2017)."},{"key":"key2021041413052999200_ref011","first-page":"140","article-title":"Threats that deep web possess to modern world","year":"2017","journal-title":"International Journal for Innovative Research in Science & Technology"},{"key":"key2021041413052999200_ref012","unstructured":"Raghavan, S. and Garcia-Molina, H. 
(2001), \u201cCrawling the hidden web\u201d, paper presented at the VLDB \u201801 Proceedings of the 27th International Conference on Very Large Data Bases, pp. 129-138."},{"issue":"2","key":"key2021041413052999200_ref013","first-page":"258","article-title":"Web crawlers and web crawling algorithms \u2013 a review","volume":"2","year":"2016","journal-title":"International Journal of Scientific Research in Science, Engineering and Technology"},{"issue":"9","key":"key2021041413052999200_ref014","first-page":"1544","article-title":"Comparison of open source crawlers \u2013 a review","volume":"6","year":"2015","journal-title":"International Journal of Scientific & Engineering Research"}],"container-title":["Data Technologies and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/DTA-07-2017-0053\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/DTA-07-2017-0053\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T23:15:13Z","timestamp":1753398913000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/dta\/article\/52\/2\/266-277\/16888"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,3,19]]},"references-count":16,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2018,3,19]]},"published-print":{"date-parts":[[2018,3,22]]}},"alternative-id":["10.1108\/DTA-07-2017-0053"],"URL":"https:\/\/doi.org\/10.1108\/dta-07-2017-0053","relation":{},"ISSN":["2514-9288"],"issn-type":[{"type":"print","value":"2514-9288"}],"subject":[],"published":{"date-parts":[[2018,3,19]]}}}