{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T01:09:57Z","timestamp":1767143397814,"version":"build-2238731810"},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2021,1,8]],"date-time":"2021-01-08T00:00:00Z","timestamp":1610064000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,1,8]],"date-time":"2021-01-08T00:00:00Z","timestamp":1610064000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Projekt DEAL"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["SOCA"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Much data on the web is available in hidden databases. Users browse their contents by sending search queries to form-based interfaces or APIs. Yet, hidden databases just return the top-k result entries and limit the number of queries per time interval. Such access restrictions constrict those tasks that require many\/specific queries or need to access many\/all data entries. For a temporary solution, an unrestricted local snapshot can be created by crawling the hidden database. Yet, keeping the snapshot permanently consistent is challenging due to the access restrictions of its origin. In this paper, we propose a replication approach providing permanent unrestricted access to the local copy of a hidden database with dynamic changes. To this end, we present an algorithm to effectively crawl hidden databases that outperforms the state of the art. Furthermore, we propose a new way to continuously control the consistency of the replicated database in an efficient manner. We also introduce the cloud-based architecture of a replication service for hidden databases. We show the effectiveness of the approach through a variety of reproducible experimental evaluations.<\/jats:p>","DOI":"10.1007\/s11761-020-00313-x","type":"journal-article","created":{"date-parts":[[2021,1,7]],"date-time":"2021-01-07T21:05:24Z","timestamp":1610053524000},"page":"323-338","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A third-party replication service for dynamic hidden databases"],"prefix":"10.1007","volume":"15","author":[{"given":"Stefan","family":"Hintzen","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yves","family":"Liesy","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0838-2846","authenticated-orcid":false,"given":"Christian","family":"Zirpins","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,1,8]]},"reference":[{"key":"313_CR1","doi-asserted-by":"crossref","unstructured":"\u00c1lvarez M, Raposo J, Pan A, Cacheda F, Bellas F, Carneiro V (2007) DeepBot: A focused crawler for accessing hidden web content. In: ACM International Conference Proceeding Series, vol. 236, pp. 18\u201325. ACM","DOI":"10.1145\/1278380.1278385"},{"key":"313_CR2","unstructured":"Alwan AA, Ibrahim H, Udzir NI, Sidi F (2013) Estimating missing values of skylines in incomplete database. In: Proceedings of the 2th International Conference on Digital Enterprise and Information Systems, pp. 220\u2013229. SDIWC"},{"issue":"11","key":"313_CR3","doi-asserted-by":"publisher","first-page":"888","DOI":"10.14778\/2983200.2983205","volume":"9","author":"A Asudeh","year":"2016","unstructured":"Asudeh A, Zhang N, Das G (2016) Query reranking as a service. Proceedings of the VLDB Endowment 9(11):888\u2013899","journal-title":"Proceedings of the VLDB Endowment"},{"key":"313_CR4","doi-asserted-by":"crossref","unstructured":"Barbosa L, Freire J (2007) An adaptive crawler for locating hiddenwebentry points. In: C.L. Williamson, M.E. Zurko, P.F. Patel-Schneider, P.J. Shenoy (eds.) Proceedings of the 16th international conference on World Wide Web \u2013 WWW \u201907, p. 441. ACM Press","DOI":"10.1145\/1242572.1242632"},{"issue":"1","key":"313_CR5","first-page":"133","volume":"1","author":"L Barbosa","year":"2010","unstructured":"Barbosa L, Freire J (2010) Siphoning hidden-web data through keyword-based interfaces. Journal of Information and Data Management 1(1):133\u2013144","journal-title":"Journal of Information and Data Management"},{"issue":"2","key":"313_CR6","doi-asserted-by":"publisher","first-page":"610","DOI":"10.1214\/aoms\/1177706645","volume":"29","author":"GEP Box","year":"1958","unstructured":"Box GEP, Muller ME (1958) A Note on the generation of random normal deviates. The Annals of Mathematical Statistics 29(2):610\u2013611","journal-title":"The Annals of Mathematical Statistics"},{"key":"313_CR7","doi-asserted-by":"crossref","unstructured":"Durairaj Gunasekaran Y, Asudeh A, Hasani S, Zhang N, Jaoua A, Das G (2018) QR2: A Third-Party Query Reranking Service over Web Databases. In: 2018 IEEE 34th International Conference on Data Engineering (ICDE), pp. 1653\u20131656. IEEE","DOI":"10.1109\/ICDE.2018.00199"},{"key":"313_CR8","doi-asserted-by":"crossref","unstructured":"Jin X, Zhang N, Das G (2011) Attribute domain discovery for hidden web databases. In: Proceedings of the 2011 international conference on Management of data \u2013 SIGMOD \u201911, p. 553. ACM Press, New York, New York, USA","DOI":"10.1145\/1989323.1989381"},{"key":"313_CR9","unstructured":"Kambhampati S, Wolf G, Chen Y, Khatri H, Chokshi B, Fan J, Nambiar U (2007) QUIC: Handling query imprecision & data incompleteness in autonomous databases. In: CIDR 2007 \u2013 3rd Biennial Conference on Innovative Data Systems Research, pp. 263\u2013268"},{"key":"313_CR10","doi-asserted-by":"publisher","first-page":"584","DOI":"10.1016\/j.procs.2017.12.075","volume":"125","author":"M Kumar","year":"2018","unstructured":"Kumar M, Bindal A, Gautam R, Bhatia R (2018) Keyword query based focused web crawler. Procedia Computer Science 125:584\u2013590","journal-title":"Procedia Computer Science"},{"issue":"12","key":"313_CR11","doi-asserted-by":"publisher","first-page":"1107","DOI":"10.14778\/2732977.2732985","volume":"7","author":"W Liu","year":"2014","unstructured":"Liu W, Thirumuruganathan S, Zhang N, Das G (2014) Aggregate estimation over dynamic hidden web databases. Proceedings of the VLDB Endowment 7(12):1107\u20131118","journal-title":"Proceedings of the VLDB Endowment"},{"issue":"3","key":"313_CR12","first-page":"84","volume":"38","author":"Y Lu","year":"2015","unstructured":"Lu Y, Thirumuruganathan S, Zhang N, Das G (2015) Hidden database research and analytics (hydra) system. IEEE Data Eng. Bull. 38(3):84\u2013102","journal-title":"IEEE Data Eng. Bull."},{"issue":"2","key":"313_CR13","doi-asserted-by":"publisher","first-page":"1241","DOI":"10.14778\/1454159.1454163","volume":"1","author":"J Madhavan","year":"2008","unstructured":"Madhavan J, Ko D, Kot \u0141, Ganapathy V, Rasmussen A, Halevy A (2008) Google\u2019s Deep web crawl. Proceedings of the VLDB Endowment 1(2):1241\u20131252","journal-title":"Proceedings of the VLDB Endowment"},{"key":"313_CR14","doi-asserted-by":"crossref","unstructured":"Meng X, Ma ZM, Yan L (2009) Answering approximate queries over autonomous web databases. In: Proceedings of the 18th international conference on World wide web \u2013 WWW \u201909, p. 1021. ACM Press, New York, New York, USA","DOI":"10.1145\/1526709.1526846"},{"issue":"1","key":"313_CR15","doi-asserted-by":"publisher","first-page":"684","DOI":"10.14778\/1453856.1453931","volume":"1","author":"H Nguyen","year":"2008","unstructured":"Nguyen H, Nguyen T, Freire J (2008) Learning to extract form labels. Proceedings of the VLDB Endowment 1(1):684\u2013694","journal-title":"Proceedings of the VLDB Endowment"},{"key":"313_CR16","doi-asserted-by":"crossref","unstructured":"Rezk E, Aqle A, Jaoua A, Das G, Zhang N (2017) Optimized Processing of a Batch of Aggregate Queries over Hidden Databases. In: 2017 International Conference on Computer and Applications (ICCA), pp. 317\u2013324. IEEE, IEEE","DOI":"10.1109\/COMAPP.2017.8079754"},{"issue":"12","key":"313_CR17","doi-asserted-by":"publisher","first-page":"1378","DOI":"10.14778\/2536274.2536320","volume":"6","author":"O Savkovi\u0107","year":"2013","unstructured":"Savkovi\u0107 O, Mirza P, Tomasi A, Nutt W (2013) Complete approximations of incomplete queries. Proceedings of the VLDB Endowment 6(12):1378\u20131381","journal-title":"Proceedings of the VLDB Endowment"},{"issue":"11","key":"313_CR18","doi-asserted-by":"publisher","first-page":"1112","DOI":"10.14778\/2350229.2350232","volume":"5","author":"C Sheng","year":"2012","unstructured":"Sheng C, Zhang N, Tao Y, Jin X (2012) Optimal algorithms for crawling a hidden database in the web. Proceedings of the VLDB Endowment 5(11):1112\u20131123","journal-title":"Proceedings of the VLDB Endowment"},{"issue":"11","key":"313_CR19","doi-asserted-by":"publisher","first-page":"1286","DOI":"10.14778\/2809974.2809989","volume":"8","author":"S Song","year":"2015","unstructured":"Song S, Zhang A, Chen L, Wang J (2015) Enriching data imputation with extensive similarity neighbors. Proceedings of the VLDB Endowment 8(11):1286\u20131297","journal-title":"Proceedings of the VLDB Endowment"},{"key":"313_CR20","unstructured":"Suhaim SB, Liu W, Zhang N (2016) Discover Aggregates Exceptions over Hidden Web Databases. arXiv preprint arXiv:1611.06417"},{"issue":"5","key":"313_CR21","doi-asserted-by":"publisher","first-page":"1167","DOI":"10.1007\/s00778-009-0155-0","volume":"18","author":"G Wolf","year":"2009","unstructured":"Wolf G, Kalavagattu A, Khatri H, Balakrishnan R, Chokshi B, Fan J, Chen Y, Kambhampati S (2009) Query processing over incomplete autonomous databases: Query rewriting using learned data dependencies. The VLDB Journal 18(5):1167\u20131190","journal-title":"The VLDB Journal"},{"issue":"3","key":"313_CR22","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1145\/566340.566342","volume":"20","author":"H Yu","year":"2002","unstructured":"Yu H, Vahdat A (2002) Design and evaluation of a conit-based continuous consistency model for replicated services. ACM Trans. Comput. Syst. 20(3):239\u2013282","journal-title":"ACM Trans. Comput. Syst."},{"issue":"4","key":"313_CR23","doi-asserted-by":"publisher","first-page":"608","DOI":"10.1109\/TSC.2015.2414931","volume":"9","author":"F Zhao","year":"2016","unstructured":"Zhao F, Zhou J, Nie C, Huang H, Jin H (2016) SmartCrawler: A two-stage crawler for efficiently harvesting deep-web interfaces. IEEE Transactions on Services Computing 9(4):608\u2013620","journal-title":"IEEE Transactions on Services Computing"}],"updated-by":[{"DOI":"10.1007\/s11761-021-00316-2","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2021,4,1]],"date-time":"2021-04-01T00:00:00Z","timestamp":1617235200000}}],"container-title":["Service Oriented Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11761-020-00313-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11761-020-00313-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11761-020-00313-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,17]],"date-time":"2023-10-17T01:52:55Z","timestamp":1697507575000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11761-020-00313-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,8]]},"references-count":23,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["313"],"URL":"https:\/\/doi.org\/10.1007\/s11761-020-00313-x","relation":{},"ISSN":["1863-2386","1863-2394"],"issn-type":[{"value":"1863-2386","type":"print"},{"value":"1863-2394","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,8]]},"assertion":[{"value":"21 August 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 December 2020","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 December 2020","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 January 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 April 2021","order":5,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":6,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"A Correction to this paper has been published:","order":7,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"https:\/\/doi.org\/10.1007\/s11761-021-00316-2","URL":"https:\/\/doi.org\/10.1007\/s11761-021-00316-2","order":8,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}}]}}