{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T13:32:51Z","timestamp":1760707971400},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2008,9,24]],"date-time":"2008-09-24T00:00:00Z","timestamp":1222214400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J Sign Process Syst Sign Image Video Technol"],"published-print":{"date-parts":[[2010,4]]},"DOI":"10.1007\/s11265-008-0270-y","type":"journal-article","created":{"date-parts":[[2008,9,23]],"date-time":"2008-09-23T16:04:01Z","timestamp":1222185841000},"page":"123-137","source":"Crossref","is-referenced-by-count":20,"title":["Finding and Extracting Data Records from Web Pages"],"prefix":"10.1007","volume":"59","author":[{"given":"Manuel","family":"\u00c1lvarez","sequence":"first","affiliation":[]},{"given":"Alberto","family":"Pan","sequence":"additional","affiliation":[]},{"given":"Juan","family":"Raposo","sequence":"additional","affiliation":[]},{"given":"Fernando","family":"Bellas","sequence":"additional","affiliation":[]},{"given":"Fidel","family":"Cacheda","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,9,24]]},"reference":[{"key":"270_CR1","doi-asserted-by":"crossref","first-page":"466","DOI":"10.1007\/978-3-540-77092-3_41","volume":"4808","author":"M. \u00c1lvarez","year":"2007","unstructured":"\u00c1lvarez, M., Pan, A., Raposo, J., Bellas, F., & Cacheda, F. (2007). Finding and extracting data records from web pages. Proc. of 2007 IFIP International Conference on Embedded and Ubiquitous Computing (EUC 2007). Lecture Notes in Computer Science, 4808, 466\u2013478 ISSN: 0302-9743.","journal-title":"Lecture Notes in Computer Science"},{"issue":"2","key":"270_CR2","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1007\/978-3-540-74477-1_31","volume":"4706","author":"M. \u00c1lvarez","year":"2007","unstructured":"\u00c1lvarez, M., Pan, A., Raposo, J., Cacheda, F., Bellas, F., & Carneiro, V. (2007). Crawling the content hidden behind web forms. In Proceedings of the 2007 International Conference on Computational Science and its Applications (ICCSA). Lecture Notes in Computer Science, 4706(2), 322\u2013333 Springer Berlin\/Heidelberg, ISSN: 0302-9743, ISBN-10: 3-540-74475-4, ISBN-13: 978-3-540-74475-7.","journal-title":"Lecture Notes in Computer Science"},{"key":"270_CR3","doi-asserted-by":"crossref","unstructured":"Arasu, A., & Garcia-Molina, H. (2003). Extracting structured data from web pages. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data.","DOI":"10.1145\/872757.872799"},{"key":"270_CR4","unstructured":"Arlota, L., Crescenzi, V., Mecca, G., & Merialdo, P. (2003). Automatic annotation of data extracted from large websites. In Proceedings of the WebDB Workshop, pp. 7\u201312."},{"key":"270_CR5","unstructured":"Baumgartner, R., Flesca, S., Gottlob, G. (2001). Visual web information extraction with lixto. In Proc. of Very Large DataBases (VLDB)."},{"key":"270_CR6","volume-title":"Mining the web: Discovering knowledge from hypertext data","author":"S. Chakrabarti","year":"2003","unstructured":"Chakrabarti, S. (2003). Mining the web: Discovering knowledge from hypertext data. San Francisco: Morgan Kaufmann ISBN: 1-55860-754-4."},{"key":"270_CR7","doi-asserted-by":"crossref","unstructured":"Chang, C., & Lui, S. (2001). IEPAD: Information extraction based on pattern discovery. In Proc. of 2001 Int. World Wide Web Conf., pp. 681\u2013688.","DOI":"10.1145\/371920.372182"},{"key":"270_CR8","unstructured":"Chang, K., He, B., & Zhang, Z. (2004). MetaQuerier over the deep web: Shallow integration across holistic sources. In Proceedings of the VLDB Workshop on Information Integration on the Web (VLDB-IIWeb)."},{"key":"270_CR9","unstructured":"Crescenzi, V., Mecca, G., & Merialdo, P. (2001). ROADRUNNER: Towards automatic data extraction from large web sites. In Proc. of the 2001 Int. VLDB Conf, pp. 109\u2013118."},{"issue":"3","key":"270_CR10","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1016\/j.datak.2004.11.004","volume":"54","author":"V. Crescenzi","year":"2005","unstructured":"Crescenzi, V., Merialdo, P., & Missier, P. (2005). Clustering web pages based on their structure. Data & Knowledge Engineering Journal, 54(3), 279\u2013299. September.","journal-title":"Data & Knowledge Engineering Journal"},{"key":"270_CR11","volume-title":"New indices for text: Pat trees and pat arrays. Information retrieval: Data structures and algorithms","author":"G. H. Gonnet","year":"1992","unstructured":"Gonnet, G. H., Baeza-Yates, R. A., & Snider, T. (1992). New indices for text: Pat trees and pat arrays. Information retrieval: Data structures and algorithms. Upper Saddle River: Prentice Hall."},{"key":"270_CR12","doi-asserted-by":"crossref","unstructured":"Hammer, J., McHugh, J., & Garcia-Molina, H. (1997). Semistructured data: The Tsimmis experience. In Proceedings of the 1st East-European Symposium on Advances in Databases and Information Systems (ADBIS), pp. 1\u20138.","DOI":"10.14236\/ewic\/ADBIS1997.22"},{"key":"270_CR13","doi-asserted-by":"crossref","unstructured":"Hogue, A., & Karger, D. (2005). Thresher: Automating the unwrapping of semantic content from the world wide web. In Proceedings of the 14th International World Wide Web Conference.","DOI":"10.1145\/1060745.1060762"},{"issue":"8","key":"270_CR14","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1016\/S0306-4379(98)00027-1","volume":"23","author":"C. N. Hsu","year":"1998","unstructured":"Hsu, C. N., & Dung, M. T. (1998). Generating finite-state transducers for semi-structured data extraction from the web. Information System, 23(8), 521\u2013538. doi: 10.1016\/S0306-4379(98)00027-1 .","journal-title":"Information System"},{"key":"270_CR15","unstructured":"Jung, Y., Geller, J., Wu, Y., & Ae Chun, S. (2007). Semantic deep web: Automatic attribute extraction from the deep web data sources. In Proceedings of the International SAC Conference, pp. 1667\u20131672."},{"issue":"2","key":"270_CR16","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/j.datak.2005.01.001","volume":"54","author":"V. Kovalev","year":"2005","unstructured":"Kovalev, V., Bhowmick, S., & Madria, S. (2005). HW-STALKER: A machine learning-based system for transforming QURE-Pagelets to XML. Data & Knowledge Engineering Journal, 54(2), 241\u2013276, August.","journal-title":"Data & Knowledge Engineering Journal"},{"key":"270_CR17","doi-asserted-by":"crossref","unstructured":"Kistlera, T., & Marais, H. (1998). WebL: A Programming Language for the Web. In Proceedings of the 7th International World Wide Web Conference (WWW7), pp. 259\u2013270.","DOI":"10.1016\/S0169-7552(98)00018-X"},{"key":"270_CR18","unstructured":"Kushmerick, N., Weld, D. S., & Doorenbos, R. B. (1997). Wrapper induction for information extraction. In Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI), pp. 729\u2013737."},{"issue":"2","key":"270_CR19","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/565117.565137","volume":"31","author":"A. H. F. Laender","year":"2002","unstructured":"Laender, A. H. F., Ribeiro-Neto, B. A., Soares da Silva, A., & Teixeira, J. S. (2002). A brief survey of web data extraction tools. SIGMOD Record, 31(2), 84\u201393. doi: 10.1145\/565117.565137 .","journal-title":"SIGMOD Record"},{"key":"270_CR20","first-page":"707","volume":"10","author":"V. I. Levenshtein","year":"1966","unstructured":"Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10, 707\u2013710.","journal-title":"Soviet Physics Doklady"},{"key":"270_CR21","unstructured":"Liddle, S., Yau, S., & Embley, D. (2001). On the automatic extraction of data from the hidden web. ER (Workshops), pp. 212\u2013226."},{"key":"270_CR22","doi-asserted-by":"crossref","unstructured":"Muslea, I., Minton, S., & Knoblock, C. (2001). Hierarchical wrapper induction for semistructured information sources. Auton. Agent. Multi Agent Syst., 93\u2013114. doi: 10.1023\/A:1010022931168 .","DOI":"10.1023\/A:1010022931168"},{"key":"270_CR23","unstructured":"Notredame, C. (2002). Recent progresses in multiple sequence alignment: A survey. Technical report, Information Genetique et."},{"key":"270_CR24","doi-asserted-by":"crossref","unstructured":"Pan, A., et al. (2002). Semi-automatic wrapper generation for commercial web sources. In Proc. of IFIP WG8.1 Conf. on Engineering Inf. Systems in the Internet Context (EISIC).","DOI":"10.1007\/978-0-387-35614-3_16"},{"key":"270_CR25","unstructured":"Raghavan, S., & Garc\u00eda-Molina, H. (2001). Crawling the hidden web. In Proceedings of the 27th International Conference on Very Large Databases (VLDB)."},{"issue":"2","key":"270_CR26","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1016\/j.datak.2006.06.006","volume":"61","author":"J. Raposo","year":"2007","unstructured":"Raposo, J., Pan, A., \u00c1lvarez, M., & Hidalgo, J. (2007). Automatically maintaining wrappers for web sources. Data & Knowledge Engineering, 61(2), 331\u2013358. doi: 10.1016\/j.datak.2006.06.006 .","journal-title":"Data & Knowledge Engineering"},{"issue":"3","key":"270_CR27","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1016\/S0169-023X(00)00051-3","volume":"36","author":"A. Sahuguet","year":"2001","unstructured":"Sahuguet, A., & Azavant, F. (2001). Building intelligent web applications using lightweight wrappers. Data & Knowledge Engineering Journal, 36(3), 283\u2013316. doi: 10.1016\/S0169-023X(00)00051-3 .","journal-title":"Data & Knowledge Engineering Journal"},{"key":"270_CR28","doi-asserted-by":"crossref","unstructured":"Wang, J., & Lochovsky, F. (2003). Data extraction and label assignment for web databases. In Proceedings of the 12th International World Wide Web Conference (WWW12).","DOI":"10.1145\/775152.775179"},{"key":"270_CR29","doi-asserted-by":"crossref","unstructured":"Zhai, Y., & Liu, B. (2005). Extracting web data using instance-based learning. In Proc. of Web Information Systems Engineering (WISE), pp. 318\u2013331.","DOI":"10.1007\/11581062_24"},{"issue":"12","key":"270_CR30","doi-asserted-by":"crossref","first-page":"1614","DOI":"10.1109\/TKDE.2006.197","volume":"18","author":"Y. Zhai","year":"2006","unstructured":"Zhai, Y., & Liu, B. (2006). Structured data extraction from the web based on partial tree alignment. IEEE Transactions on Knowledge and Data Engineering, 18(12), 1614\u20131628. doi: 10.1109\/TKDE.2006.197 .","journal-title":"IEEE Transactions on Knowledge and Data Engineering"}],"container-title":["Journal of Signal Processing Systems"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11265-008-0270-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s11265-008-0270-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11265-008-0270-y","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,5,8]],"date-time":"2020-05-08T12:49:23Z","timestamp":1588942163000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s11265-008-0270-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,9,24]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,4]]}},"alternative-id":["270"],"URL":"https:\/\/doi.org\/10.1007\/s11265-008-0270-y","relation":{},"ISSN":["1939-8018","1939-8115"],"issn-type":[{"value":"1939-8018","type":"print"},{"value":"1939-8115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,9,24]]}}}