{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T12:59:42Z","timestamp":1776085182846,"version":"3.50.1"},"publisher-location":"New York, New York, USA","reference-count":43,"publisher":"ACM Press","license":[{"start":{"date-parts":[[2018,1,1]],"date-time":"2018-01-01T00:00:00Z","timestamp":1514764800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"EPSRC","award":["EP\/M025268\/1"],"award-info":[{"award-number":["EP\/M025268\/1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018]]},"DOI":"10.1145\/3178876.3186008","type":"proceedings-article","created":{"date-parts":[[2018,4,13]],"date-time":"2018-04-13T15:53:48Z","timestamp":1523634828000},"page":"1095-1104","source":"Crossref","is-referenced-by-count":12,"title":["Browserless Web Data Extraction"],"prefix":"10.1145","author":[{"given":"Ruslan R.","family":"Fayzrakhmanov","sequence":"first","affiliation":[{"name":"University of Oxford, Oxford, United Kingdom"}]},{"given":"Emanuel","family":"Sallinger","sequence":"additional","affiliation":[{"name":"University of Oxford, Oxford, United Kingdom"}]},{"given":"Ben","family":"Spencer","sequence":"additional","affiliation":[{"name":"University of Oxford, Oxford, United Kingdom"}]},{"given":"Tim","family":"Furche","sequence":"additional","affiliation":[{"name":"University of Oxford & Meltwater, Oxford, United Kingdom"}]},{"given":"Georg","family":"Gottlob","sequence":"additional","affiliation":[{"name":"University of Oxford & TU Wien, Oxford, United Kingdom"}]}],"member":"320","reference":[{"key":"key-10.1145\/3178876.3186008-1","doi-asserted-by":"crossref","unstructured":"Shaon Barman, Sarah Chasins, Rastislav Bodík, and Sumit Gulwani. 2016. Ringer: web automation by demonstration. 
In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, part of SPLASH 2016, Amsterdam, The Netherlands, October 30 - November 4, 2016. 748--764.","DOI":"10.1145\/2983990.2984020"},{"key":"key-10.1145\/3178876.3186008-2","unstructured":"Alberto Bartoli, Eric Medvet, and Marco Mauri. 2012. Recording and replaying navigations on AJAX web sites International Conference on Web Engineering. Springer, 370--377."},{"key":"key-10.1145\/3178876.3186008-3","unstructured":"Robert Baumgartner, Oliver Frölich, and Georg Gottlob. 2007. The Lixto Systems Applications in Business Intelligence and Semantic Web The Semantic Web: Research and Applications, 4th European Semantic Web Conference, ESWC 2007, Innsbruck, Austria, June 3--7, 2007, Proceedings. 16--26."},{"key":"key-10.1145\/3178876.3186008-4","doi-asserted-by":"crossref","unstructured":"Amina Bekkouche, Sidi Mohammed, Benslimane Marianne, Chouki Tibermacine, Fethallah Hadjila, and Mohammed Merzoug. 2017. QoS-aware optimal and automated semantic web service composition with user's constraints. Service Oriented Computing and Applications (2017), 1--19.","DOI":"10.1007\/s11761-017-0205-1"},{"key":"key-10.1145\/3178876.3186008-5","doi-asserted-by":"crossref","unstructured":"Tim Berners-Lee, Roy Fielding, and Larry Masinter. 2005. Uniform Resource Identifier (URI): Generic Syntax. Standard RFC 3986. The Internet Society (ISOC) \/ Internet Engineering Task Force (IETF).","DOI":"10.17487\/rfc3986"},{"key":"key-10.1145\/3178876.3186008-6","doi-asserted-by":"crossref","unstructured":"Tim Berners-Lee, Larry Masinter, and M. McCahill. 1994. Uniform Resource Identifier (URI). Standard RFC 1738. Network Working Group.","DOI":"10.17487\/rfc1738"},{"key":"key-10.1145\/3178876.3186008-7","unstructured":"Philip A. Bernstein, Jayant Madhavan, and Erhard Rahm. 2011. Generic Schema Matching, Ten Years Later. PVLDB Vol. 
4, 11 (2011), 695--701."},{"key":"key-10.1145\/3178876.3186008-8","unstructured":"Jeffrey P. Bigham, T. Lau, and J. Nichols. 2009. Trailblazer: enabling blind users to blaze trails through the web Proceedings of the 13th international conference on Intelligent user interfaces, Vol. 09. ACM, 177--186."},{"key":"key-10.1145\/3178876.3186008-9","unstructured":"Michal Ceresna. 2005. Supervised Learning of Wrappers from Structured Data Sources. PhD Thesis. Vienna University of Technology."},{"key":"key-10.1145\/3178876.3186008-10","doi-asserted-by":"crossref","unstructured":"Mustafa Emre Dincturk, Suryakant Choudhary, Gregor von Bochmann, Guy-Vincent Jourdan, and Iosif Viorel Onut. 2012. A statistical approach for efficient crawling of rich internet applications. In Web Engineering, Marco Brambilla, Takehiro Tokuda, and Robert Tolksdorf (Eds.). Springer, Berlin, Heidelberg, 362--369.","DOI":"10.1007\/978-3-642-31753-8_29"},{"key":"key-10.1145\/3178876.3186008-11","doi-asserted-by":"crossref","unstructured":"Cristian Duda, Gianni Frey, Donald Kossmann, Reto Matter, and Chong Zhou. 2009. AJAX Crawl: Making AJAX applications searchable. In Proceeding of the IEEE 25th International Conference on Data Engineering (ICDE '09). IEEE, Washington, DC, USA, 78--89.","DOI":"10.1109\/ICDE.2009.90"},{"key":"key-10.1145\/3178876.3186008-12","doi-asserted-by":"crossref","unstructured":"Ruslan R. Fayzrakhmanov. 2015. Models and Approaches for Web Information Extraction and Web Page Understanding. In The Evolution of the Internet in the Business Sector: Web 1.0 to Web 3.0, Pedro Isaías, Piet Kommers, and Tomayess Issa (Eds.). IGI Global, Chapter 2, 25--50.","DOI":"10.4018\/978-1-4666-7262-8.ch002"},{"key":"key-10.1145\/3178876.3186008-13","doi-asserted-by":"crossref","unstructured":"Emilio Ferrara, Pasquale De Meo, Giacomo Fiumara, and Robert Baumgartner. 2014. Web data extraction, applications and techniques: A survey. Knowledge-Based Systems Vol. 
70 (2014), 301--323.","DOI":"10.1016\/j.knosys.2014.07.007"},{"key":"key-10.1145\/3178876.3186008-14","doi-asserted-by":"crossref","unstructured":"Tim Furche, Georg Gottlob, Giovanni Grasso, Omer Gunes, Xiaonan Guo, Andrey Kravchenko, Giorgio Orsi, Christian Schallhart, Andrew Sellers, and Cheng Wang. 2012. DIADEM: domain-centric, intelligent, automated data extraction methodology Proceedings of the 21st International Conference Companion on World Wide Web (WWW '12 Companion). ACM, New York, NY, USA, 267--270.","DOI":"10.1145\/2187980.2188025"},{"key":"key-10.1145\/3178876.3186008-15","doi-asserted-by":"crossref","unstructured":"Tim Furche, Georg Gottlob, Giovanni Grasso, Xiaonan Guo, Giorgio Orsi, Christian Schallhart, and Cheng Wang. 2014. DIADEM: Thousands of Websites to a Single Database. PVLDB Vol. 7, 14 (2014), 1845--1856.","DOI":"10.14778\/2733085.2733091"},{"key":"key-10.1145\/3178876.3186008-16","doi-asserted-by":"crossref","unstructured":"Tim Furche, Georg Gottlob, Giovanni Grasso, Christian Schallhart, and Andrew Sellers. 2013. OXPath: A language for scalable data extraction, automation, and crawling on the deep web. VLDB Journal Vol. 22, 1 (2013), 47--72.","DOI":"10.1007\/s00778-012-0286-6"},{"key":"key-10.1145\/3178876.3186008-17","doi-asserted-by":"crossref","unstructured":"Georg Gottlob, Christoph Koch, Robert Baumgartner, Marcus Herzog, and Sergio Flesca. 2004. The Lixto Data Extraction Project: Back and Forth Between Theory and Practice Proceedings of the Twenty-third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '04). ACM, New York, NY, USA, 1--12.","DOI":"10.1145\/1055558.1055560"},{"key":"key-10.1145\/3178876.3186008-18","unstructured":"Andrew W. Hogue and David R. Karger. 2005. Thresher: automating the unwrapping of semantic content from the World Wide Web Proceedings of the 14th international conference on World Wide Web, WWW 2005, Chiba, Japan, May 10--14, 2005. 
86--95."},{"key":"key-10.1145\/3178876.3186008-19","doi-asserted-by":"crossref","unstructured":"Ekaterini Ioannou, Nataliya Rassadko, and Yannis Velegrakis. 2013. On Generating Benchmark Data for Entity Matching. J. Data Semantics Vol. 2, 1 (2013), 37--56.","DOI":"10.1007\/s13740-012-0015-8"},{"key":"key-10.1145\/3178876.3186008-20","doi-asserted-by":"crossref","unstructured":"Hanna Köpcke and Erhard Rahm. 2010. Frameworks for entity matching: A comparison. Data and Knowledge Engineering Vol. 69, 2 (2010), 197--210.","DOI":"10.1016\/j.datak.2009.10.003"},{"key":"key-10.1145\/3178876.3186008-21","unstructured":"Iraklis Kordomatis, Christoph Herzog, Ruslan R. Fayzrakhmanov, Bernhard Krüpl-Sypien, Wolfgang Holzinger, and Robert Baumgartner. 2013. Web object identification for web automation and meta-search 3rd International Conference on Web Intelligence, Mining and Semantics, WIMS '13, Madrid, Spain, June 12--14, 2013. 13."},{"key":"key-10.1145\/3178876.3186008-22","doi-asserted-by":"crossref","unstructured":"Jochen Kranzdorf, Andrew Sellers, Giovanni Grasso, Christian Schallhart, and Tim Furche. 2012. Visual OXPath: Robust Wrapping by Example. In Proc. of WWW. 369--372.","DOI":"10.1145\/2187980.2188051"},{"key":"key-10.1145\/3178876.3186008-23","doi-asserted-by":"crossref","unstructured":"Bernhard Krüpl-Sypien, Ruslan R. Fayzrakhmanov, Wolfgang Holzinger, Mathias Panzenböck, and Robert Baumgartner. 2011. A versatile model for web page representation, information extraction and content re-packaging. In Proceedings of the 2011 ACM Symposium on Document Engineering, Mountain View, CA, USA, September 19--22, 2011. 129--138.","DOI":"10.1145\/2034691.2034721"},{"key":"key-10.1145\/3178876.3186008-24","doi-asserted-by":"crossref","unstructured":"Nicholas Kushmerick. 2003. Finite-State Approaches to Web Information Extraction. Information Extraction in the Web Era Vol. 
2700 (2003), 77--91.","DOI":"10.1007\/978-3-540-45092-4_4"},{"key":"key-10.1145\/3178876.3186008-25","doi-asserted-by":"crossref","unstructured":"Tessa Lau, Julián Cerruti, Guillermo Manzato, Mateo Bengualid, Jeffrey Bigham, and Jeffrey Nichols. 2010. A conversational interface to web automation. Proceedings of the 23rd annual ACM symposium on User interface software and technology (2010), 229--238.","DOI":"10.1145\/1866029.1866067"},{"key":"key-10.1145\/3178876.3186008-26","unstructured":"A. Lemay, J. Niehren, and R. Gilleron. 2006. Learning n-Ary Node Selecting Tree Transducers from Completely Annotated Examples. International Colloquium on Grammatical Inference (ICGI 2006) Vol. 4201 (2006), 253--267."},{"key":"key-10.1145\/3178876.3186008-27","unstructured":"Angel Lagares Lemos, Florian Daniel, and Boualem Benatallah. 2016. Web Service Composition: A Survey of Techniques and Tools. ACM Comput. Surv. Vol. 48, 3 (2016), 33:1--33:41."},{"key":"key-10.1145\/3178876.3186008-28","doi-asserted-by":"crossref","unstructured":"Gilly Leshed, Eben M. Haber, Tara Matthews, and Tessa A. Lau. 2008. CoScripter: automating & sharing how-to knowledge in the enterprise Proceedings of the 2008 Conference on Human Factors in Computing Systems, CHI 2008, 2008, Florence, Italy, April 5--10, 2008. 1719--1728.","DOI":"10.1145\/1357054.1357323"},{"key":"key-10.1145\/3178876.3186008-29","doi-asserted-by":"crossref","unstructured":"Jun Liu, Cheng Fang, and Nirwan Ansari. 2014. Identifying user clicks based on dependency graph. 2014 23rd Wireless and Optical Communication Conference, WOCC 2014 (2014).","DOI":"10.1109\/WOCC.2014.6839915"},{"key":"key-10.1145\/3178876.3186008-30","unstructured":"Jorn Lyseggen. 2017. Outside Insight: Navigating a World Drowning in Data. Penguin Books Limited. 336 pages."},{"key":"key-10.1145\/3178876.3186008-31","doi-asserted-by":"crossref","unstructured":"Ali Mesbah, Arie van Deursen, and Stefan Lenselink. 2012. 
Crawling Ajax-based Web applications through dynamic analysis of user interface state changes. ACM Transactions on the Web (TWEB) Vol. 6, 1 (2012), 1--30.","DOI":"10.1145\/2109205.2109208"},{"key":"key-10.1145\/3178876.3186008-32","doi-asserted-by":"crossref","unstructured":"Ion Muslea, Steven Minton, and Craig A. Knoblock. 1999. A Hierarchical Approach to Wrapper Induction. In Agents. 190--197.","DOI":"10.1145\/301136.301191"},{"key":"key-10.1145\/3178876.3186008-33","doi-asserted-by":"crossref","unstructured":"Adi Omari, Sharon Shoham, and Eran Yahav. 2017. Synthesis of forgiving data extractors. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM '17). ACM, New York, 385--394.","DOI":"10.1145\/3018661.3018740"},{"key":"key-10.1145\/3178876.3186008-34","unstructured":"Changhee Park and Sukyoung Ryu. 2015. Scalable and Precise Static Analysis of JavaScript Applications via Loop-Sensitivity 29th European Conference on Object-Oriented Programming, ECOOP 2015, July 5--10, 2015, Prague, Czech Republic. 735--756."},{"key":"key-10.1145\/3178876.3186008-35","unstructured":"Richard Penman. 2016. Web Data Extraction Optimization: From User Interaction To Web Server Communication. MSc Thesis. University of Oxford."},{"key":"key-10.1145\/3178876.3186008-36","unstructured":"Gregor Richards, Sylvain Lebresne, Brian Burg, and Jan Vitek. 2010. An analysis of the dynamic behavior of JavaScript programs Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2010, Toronto, Ontario, Canada, June 5--10, 2010. 1--12."},{"key":"key-10.1145\/3178876.3186008-37","doi-asserted-by":"crossref","unstructured":"Sunita Sarawagi. 2008. Information extraction. Foundations and Trends in Databases Vol. 
1, 3 (2008), 261--377.","DOI":"10.1561\/1900000003"},{"key":"key-10.1145\/3178876.3186008-38","doi-asserted-by":"crossref","unstructured":"Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. 2010. A Symbolic Execution Framework for JavaScript. In 31st IEEE Symposium on Security and Privacy, S&P 2010, 16--19 May 2010, Berkeley\/Oakland, California, USA. 513--528.","DOI":"10.1109\/SP.2010.38"},{"key":"key-10.1145\/3178876.3186008-39","doi-asserted-by":"crossref","unstructured":"Koushik Sen, Swaroop Kalasapur, Tasneem Brutch, and Simon Gibbs. 2013. Jalangi: A selective record-replay and dynamic analysis framework for JavaScript Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. 488--498.","DOI":"10.1145\/2491411.2491447"},{"key":"key-10.1145\/3178876.3186008-40","doi-asserted-by":"crossref","unstructured":"Wei Shen, Jianyong Wang, and Jiawei Han. 2015. Entity linking with a knowledge base: Issues, techniques, and solutions. IEEE Transactions on Knowledge and Data Engineering Vol. 27, 2 (2015), 443--460.","DOI":"10.1109\/TKDE.2014.2327028"},{"key":"key-10.1145\/3178876.3186008-41","unstructured":"Jui Yuan Su, Der Johng Sun, I. Chen Wu, and Lung Pin Chen. 2010. On design of browser-oriented data extraction system and the plug-ins. Journal of Marine Science and Technology Vol. 18, 2 (2010), 189--200."},{"key":"key-10.1145\/3178876.3186008-42","unstructured":"Guowu Xie, Marios Iliofotou, Thomas Karagiannis, Michalis Faloutsos, and Yaohui Jin. 2013. ReSurf: Reconstructing Web-Surfing Activity From Network Traffic. Proc. IFIP Networking Conference (2013), 1--9."},{"key":"key-10.1145\/3178876.3186008-43","doi-asserted-by":"crossref","unstructured":"Yuhong Yan and Min Chen. 2013. Anytime QoS-aware service composition over the GraphPlan. Service Oriented Computing and Applications Vol. 
9, 1 (2013), 1--19.","DOI":"10.1007\/s11761-013-0134-6"}],"event":{"name":"the 2018 World Wide Web Conference","location":"Lyon, France","acronym":"WWW '18","number":"2018","sponsor":["SIGWEB, ACM Special Interest Group on Hypertext, Hypermedia, and Web","IW3C2, International World Wide Web Conference Committee"],"start":{"date-parts":[[2018,4,23]]},"end":{"date-parts":[[2018,4,27]]}},"container-title":["Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3178876.3186008","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/dl.acm.org\/ft_gateway.cfm?id=3186008&ftid=1957369&dwn=1","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,3]],"date-time":"2025-07-03T17:27:14Z","timestamp":1751563634000},"score":1,"resource":{"primary":{"URL":"http:\/\/dl.acm.org\/citation.cfm?doid=3178876.3186008"}},"subtitle":["Challenges and Opportunities"],"proceedings-subject":"World Wide Web","short-title":[],"issued":{"date-parts":[[2018]]},"references-count":43,"URL":"https:\/\/doi.org\/10.1145\/3178876.3186008","relation":{},"subject":[],"published":{"date-parts":[[2018]]}}}