{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T00:53:14Z","timestamp":1774399994903,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,19]],"date-time":"2020-10-19T00:00:00Z","timestamp":1603065600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,19]]},"DOI":"10.1145\/3340531.3412782","type":"proceedings-article","created":{"date-parts":[[2020,10,19]],"date-time":"2020-10-19T05:31:06Z","timestamp":1603085466000},"page":"3047-3054","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Web Page Segmentation Revisited"],"prefix":"10.1145","author":[{"given":"Johannes","family":"Kiesel","sequence":"first","affiliation":[{"name":"Bauhaus-Universit\u00e4t Weimar, Weimar, Germany"}]},{"given":"Florian","family":"Kneist","sequence":"additional","affiliation":[{"name":"Bauhaus-Universit\u00e4t Weimar, Weimar, Germany"}]},{"given":"Lars","family":"Meyer","sequence":"additional","affiliation":[{"name":"Bauhaus-Universit\u00e4t Weimar, Weimar, Germany"}]},{"given":"Kristof","family":"Komlossy","sequence":"additional","affiliation":[{"name":"Bauhaus-Universit\u00e4t Weimar, Weimar, Germany"}]},{"given":"Benno","family":"Stein","sequence":"additional","affiliation":[{"name":"Bauhaus-Universit\u00e4t Weimar, Weimar, Germany"}]},{"given":"Martin","family":"Potthast","sequence":"additional","affiliation":[{"name":"Leipzig University, Leipzig, Germany"}]}],"member":"320","published-online":{"date-parts":[[2020,10,19]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Current Trends in Web Engineering - ICWE 2013 
International Workshops ComposableWeb, QWE, MDWE, DMSSW, EMotions, CSE, SSN, and PhD Symposium, Aalborg, Denmark, July 8--12","author":"Elgin Akpinar M.","year":"2013","unstructured":"M. Elgin Akpinar and Yeliz Yesilada. 2013. Vision Based Page Segmentation Algorithm: Extended and Perceived Success. In Current Trends in Web Engineering - ICWE 2013 International Workshops ComposableWeb, QWE, MDWE, DMSSW, EMotions, CSE, SSN, and PhD Symposium, Aalborg, Denmark, July 8--12, 2013. Revised Selected Papers. 238--252. https:\/\/doi.org\/10.1007\/978-3-319-04244-2_22"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-008-9066-8"},{"key":"e_1_3_2_2_3_1","volume-title":"Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation, PACLIC 2019. 423--431","author":"Andrew Judith Jeyafreeda","year":"2019","unstructured":"Judith Jeyafreeda Andrew, Stephane Ferrari, Fabrice Maurel, Gael Dias, and Emmanuel Giguet. 2019. Web Page Segmentation for non-visual Skimming. In Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation, PACLIC 2019. 
423--431."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135788"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609630"},{"key":"e_1_3_2_2_6_1","volume-title":"5th Asian-Pacific Web Conference, APWeb 2003, Xian, China, April 23--25, 2002, Proceedings. 406--417","author":"Cai Deng","year":"2003","unstructured":"Deng Cai, Shipeng Yu, Ji-Rong Wen, and Wei-Ying Ma. 2003. Extracting Content Structure for Web Pages Based on Visual Representation. In Web Technologies and Applications, 5th Asian-Pacific Web Conference, APWeb 2003, Xian, China, April 23--25, 2002, Proceedings. 406--417. https:\/\/doi.org\/10.1007\/3-540-36901-5_42"},{"key":"e_1_3_2_2_7_1","first-page":"6","article-title":"A Computational Approach to Edge Detection","volume":"8","author":"Canny John","year":"1986","unstructured":"John Canny. 1986. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell., Vol. 8, 6 (June 1986), 679--698. https:\/\/doi.org\/10.1109\/TPAMI.1986.4767851","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_2_2_8_1","first-page":"93","article-title":"A Segmentation Method for Web Page Analysis Using Shrinking and Dividing","volume":"25","author":"Cao Jiuxin","year":"2010","unstructured":"Jiuxin Cao, Bo Mao, and Junzhou Luo. 2010. 
A Segmentation Method for Web Page Analysis Using Shrinking and Dividing. IJPEDS, Vol. 25, 2 (2010), 93--104. https:\/\/doi.org\/10.1080\/17445760802429585","journal-title":"IJPEDS"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367549"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/371920.372161"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2017.229"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CRV.2017.38"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2016.02.007"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159661"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304223"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2009949"},{"key":"e_1_3_2_2_17_1","unstructured":"E. Bruce Goldstein. 2009. Sensation and Perception (8 ed.). Cengage Learning."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242622"},{"key":"e_1_3_2_2_19_1","volume-title":"Dubes","author":"Jain Anil K.","year":"1988","unstructured":"Anil K. Jain and Richard C. Dubes. 1988. 
Algorithms for Clustering Data. Prentice-Hall, Inc., Upper Saddle River, NJ, USA."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/JCDL.2019.00065"},{"key":"e_1_3_2_2_21_1","first-page":"4","article-title":"Reproducible Web Corpora: Interactive Archiving with Automatic Quality Assessment","volume":"10","author":"Kiesel Johannes","year":"2018","unstructured":"Johannes Kiesel, Florian Kneist, Milad Alshomary, Benno Stein, Matthias Hagen, and Martin Potthast. 2018. Reproducible Web Corpora: Interactive Archiving with Automatic Quality Assessment. Journal of Data and Information Quality (JDIQ), Vol. 10, 4 (Oct. 2018), 17:1--17:25. https:\/\/doi.org\/10.1145\/3239574","journal-title":"Journal of Data and Information Quality (JDIQ)"},{"key":"e_1_3_2_2_22_1","volume-title":"Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 2008","author":"Christian","year":"2008","unstructured":"Christian Kohlsch\u00fctter and Wolfgang Nejdl. 2008. A Densitometric Approach to Web Page Segmentation. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 2008, Napa Valley, California, USA, October 26--30, 2008. 1173--1182. 
https:\/\/doi.org\/10.1145\/1458082.1458237"},{"key":"e_1_3_2_2_23_1","volume-title":"Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM). 250--257","author":"Kovacevic Milos","year":"2002","unstructured":"Milos Kovacevic, Michelangelo Diligenti, Marco Gori, and Veljko M. Milutinovic. 2002. Recognition of Common Areas in a Web Page Using Visual Information: A Possible Application in a Page Classification. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM). 250--257. https:\/\/doi.org\/10.1109\/ICDM.2002.1183910"},{"key":"e_1_3_2_2_24_1","volume-title":"ICWE 2015","author":"Kreuzer Robert","year":"2015","unstructured":"Robert Kreuzer, Jurriaan Hage, and Ad Feelders. 2015. A Quantitative Comparison of Semantic Web Page Segmentation Approaches. In Engineering the Web in the Big Data Era - 15th International Conference, ICWE 2015. 374--391. https:\/\/doi.org\/10.1007\/978-3-319-19890-3_24"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.2307\/271061"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/584792.584885"},{"key":"e_1_3_2_2_27_1","volume-title":"Piotr Doll\u00e1r, and C. Lawrence Zitnick","author":"Lin Tsung-Yi","year":"2014","unstructured":"Tsung-Yi Lin, Michael Maire, Serge J. 
Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In Computer Vision - ECCV 2014 - 13th European Conference. 740--755. https:\/\/doi.org\/10.1007\/978-3-319-10602-1_48 http:\/\/cocodataset.org\/."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824058"},{"key":"e_1_3_2_2_29_1","volume-title":"Proceedings of the 8th International Conference on Computer Vision","volume":"2","author":"Martin D.","unstructured":"D. Martin, C. Fowlkes, D. Tal, and J. Malik. 2001. A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics. In Proceedings of the 8th International Conference on Computer Vision, Vol. 2. 416--423."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.3844\/jcssp.2012.2053.2061"},{"key":"e_1_3_2_2_31_1","volume-title":"BDA'13","author":"Sanoja Andr\u00e9s","year":"2013","unstructured":"Andr\u00e9s Sanoja and St\u00e9phane Gan\u00e7arski. 2013. 
Block-o-Matic: A Web Page Segmentation Tool and its Evaluation. In 29\u00e8me journ\u00e9es \u201dBase de donn\u00e9es avanc\u00e9es\u201d, BDA'13. 5."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2695664.2695786"},{"key":"e_1_3_2_2_33_1","volume-title":"ADBIS 2017, Nicosia, Cyprus, September 24--27, 2017, Proceedings. 375--393","author":"Sanoja Andr\u00e9","year":"2017","unstructured":"Andr\u00e9s Sanoja and St\u00e9phane Gan\u00e7arski. 2017. Migrating Web Archives from HTML4 to HTML5: A Block-Based Approach and Its Evaluation. In Advances in Databases and Information Systems - 21st European Conference, ADBIS 2017, Nicosia, Cyprus, September 24--27, 2017, Proceedings. 375--393. https:\/\/doi.org\/10.1007\/978-3-319-66917-5_25"},{"key":"e_1_3_2_2_34_1","first-page":"1409","article-title":"A Statistical Method for Evaluating Systematic Relationships","volume":"38","author":"Sokal Robert R.","year":"1958","unstructured":"Robert R. Sokal and Charles D. Michener. 1958. A Statistical Method for Evaluating Systematic Relationships. Univ. of Kansas Science Bulletin, Vol. 38 (1958), 1409--1438.","journal-title":"Univ. 
of Kansas Science Bulletin"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1860559.1860590"},{"key":"e_1_3_2_2_36_1","volume-title":"6th International Conference on Web Information Systems Engineering","author":"Vadrevu Srinivas","year":"2005","unstructured":"Srinivas Vadrevu, Fatih Gelgi, and Hasan Davulcu. 2005. Semantic Partitioning of Web Pages. In Web Information Systems Engineering - WISE 2005, 6th International Conference on Web Information Systems Engineering, New York, NY, USA, November 20--22, 2005, Proceedings. 107--118. https:\/\/doi.org\/10.1007\/11581062_9"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2017.02.002"}],"event":{"name":"CIKM '20: The 29th ACM International Conference on Information and Knowledge Management","location":"Virtual Event Ireland","acronym":"CIKM '20","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 29th ACM International Conference on Information & Knowledge 
Management"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3340531.3412782","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3340531.3412782","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:02:56Z","timestamp":1750197776000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3340531.3412782"}},"subtitle":["Evaluation Framework and Dataset"],"short-title":[],"issued":{"date-parts":[[2020,10,19]]},"references-count":37,"alternative-id":["10.1145\/3340531.3412782","10.1145\/3340531"],"URL":"https:\/\/doi.org\/10.1145\/3340531.3412782","relation":{},"subject":[],"published":{"date-parts":[[2020,10,19]]},"assertion":[{"value":"2020-10-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}