{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:14:55Z","timestamp":1750220095557,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,8,23]],"date-time":"2022-08-23T00:00:00Z","timestamp":1661212800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100003246","name":"Nederlandse Organisatie voor Wetenschappelijk Onderzoek","doi-asserted-by":"publisher","award":["CISC.CC.016"],"award-info":[{"award-number":["CISC.CC.016"]}],"id":[{"id":"10.13039\/501100003246","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,8,23]]},"DOI":"10.1145\/3539813.3545150","type":"proceedings-article","created":{"date-parts":[[2022,8,25]],"date-time":"2022-08-25T22:18:32Z","timestamp":1661465912000},"page":"24-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["WooIR"],"prefix":"10.1145","author":[{"given":"Ruben","family":"van Heusden","sequence":"first","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}]},{"given":"Jaap","family":"Kamps","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}]},{"given":"Maarten","family":"Marx","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}]}],"member":"320","published-online":{"date-parts":[[2022,8,25]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Sixth International Conference on Graphic and Image Processing (ICGIP","volume":"9443","author":"Agin Onur","year":"2015","unstructured":"Onur Agin , Cagdas Ulas , Mehmet Ahat , and Can Bekar . 
2015 . An approach to the segmentation of multi-page document flow using binary classification . In Sixth International Conference on Graphic and Image Processing (ICGIP 2014), Yulin Wang, Xudong Jiang, and David Zhang (Eds.) , Vol. 9443 . International Society for Optics and Photonics, SPIE, 216 -- 222. https:\/\/doi.org\/10.1117\/12.2178778 Onur Agin, Cagdas Ulas, Mehmet Ahat, and Can Bekar. 2015. An approach to the segmentation of multi-page document flow using binary classification. In Sixth International Conference on Graphic and Image Processing (ICGIP 2014), Yulin Wang, Xudong Jiang, and David Zhang (Eds.), Vol. 9443. International Society for Optics and Photonics, SPIE, 216 -- 222. https:\/\/doi.org\/10.1117\/12.2178778"},{"key":"e_1_3_2_1_2_1","volume-title":"A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information retrieval","author":"Amig\u00f3 Enrique","year":"2009","unstructured":"Enrique Amig\u00f3 , Julio Gonzalo , Javier Artiles , and Felisa Verdejo . 2009. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information retrieval , Vol. 12 , 4 ( 2009 ), 461--486. Enrique Amig\u00f3, Julio Gonzalo, Javier Artiles, and Felisa Verdejo. 2009. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information retrieval , Vol. 12, 4 (2009), 461--486."},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics -","volume":"1","author":"Bagga Amit","year":"1998","unstructured":"Amit Bagga and Breck Baldwin . 1998 . Entity-Based Cross-Document Coreferencing Using the Vector Space Model . In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1 (Montreal, Quebec, Canada) (ACL '98\/COLING '98). 
Association for Computational Linguistics, USA, 79--85. https:\/\/doi.org\/10.3115\/980845.980859 Amit Bagga and Breck Baldwin. 1998. Entity-Based Cross-Document Coreferencing Using the Vector Space Model. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1 (Montreal, Quebec, Canada) (ACL '98\/COLING '98). Association for Computational Linguistics, USA, 79--85. https:\/\/doi.org\/10.3115\/980845.980859"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_4_1","DOI":"10.18653\/v1\/2020.acl-main.29"},{"key":"e_1_3_2_1_5_1","volume-title":"Text Segmentation Using Exponential Models. In Second Conference on Empirical Methods in Natural Language Processing . https:\/\/aclanthology.org\/W97-0304","author":"Beeferman Doug","year":"1997","unstructured":"Doug Beeferman , Adam Berger , and John Lafferty . 1997 . Text Segmentation Using Exponential Models. In Second Conference on Empirical Methods in Natural Language Processing . https:\/\/aclanthology.org\/W97-0304 Doug Beeferman, Adam Berger, and John Lafferty. 1997. Text Segmentation Using Exponential Models. In Second Conference on Empirical Methods in Natural Language Processing . https:\/\/aclanthology.org\/W97-0304"},{"key":"e_1_3_2_1_6_1","volume-title":"Statistical models for text segmentation. Machine learning","author":"Beeferman Doug","year":"1999","unstructured":"Doug Beeferman , Adam Berger , and John Lafferty . 1999. Statistical models for text segmentation. Machine learning , Vol. 34 , 1 ( 1999 ), 177--210. Doug Beeferman, Adam Berger, and John Lafferty. 1999. Statistical models for text segmentation. Machine learning , Vol. 34, 1 (1999), 177--210."},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_7_1","DOI":"10.1016\/j.engappai.2021.104394"},{"key":"e_1_3_2_1_8_1","volume-title":"1st Meeting of the North American Chapter of the Association for Computational Linguistics . 
https:\/\/aclanthology.org\/A00--2004","author":"Choi Freddy Y. Y.","year":"2000","unstructured":"Freddy Y. Y. Choi . 2000 . Advances in domain independent linear text segmentation . In 1st Meeting of the North American Chapter of the Association for Computational Linguistics . https:\/\/aclanthology.org\/A00--2004 Freddy Y. Y. Choi. 2000. Advances in domain independent linear text segmentation. In 1st Meeting of the North American Chapter of the Association for Computational Linguistics . https:\/\/aclanthology.org\/A00--2004"},{"key":"e_1_3_2_1_9_1","volume-title":"SIGIR 2002 Workshop on Information Retrieval and OCR: From Converting Content to Grasping, Meaning","author":"Collins-Thompson Kevyn","year":"2002","unstructured":"Kevyn Collins-Thompson and Radoslav Nickolov . 2002 . A clustering-based algorithm for automatic document separation . In SIGIR 2002 Workshop on Information Retrieval and OCR: From Converting Content to Grasping, Meaning . Tampere, Finland. Kevyn Collins-Thompson and Radoslav Nickolov. 2002. A clustering-based algorithm for automatic document separation. In SIGIR 2002 Workshop on Information Retrieval and OCR: From Converting Content to Grasping, Meaning . Tampere, Finland."},{"key":"e_1_3_2_1_10_1","volume-title":"Document Recognition and Retrieval XXI","volume":"9021","author":"Daher Hani","year":"2014","unstructured":"Hani Daher and Abdel Bela\u00efd. 2014 . Document flow segmentation for business applications . In Document Recognition and Retrieval XXI , Vol. 9021 . International Society for Optics and Photonics, 90210G. Hani Daher and Abdel Bela\u00efd. 2014. Document flow segmentation for business applications. In Document Recognition and Retrieval XXI, Vol. 9021. 
International Society for Optics and Photonics, 90210G."},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_11_1","DOI":"10.1145\/363958.363994"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_12_1","DOI":"10.1109\/ICDAR.2011.163"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_13_1","DOI":"10.3115\/1075096.1075167"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_14_1","DOI":"10.1109\/DICTA.2016.7797031"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_15_1","DOI":"10.1109\/ICDAR.2013.128"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_16_1","DOI":"10.1109\/ACCESS.2022.3144185"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_17_1","DOI":"10.1109\/DAS.2018.66"},{"key":"e_1_3_2_1_18_1","volume-title":"Error detecting and error correcting codes. The Bell system technical journal","author":"Hamming Richard W","year":"1950","unstructured":"Richard W Hamming . 1950. Error detecting and error correcting codes. The Bell system technical journal , Vol. 29 , 2 ( 1950 ), 147--160. Richard W Hamming. 1950. Error detecting and error correcting codes. The Bell system technical journal , Vol. 29, 2 (1950), 147--160."},{"key":"e_1_3_2_1_19_1","volume-title":"Text tiling: Segmenting text into multi-paragraph subtopic passages. Computational linguistics","author":"Hearst Marti A","year":"1997","unstructured":"Marti A Hearst . 1997. Text tiling: Segmenting text into multi-paragraph subtopic passages. Computational linguistics , Vol. 23 , 1 ( 1997 ), 33--64. Marti A Hearst. 1997. Text tiling: Segmenting text into multi-paragraph subtopic passages. Computational linguistics , Vol. 
23, 1 (1997), 33--64."},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_20_1","DOI":"10.21437\/ICSLP.1998-582"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_21_1","DOI":"10.1109\/DAS.2016.21"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_22_1","DOI":"10.1145\/3340531.3412782"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_23_1","DOI":"10.1109\/CVPR.2019.00963"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_24_1","DOI":"10.1145\/1244002.1244140"},{"volume-title":"Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Lewis D.","unstructured":"D. Lewis , G. Agam , S. Argamon , O. Frieder , D. Grossman , and J. Heard . 2006. Building a Test Collection for Complex Document Information Processing . In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval ( Seattle, Washington, USA) (SIGIR '06). Association for Computing Machinery, New York, NY, USA, 665--666. https:\/\/doi.org\/10.1145\/1148170.1148307 D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman, and J. Heard. 2006. Building a Test Collection for Complex Document Information Processing. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Seattle, Washington, USA) (SIGIR '06). Association for Computing Machinery, New York, NY, USA, 665--666. https:\/\/doi.org\/10.1145\/1148170.1148307","key":"e_1_3_2_1_25_1"},{"key":"e_1_3_2_1_26_1","volume-title":"Li and Anil Jain (Eds.)","author":"Stan","year":"2009","unstructured":"Stan Z. Li and Anil Jain (Eds.) . 2009 . Hamming Distance .Springer US, Boston, MA , 668--668. https:\/\/doi.org\/10.1007\/978-0--387--73003--5_956 Stan Z. Li and Anil Jain (Eds.). 2009. Hamming Distance .Springer US, Boston, MA, 668--668. 
https:\/\/doi.org\/10.1007\/978-0--387--73003--5_956"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_28_1","DOI":"10.1117\/12.805646"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_29_1","DOI":"10.1016\/j.ipm.2010.11.008"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_30_1","DOI":"10.5220\/0009146402200227"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_31_1","DOI":"10.1109\/ICFHR-2018.2018.00011"},{"volume-title":"Machine Learning and Data Mining in Pattern Recognition","author":"Paliwal Shashank","unstructured":"Shashank Paliwal and Vikram Pudi . 2012. Investigating Usage of Text Segmentation and Inter-passage Similarities to Improve Text Document Clustering . In Machine Learning and Data Mining in Pattern Recognition , Petra Perner (Ed.). Springer Berlin Heidelberg , Berlin, Heidelberg , 555--565. Shashank Paliwal and Vikram Pudi. 2012. Investigating Usage of Text Segmentation and Inter-passage Similarities to Improve Text Document Clustering. In Machine Learning and Data Mining in Pattern Recognition, Petra Perner (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 555--565.","key":"e_1_3_2_1_32_1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_33_1","DOI":"10.1007\/3-540-45683-X_46"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_34_1","DOI":"10.1162\/089120102317341756"},{"key":"e_1_3_2_1_35_1","volume-title":"An Automatic Method of Finding Topic Boundaries. In 32nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics","author":"Reynar Jeffrey C.","year":"1994","unstructured":"Jeffrey C. Reynar . 1994 . An Automatic Method of Finding Topic Boundaries. In 32nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics , Las Cruces, New Mexico, USA, 331--333. https:\/\/doi.org\/10.3115\/981732.981783 Jeffrey C. Reynar. 1994. An Automatic Method of Finding Topic Boundaries. In 32nd Annual Meeting of the Association for Computational Linguistics. 
Association for Computational Linguistics, Las Cruces, New Mexico, USA, 331--333. https:\/\/doi.org\/10.3115\/981732.981783"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_36_1","DOI":"10.1007\/s10032-014-0225-8"},{"key":"e_1_3_2_1_37_1","volume-title":"Adaptive document image binarization. Pattern recognition","author":"Sauvola Jaakko","year":"2000","unstructured":"Jaakko Sauvola and Matti Pietik\u00e4inen. 2000. Adaptive document image binarization. Pattern recognition , Vol. 33 , 2 ( 2000 ), 225--236. Jaakko Sauvola and Matti Pietik\u00e4inen. 2000. Adaptive document image binarization. Pattern recognition , Vol. 33, 2 (2000), 225--236."},{"volume-title":"Proceedings of the 33rd European Conference on Advances in Information Retrieval","author":"Song Fei","unstructured":"Fei Song , William M. Darling , Adnan Duric , and Fred W. Kroon . 2011. An Iterative Approach to Text Segmentation . In Proceedings of the 33rd European Conference on Advances in Information Retrieval ( Dublin, Ireland) (ECIR'11). Springer-Verlag, Berlin, Heidelberg, 629--640. Fei Song, William M. Darling, Adnan Duric, and Fred W. Kroon. 2011. An Iterative Approach to Text Segmentation. In Proceedings of the 33rd European Conference on Advances in Information Retrieval (Dublin, Ireland) (ECIR'11). Springer-Verlag, Berlin, Heidelberg, 629--640.","key":"e_1_3_2_1_38_1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_39_1","DOI":"10.1016\/j.asoc.2004.10.009"},{"key":"e_1_3_2_1_40_1","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Wiedemann Gregor","year":"2018","unstructured":"Gregor Wiedemann and Gerhard Heyer . 2018 . Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features . In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) . European Language Resources Association (ELRA), Miyazaki, Japan. 
https:\/\/aclanthology.org\/L18--1581 Gregor Wiedemann and Gerhard Heyer. 2018. Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) . European Language Resources Association (ELRA), Miyazaki, Japan. https:\/\/aclanthology.org\/L18--1581"},{"volume-title":"Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion","author":"Zhu Jingbo","unstructured":"Jingbo Zhu , Muhua Zhu , Huizhen Wang , and Benjamin K. Tsou . 2009. Aspect-Based Sentence Segmentation for Sentiment Summarization . In Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion ( Hong Kong, China) (TSA '09). Association for Computing Machinery, New York, NY, USA, 65--72. https:\/\/doi.org\/10.1145\/1651461.1651474 Jingbo Zhu, Muhua Zhu, Huizhen Wang, and Benjamin K. Tsou. 2009. Aspect-Based Sentence Segmentation for Sentiment Summarization. In Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion (Hong Kong, China) (TSA '09). Association for Computing Machinery, New York, NY, USA, 65--72. 
https:\/\/doi.org\/10.1145\/1651461.1651474","key":"e_1_3_2_1_41_1"}],"event":{"sponsor":["SIGIR ACM Special Interest Group on Information Retrieval"],"acronym":"ICTIR '22","name":"ICTIR '22: The 2022 ACM SIGIR International Conference on the Theory of Information Retrieval","location":"Madrid Spain"},"container-title":["Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3539813.3545150","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3539813.3545150","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:04Z","timestamp":1750183804000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3539813.3545150"}},"subtitle":["A New Open Page Stream Segmentation Dataset"],"short-title":[],"issued":{"date-parts":[[2022,8,23]]},"references-count":40,"alternative-id":["10.1145\/3539813.3545150","10.1145\/3539813"],"URL":"https:\/\/doi.org\/10.1145\/3539813.3545150","relation":{},"subject":[],"published":{"date-parts":[[2022,8,23]]},"assertion":[{"value":"2022-08-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}