{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T07:40:31Z","timestamp":1774424431705,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,7,13]]},"DOI":"10.1145\/3726302.3729882","type":"proceedings-article","created":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T14:55:26Z","timestamp":1752504926000},"page":"170-179","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["A New HOPE: Domain-agnostic Automatic Evaluation of Text Chunking"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1601-8884","authenticated-orcid":false,"given":"Henrik","family":"Br\u00e5dland","sequence":"first","affiliation":[{"name":"Centre for Artificial Intelligence Research, University of Agder, Kristiansand, Agder, Norway and Norkart AS, Oslo, Oslo, Norway"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6331-702X","authenticated-orcid":false,"given":"Morten","family":"Goodwin","sequence":"additional","affiliation":[{"name":"Centre for Artificial Intelligence Research, University of Agder, Kristiansand, Agder, Norway"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7742-4907","authenticated-orcid":false,"given":"Per-Arne","family":"Andersen","sequence":"additional","affiliation":[{"name":"Centre for Artificial Intelligence Research, University of Agder, Kristiansand, Agder, Norway"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-1769-3194","authenticated-orcid":false,"given":"Alexander S.","family":"Nossum","sequence":"additional","affiliation":[{"name":"Norkart AS, Oslo, Norway"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3128-2517","authenticated-orcid":false,"given":"Aditya","family":"Gupta","sequence":"additional","affiliation":[{"name":"Centre for Artificial Intelligence Research, University of Agder, Kristiansand, Agder, Norway"}]}],"member":"320","published-online":{"date-parts":[[2025,7,13]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","unstructured":"Scott Barnett Stefanus Kurniawan Srikanth Thudumu Zach Brannelly and Mohamed Abdelrazek. 2024. Seven Failure Points When Engineering a Retrieval Augmented Generation System. section 5 194--199. isbn: 9798400705915. doi: 10.1145\/3644815.3644945.","DOI":"10.1145\/3644815.3644945"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1253\/jcj.34.1213"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626772.3657834"},{"key":"e_1_3_2_1_4_1","volume-title":"Kristina Toutanova Google, and A I Language","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova Google, and A I Language. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Tech. rep. https:\/\/github.com\/tensorflow\/tensor2ten sor."},{"key":"e_1_3_2_1_5_1","unstructured":"Abhimanyu Dubey et al. 2024. The Llama 3 Herd of Models 1--92. http:\/\/arxiv.org\/abs\/2407.21783."},{"key":"e_1_3_2_1_6_1","unstructured":"Darren Edge Ha Trinh Newman Cheng Joshua Bradley Alex Chao Apurva Mody Steven Truitt and Jonathan Larson. 2024. From Local to Global: A Graph RAG Approach to Query-Focused Summarization 1--15. http:\/\/arxiv.org\/abs\/2404.16130."},{"key":"e_1_3_2_1_7_1","volume-title":"RAGAS: Automated Evaluation of Retrieval Augmented Generation. EACL 2024 - 18th Conference of the European","author":"Es Shahul","year":"2024","unstructured":"Shahul Es, Jithin James, Luis Espinosa-Anke, and Steven Schockaert. 2024. RAGAS: Automated Evaluation of Retrieval Augmented Generation. EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations, 150--158. isbn: 9798891760912."},{"key":"e_1_3_2_1_8_1","volume-title":"Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification","author":"Gunjal Anisha","year":"2024","unstructured":"Anisha Gunjal and Greg Durrett. 2024. Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification. http:\/\/arxiv.org\/abs\/2406.20079."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3486250"},{"key":"e_1_3_2_1_10_1","volume-title":"Ho","author":"Peter","year":"2022","unstructured":"Peter Henderson*, Mark S. Krass*, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, and Daniel E. Ho. 2022. Pile of law: learning responsible data filtering from the law and a 256gb open-source legal dataset. (2022). https:\/\/arxiv.org\/abs\/2207.00220."},{"key":"e_1_3_2_1_11_1","volume-title":"Measuring Massive Multitask Language Understanding. ICLR 2021 - 9th International Conference on Learning Representations.","author":"Hendrycks Dan","year":"2021","unstructured":"Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. Measuring Massive Multitask Language Understanding. ICLR 2021 - 9th International Conference on Learning Representations."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/p19--1612"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.3390\/math11041006"},{"key":"e_1_3_2_1_14_1","unstructured":"Zehan Li Xin Zhang Yanzhao Zhang Dingkun Long Pengjun Xie and Meishan Zhang. 2023. Towards General Text Embeddings with Multi-stage Contrastive Learning. http:\/\/arxiv.org\/abs\/2308.03281."},{"key":"e_1_3_2_1_15_1","volume-title":"KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation, 1--33","author":"Lei Liang","year":"2024","unstructured":"Lei Liang et al. 2024. KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation, 1--33. http:\/\/arxiv.org\/abs\/2409.13731."},{"key":"e_1_3_2_1_16_1","unstructured":"Lin Long Rui Wang Ruixuan Xiao Junbo Zhao Xiao Ding Gang Chen and HaoboWang. 2024. On LLMs-Driven Synthetic Data Generation Curation and Evaluation: A Survey. isbn: 9798891760998. http:\/\/arxiv.org\/abs\/2406.15126."},{"key":"e_1_3_2_1_17_1","unstructured":"Anurag Mishra. [n. d.] Five Levels of Chunking Strategies in RAG| Notes from Greg's Video - anuragmishra_27746. https:\/\/medium.com\/@anuragmishra_27 746\/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d. [Accessed 20--11--2024]. ()."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.148"},{"key":"e_1_3_2_1_19_1","unstructured":"Arvind Neelakantan et al. 2022. Text and Code Embeddings by Contrastive Pre-Training. http:\/\/arxiv.org\/abs\/2201.10005."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.193"},{"key":"e_1_3_2_1_21_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. 4 1--100. http:\/\/arxiv.org\/abs\/2303.08774."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1002\/andp.19223712302"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2023.3266377"},{"key":"e_1_3_2_1_24_1","volume-title":"BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models. NeurIPS","author":"Thakur Nandan","year":"2021","unstructured":"Nandan Thakur, Nils Reimers, Andreas R\u00fcckl\u00e9, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models. NeurIPS. http:\/\/arxiv.org\/abs\/2104.08663."},{"key":"e_1_3_2_1_25_1","unstructured":"Hugo Touvron et al. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. http:\/\/arxiv.org\/abs\/2307.09288."},{"key":"e_1_3_2_1_26_1","volume-title":"Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020","author":"Lu Lucy","year":"2020","unstructured":"Lucy Lu Wang et al. 2020. CORD-19: the COVID-19 open research dataset. In Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020. Association for Computational Linguistics, Online (July 2020). https:\/\/www.aclweb.org\/anthology\/2020.nlpcovid19-acl.1."},{"key":"e_1_3_2_1_27_1","unstructured":"Zhilin Wang Alexander Bukharin Olivier Delalleau Daniel Egert Gerald Shen Jiaqi Zeng Oleksii Kuchaiev and Yi Dong. 2024. HelpSteer2-Preference: Complementing Ratings with Preferences 1--26. http:\/\/arxiv.org\/abs\/2410.01257."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","unstructured":"Miriam Wanner Seth Ebner Zhengping Jiang Mark Dredze and Benjamin Van Durme. 2024. A Closer Look at Claim Decomposition 153--175. isbn: 9798891761063. doi: 10.18653\/v1\/2024.starsem-1.13.","DOI":"10.18653\/v1\/2024.starsem-1.13"},{"key":"e_1_3_2_1_29_1","volume-title":"Quoc V. Le, and Denny Zhou.","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems, 35, NeurIPS, 1--14. isbn: 9781713871088."},{"key":"e_1_3_2_1_30_1","unstructured":"Jules White Quchen Fu Sam Hays Michael Sandborn Carlos Olea Henry Gilbert Ashraf Elnashar Jesse Spencer-Smith and Douglas C Schmidt. 2023. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. http:\/\/arxiv.org\/abs\/2302.11382."},{"key":"e_1_3_2_1_31_1","unstructured":"ShangyuWu et al. 2024. Retrieval-Augmented Generation for Natural Language Processing: A Survey. http:\/\/arxiv.org\/abs\/2407.13193."},{"key":"e_1_3_2_1_32_1","unstructured":"Siye Wu Jian Xie Jiangjie Chen Tinghui Zhu Kai Zhang and Yanghua Xiao. 2024. How Easily do Irrelevant Inputs Skew the Responses of Large Language Models? 1--20. http:\/\/arxiv.org\/abs\/2404.03302."},{"key":"e_1_3_2_1_33_1","unstructured":"Zhiheng Xi et al. 2023. The Rise and Potential of Large Language Model Based Agents: A Survey. http:\/\/arxiv.org\/abs\/2309.07864."},{"key":"e_1_3_2_1_34_1","unstructured":"Antonio Jimeno Yepes Yao You Jan Milczek Sebastian Laverde and Renyu Li. 2024. Financial Report Chunking for Effective Retrieval Augmented Generation. http:\/\/arxiv.org\/abs\/2402.05131."},{"key":"e_1_3_2_1_35_1","unstructured":"Siyun Zhao Yuqing Yang ZilongWang Zhiyuan He Luna K. Qiu and Lili Qiu. 2024. Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely. http:\/\/arxiv.org\/abs\/2409.14924."},{"key":"e_1_3_2_1_36_1","unstructured":"Wayne Xin Zhao et al. 2023. A Survey of Large Language Models (Mar. 2023). http:\/\/arxiv.org\/abs\/2303.18223."},{"key":"e_1_3_2_1_37_1","unstructured":"Zijie Zhong Hanwen Liu Xiaoya Cui Xiaofan Zhang and Zengchang Qin. 2024. Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval- Augmented Generation 1--17. http:\/\/arxiv.org\/abs\/2406.00456."}],"event":{"name":"SIGIR '25: The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval","location":"Padua Italy","acronym":"SIGIR '25","sponsor":["SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3726302.3729882","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T18:36:05Z","timestamp":1755887765000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3726302.3729882"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,13]]},"references-count":37,"alternative-id":["10.1145\/3726302.3729882","10.1145\/3726302"],"URL":"https:\/\/doi.org\/10.1145\/3726302.3729882","relation":{},"subject":[],"published":{"date-parts":[[2025,7,13]]},"assertion":[{"value":"2025-07-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}