{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T12:39:17Z","timestamp":1773491957071,"version":"3.50.1"},"publisher-location":"Singapore","reference-count":35,"publisher":"Springer Nature Singapore","isbn-type":[{"value":"9789819527243","type":"print"},{"value":"9789819527250","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T00:00:00Z","timestamp":1761955200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T00:00:00Z","timestamp":1761955200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026]]},"DOI":"10.1007\/978-981-95-2725-0_25","type":"book-chapter","created":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T05:19:15Z","timestamp":1761887955000},"page":"408-434","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Beyond Instruction Following: Evaluating Inferential Rule Following of\u00a0Large Language Models"],"prefix":"10.1007","author":[{"given":"Wangtao","family":"Sun","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chenxiang","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xueyou","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuanqing","family":"Yu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ziyang","family":"Huang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haotian","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shizhu","family":"He","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Zhao","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kang","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,11,1]]},"reference":[{"key":"25_CR1","unstructured":"AI@Meta: LLaMA 3 model card (2024). https:\/\/github.com\/meta-llama\/llama3\/blob\/main\/MODEL_CARD.md"},{"key":"25_CR2","doi-asserted-by":"crossref","unstructured":"Chen, W., et al.: TheoremQA: a theorem-driven question answering dataset. In: The 2023 Conference on Empirical Methods in Natural Language Processing (2023)","DOI":"10.18653\/v1\/2023.emnlp-main.489"},{"key":"25_CR3","unstructured":"Chen, X., et al.: The SIFO benchmark: investigating the sequential instruction following ability of large language models. ArXiv abs\/2406.19999 (2024). https:\/\/api.semanticscholar.org\/CorpusID:270845502"},{"key":"25_CR4","doi-asserted-by":"crossref","unstructured":"Choi, S., Fang, T., Wang, Z., Song, Y.: KCTS: knowledge-constrained tree search decoding with token-level hallucination detection. arXiv preprint arXiv:2310.09044 (2023)","DOI":"10.18653\/v1\/2023.emnlp-main.867"},{"issue":"3","key":"25_CR5","doi-asserted-by":"publisher","first-page":"1018","DOI":"10.2307\/2275447","volume":"57","author":"R Fagin","year":"1992","unstructured":"Fagin, R., Halpern, J.Y., Vardi, M.Y.: What is an inference rule? J. Symb. Log. 57(3), 1018\u20131045 (1992)","journal-title":"J. Symb. Log."},{"key":"25_CR6","unstructured":"Hu, Y., Tang, X., Yang, H., Zhang, M.: Case-based or rule-based: how do transformers do the math? arXiv preprint arXiv:2402.17709 (2024)"},{"key":"25_CR7","unstructured":"Jiang, A.Q., et\u00a0al.: Mistral 7B. arXiv preprint arXiv:2310.06825 (2023)"},{"key":"25_CR8","doi-asserted-by":"crossref","unstructured":"Li, L., et al.: Salad-bench: a hierarchical and comprehensive safety benchmark for large language models. arXiv preprint arXiv:2402.05044 (2024)","DOI":"10.18653\/v1\/2024.findings-acl.235"},{"key":"25_CR9","unstructured":"Luo, L., Ju, J., Xiong, B., Li, Y.F., Haffari, G., Pan, S.: ChatRule: mining logical rules with large language models for knowledge graph reasoning. arXiv preprint arXiv:2309.01538 (2023)"},{"key":"25_CR10","unstructured":"Madaan, A., et\u00a0al.: Self-refine: iterative refinement with self-feedback. arXiv preprint arXiv:2303.17651 (2023)"},{"key":"25_CR11","doi-asserted-by":"crossref","unstructured":"Mishra, S., Khashabi, D., Baral, C., Hajishirzi, H.: Cross-task generalization via natural language crowdsourcing instructions. arXiv preprint arXiv:2104.08773 (2021)","DOI":"10.18653\/v1\/2022.acl-long.244"},{"key":"25_CR12","unstructured":"Mu, N., et al.: Can LLMs follow simple rules? arXiv preprint arXiv:2311.04235 (2023)"},{"key":"25_CR13","unstructured":"OpenAI: GPT-4 technical report (2023)"},{"key":"25_CR14","doi-asserted-by":"crossref","unstructured":"Qin, Y., et al.: InfoBench: evaluating instruction following ability in large language models. arXiv preprint arXiv:2401.03601 (2024)","DOI":"10.18653\/v1\/2024.findings-acl.772"},{"key":"25_CR15","unstructured":"Ribes-Inesta, E.: Instructions, rules, and abstraction: a misconstrued relation. Behav. Philos. 41\u201355 (2000)"},{"key":"25_CR16","unstructured":"Shinn, N., Labash, B., Gopinath, A.: Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366 (2023)"},{"key":"25_CR17","doi-asserted-by":"crossref","unstructured":"Sinha, K., Sodhani, S., Dong, J., Pineau, J., Hamilton, W.L.: Clutrr: a diagnostic benchmark for inductive reasoning from text. arXiv preprint arXiv:1908.06177 (2019)","DOI":"10.18653\/v1\/D19-1458"},{"key":"25_CR18","doi-asserted-by":"crossref","unstructured":"Sun, W., Yu, X., He, S., Zhao, J., Liu, K.: ExpNote: black-box large language models are better task solvers with experience notebook. arXiv preprint arXiv:2311.07032 (2023)","DOI":"10.18653\/v1\/2023.findings-emnlp.1034"},{"key":"25_CR19","unstructured":"Touvron, H., et\u00a0al.: LLaMA 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023)"},{"key":"25_CR20","doi-asserted-by":"crossref","unstructured":"Wang, S., Wei, Z., Choi, Y., Ren, X.: Can LLMs reason with rules? Logic scaffolding for stress-testing and improving LLMs. arXiv preprint arXiv:2402.11442 (2024)","DOI":"10.18653\/v1\/2024.acl-long.406"},{"key":"25_CR21","unstructured":"Wei, J., et al.: Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652 (2021)"},{"key":"25_CR22","unstructured":"Wei, J., et al.: Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903 (2022)"},{"key":"25_CR23","unstructured":"Xiao, C., et\u00a0al.: CAIL2018: a large-scale legal dataset for judgment prediction. arXiv preprint arXiv:1807.02478 (2018)"},{"key":"25_CR24","doi-asserted-by":"crossref","unstructured":"Xu, F., et al.: Symbol-LLM: towards foundational symbol-centric interface for large language models. arXiv preprint arXiv:2311.09278 (2023)","DOI":"10.18653\/v1\/2024.acl-long.707"},{"key":"25_CR25","doi-asserted-by":"crossref","unstructured":"Yang, Z., Li, P., Liu, Y.: Failures pave the way: enhancing large language models through tuning-free rule accumulation. arXiv preprint arXiv:2310.15746 (2023)","DOI":"10.18653\/v1\/2023.emnlp-main.109"},{"key":"25_CR26","unstructured":"Yang, Z., et al.: Language models as inductive reasoners. arXiv preprint arXiv:2212.10923 (2022)"},{"key":"25_CR27","unstructured":"Yao, S., et al.: Tree of thoughts: deliberate problem solving with large language models. Adv. Neural. Inf. Process. Syst. 36 (2024)"},{"key":"25_CR28","doi-asserted-by":"publisher","unstructured":"Yin, W., Ye, Q., Liu, P., Ren, X., Sch\u00fctze, H.: LLM-driven instruction following: progresses and concerns. In: Zhang, Q., Sajjad, H. (eds.) Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, pp. 19\u201325. Association for Computational Linguistics, Singapore (2023). https:\/\/doi.org\/10.18653\/v1\/2023.emnlp-tutorial.4, https:\/\/aclanthology.org\/2023.emnlp-tutorial.4","DOI":"10.18653\/v1\/2023.emnlp-tutorial.4"},{"key":"25_CR29","unstructured":"Young, A., et\u00a0al.: Yi: open foundation models by 01. AI. arXiv preprint arXiv:2403.04652 (2024)"},{"key":"25_CR30","first-page":"15476","volume":"35","author":"E Zelikman","year":"2022","unstructured":"Zelikman, E., Wu, Y., Mu, J., Goodman, N.: STaR: bootstrapping reasoning with reasoning. Adv. Neural. Inf. Process. Syst. 35, 15476\u201315488 (2022)","journal-title":"Adv. Neural. Inf. Process. Syst."},{"key":"25_CR31","doi-asserted-by":"crossref","unstructured":"Zhao, A., Huang, D., Xu, Q., Lin, M., Liu, Y.J., Huang, G.: Expel: LLM agents are experiential learners. arXiv preprint arXiv:2308.10144 (2023)","DOI":"10.1609\/aaai.v38i17.29936"},{"key":"25_CR32","unstructured":"Zhong, H., et\u00a0al.: Overview of CAIL2018: legal judgment prediction competition. arXiv preprint arXiv:1810.05851 (2018)"},{"key":"25_CR33","doi-asserted-by":"crossref","unstructured":"Zhong, R., Lee, K., Zhang, Z., Klein, D.: Adapting language models for zero-shot learning by meta-tuning on dataset and prompt collections. arXiv preprint arXiv:2104.04670 (2021)","DOI":"10.18653\/v1\/2021.findings-emnlp.244"},{"key":"25_CR34","unstructured":"Zhou, J., et al.: Instruction-following evaluation for large language models. arXiv preprint arXiv:2311.07911 (2023)"},{"key":"25_CR35","unstructured":"Zhu, Z., et al.: Large language models can learn rules. arXiv preprint arXiv:2310.07064 (2023)"}],"container-title":["Lecture Notes in Computer Science","Chinese Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-981-95-2725-0_25","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T05:19:28Z","timestamp":1761887968000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-981-95-2725-0_25"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,1]]},"ISBN":["9789819527243","9789819527250"],"references-count":35,"URL":"https:\/\/doi.org\/10.1007\/978-981-95-2725-0_25","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,1]]},"assertion":[{"value":"1 November 2025","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"Our research aims to evaluate the inferential rule-following capability of LLMs. To mitigate risks associated with some sensitive content in the benchmark, we restrict access to authorized researchers who adhere to strict ethical guidelines. These measures safeguard research integrity while minimizing potential harm.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics Statement"}},{"value":"CCL","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"China National Conference on Chinese Computational Linguistics","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Jinan","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"China","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2025","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"11 August 2025","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"14 August 2025","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"24","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"cncl2025","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/link.springer.com\/conference\/cncl","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}