{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,28]],"date-time":"2025-05-28T04:09:23Z","timestamp":1748405363042,"version":"3.41.0"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2025,4,21]],"date-time":"2025-04-21T00:00:00Z","timestamp":1745193600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2025,4,21]],"date-time":"2025-04-21T00:00:00Z","timestamp":1745193600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach. Intell. Res."],"published-print":{"date-parts":[[2025,6]]},"DOI":"10.1007\/s11633-024-1531-3","type":"journal-article","created":{"date-parts":[[2025,4,21]],"date-time":"2025-04-21T01:04:01Z","timestamp":1745197441000},"page":"571-584","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["CRMR: A Collaborative Multistep Reasoning Framework for Solving Mathematical Problems"],"prefix":"10.1007","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-1738-0931","authenticated-orcid":false,"given":"Yudi","family":"Zhang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7594-2241","authenticated-orcid":false,"given":"Xue-song","family":"Tang","sequence":"additional","affiliation":[]},{"given":"Kuangrong","family":"Hao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,4,21]]},"reference":[{"key":"1531_CR1","unstructured":"T. Zhao, M. Wei, J. S. Preston, H. Poon. 
Automatic calibration and error correction for large language models via Pareto optimal self-supervision, [Online], Available: https:\/\/arxiv.org\/abs\/2306.16564, 2023."},{"key":"1531_CR2","doi-asserted-by":"publisher","first-page":"3214","DOI":"10.18653\/v1\/2022.acl-long.229","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"S Lin","year":"2022","unstructured":"S. Lin, J. Hilton, O. Evans. TruthfulQA: Measuring how models mimic human falsehoods. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, pp. 3214\u20133252, 2022. DOI: https:\/\/doi.org\/10.18653\/v1\/2022.acl-long.229."},{"key":"1531_CR3","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"O Golovneva","year":"2023","unstructured":"O. Golovneva, M. P. Chen, S. Poff, M. Corredor, L. Zettlemoyer, M. Fazel-Zarandi, A. Celikyilmaz. ROSCOE: A suite of metrics for scoring step-by-step reasoning. In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 2023."},{"key":"1531_CR4","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"L Ouyang","year":"2022","unstructured":"L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, R. Lowe. Training language models to follow instructions with human feedback. In Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, USA, Article number 2011, 2022."},{"key":"1531_CR5","volume-title":"Proceedings of International Conference on Machine Learning","author":"M Zhang","year":"2024","unstructured":"M. Zhang, O. Press, W. Merrill, A. Liu, N. A. Smith. 
How language model hallucinations can snowball. In Proceedings of International Conference on Machine Learning, Vienna, Austria, 2024."},{"key":"1531_CR6","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"J Wei","year":"2022","unstructured":"J. Wei, X. Z. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. D. H. Chi, Q. V. Le, D. Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, USA, Article number 1800, 2022."},{"key":"1531_CR7","doi-asserted-by":"publisher","first-page":"2269","DOI":"10.18653\/v1\/2021.findings-emnlp.195","volume-title":"Proceedings of Findings of the Association for Computational Linguistics: EMNLP","author":"J H Shen","year":"2021","unstructured":"J. H. Shen, Y. C. Yin, L. Li, L. F. Shang, X. Jiang, M. Zhang, Q. Liu. Generate & rank: A multi-task framework for math word problems. In Proceedings of Findings of the Association for Computational Linguistics: EMNLP, Punta Cana, Dominican Republic, pp. 2269\u20132279, 2021. DOI: https:\/\/doi.org\/10.18653\/v1\/2021.findings-emnlp.195."},{"key":"1531_CR8","doi-asserted-by":"publisher","first-page":"2550","DOI":"10.18653\/v1\/2023.findings-emnlp.167","volume-title":"Proceedings of Findings of the Association for Computational Linguistics: EMNLP","author":"Y X Weng","year":"2023","unstructured":"Y. X. Weng, M. J. Zhu, F. Xia, B. Li, S. Z. He, S. P. Liu, B. Sun, K. Liu, J. Zhao. Large language models are better reasoners with self-verification. In Proceedings of Findings of the Association for Computational Linguistics: EMNLP, Singapore, pp. 2550\u20132575, 2023. DOI: https:\/\/doi.org\/10.18653\/v1\/2023.findings-emnlp.167."},{"key":"1531_CR9","unstructured":"Y. F. Li, Z. Q. Lin, S. Z. Zhang, Q. Fu, B. Chen, J. G. Lou, W. Z. Chen. 
On the advance of making language models better reasoners, [Online], Available: https:\/\/arxiv.org\/abs\/2206.02336, 2022."},{"key":"1531_CR10","doi-asserted-by":"publisher","first-page":"5687","DOI":"10.18653\/v1\/2023.findings-emnlp.378","volume-title":"Proceedings of Findings of the Association for Computational Linguistics: EMNLP","author":"O Press","year":"2023","unstructured":"O. Press, M. R. Zhang, S. Min, L. Schmidt, N. Smith, M. Lewis. Measuring and narrowing the compositionality gap in language models. In Proceedings of Findings of the Association for Computational Linguistics: EMNLP, Singapore, pp. 5687\u20135711, 2023. DOI: https:\/\/doi.org\/10.18653\/v1\/2023.findings-emnlp.378."},{"key":"1531_CR11","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"S Welleck","year":"2023","unstructured":"S. Welleck, X. M. Lu, P. West, L. Schmidt, N. Smith, M. Lewis. Generating sequences by learning to self-correct. In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 2023."},{"key":"1531_CR12","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Z B Gou","year":"2023","unstructured":"Z. B. Gou, Z. H. Shao, Y. Y. Gong, Y. L. Shen, Y. J. Yang, N. Duan, W. Z. Chen. CRITIC: Large language models can self-correct with tool-interactive critiquing. In Proceedings of the 12th International Conference on Learning Representations, Vienna, Austria, 2023."},{"key":"1531_CR13","doi-asserted-by":"publisher","first-page":"2080","DOI":"10.18653\/v1\/2021.naacl-main.168","volume-title":"Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"A Patel","year":"2021","unstructured":"A. Patel, S. Bhattamishra, N. Goyal. Are NLP models really able to solve simple math word problems? 
In Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2080\u20132094, 2021. DOI: https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.168."},{"key":"1531_CR14","doi-asserted-by":"publisher","first-page":"585","DOI":"10.1162\/tacl_a_00160","volume":"3","author":"R Koncel-Kedziorski","year":"2015","unstructured":"R. Koncel-Kedziorski, H. Hajishirzi, A. Sabharwal, O. Etzioni, S. D. Ang. Parsing algebraic word problems into equations. Transactions of the Association for Computational Linguistics, vol. 3, pp. 585\u2013597, 2015. DOI: https:\/\/doi.org\/10.1162\/tacl_a_00160.","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"1531_CR15","doi-asserted-by":"publisher","first-page":"1743","DOI":"10.18653\/v1\/D15-1202","volume-title":"Proceedings of Conference on Empirical Methods in Natural Language Processing","author":"S Roy","year":"2015","unstructured":"S. Roy, D. Roth. Solving general arithmetic word problems. In Proceedings of Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 1743\u20131752, 2015. DOI: https:\/\/doi.org\/10.18653\/v1\/D15-1202."},{"key":"1531_CR16","doi-asserted-by":"publisher","first-page":"523","DOI":"10.3115\/v1\/D14-1058","volume-title":"Proceedings of Conference on Empirical Methods in Natural Language Processing","author":"M J Hosseini","year":"2014","unstructured":"M. J. Hosseini, H. Hajishirzi, O. Etzioni, N. Kushman. Learning to solve arithmetic word problems with verb categorization. In Proceedings of Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, pp. 523\u2013533, 2014. DOI: https:\/\/doi.org\/10.3115\/v1\/D14-1058."},{"key":"1531_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1162\/tacl_a_00118","volume":"3","author":"S Roy","year":"2015","unstructured":"S. Roy, T. Vieira, D. Roth. Reasoning about quantities in natural language. 
Transactions of the Association for Computational Linguistics, vol. 3, pp. 1\u201313, 2015. DOI: https:\/\/doi.org\/10.1162\/tacl_a_00118.","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"1531_CR18","unstructured":"K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, C. Hesse, J. Schulman. Training verifiers to solve math word problems, [Online], Available: https:\/\/arxiv.org\/abs\/2110.14168, 2021."},{"key":"1531_CR19","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"X Z Wang","year":"2023","unstructured":"X. Z. Wang, J. Wei, D. Schuurmans, Q. Le, E. D. H. Chi, S. Narang, A. Chowdhery, D. Zhou. Self-consistency improves chain of thought reasoning in language models. In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 2023."},{"key":"1531_CR20","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Y Fu","year":"2023","unstructured":"Y. Fu, H. Peng, A. Sabharwal, P. Clark, T. Khot. Complexity-based prompting for multi-step reasoning. In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 2023."},{"key":"1531_CR21","unstructured":"Z. Yuan, H. Y. Yuan, C. P. Li, G. T. Dong, C. Q. Tan, C. Zhou. Scaling relationship on learning mathematical reasoning with large language models, [Online], Available: https:\/\/arxiv.org\/abs\/2308.01825, 2023."},{"key":"1531_CR22","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"X Yue","year":"2024","unstructured":"X. Yue, X. W. Qu, G. Zhang, Y. Fu, W. H. Huang, H. Sun, Y. Su, W. H. Chen. MAmmoTH: Building math generalist models through hybrid instruction tuning. 
In Proceedings of the 12th International Conference on Learning Representations, Vienna, Austria, 2024."},{"key":"1531_CR23","unstructured":"W. H. Chen, X. G. Ma, X. Y. Wang, W. W. Cohen. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks, [Online], Available: https:\/\/arxiv.org\/abs\/2211.12588, 2022."},{"key":"1531_CR24","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"P Lu","year":"2024","unstructured":"P. Lu, H. Bansal, T. Xia, J. C. Liu, C. Y. Li, H. Hajishirzi, H. Cheng, K. W. Chang, M. Galley, J. F. Gao. MathVista: Evaluating mathematical reasoning of foundation models in visual contexts. In Proceedings of the 12th International Conference on Learning Representations, Vienna, Austria, 2024."},{"key":"1531_CR25","first-page":"27903","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"Y X Li","year":"2024","unstructured":"Y. X. Li, B. T. Hu, H. Y. Shi, W. Wang, L. Y. Wang, M. Zhang. VisionGraph: Leveraging large multimodal models for graph theory problems in visual context. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, pp. 27903\u201327919, 2024."},{"key":"1531_CR26","unstructured":"Z. Chu, J. C. Chen, Q. L. Chen, W. J. Yu, T. He, H. T. Wang, W. H. Peng, M. Liu, B. Qin, T. Liu. A survey of chain of thought reasoning: Advances, frontiers and future, [Online], Available: https:\/\/arxiv.org\/abs\/2309.15402, 2023."},{"key":"1531_CR27","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"A Madaan","year":"2023","unstructured":"A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Y. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. M. Yang, S. Gupta, B. P. Majumder, K. Hermann, S. Welleck, A. Yazdanbakhsh, P. Clark. SELF-REFINE: Iterative refinement with self-feedback. 
In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, USA, Article number 2019, 2023."},{"key":"1531_CR28","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"N Shinn","year":"2023","unstructured":"N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, S. Y. Yao. Reflexion: Language agents with verbal reinforcement learning. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, USA, Article number 377, 2023."},{"key":"1531_CR29","first-page":"1100","volume-title":"Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"D Paul","year":"2024","unstructured":"D. Paul, M. Ismayilzada, M. Peyrard, B. Borges, A. Bosselut, R. West, B. Faltings. Refiner: Reasoning feedback on intermediate representations. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), St. Julian\u2019s, Malta, pp. 1100\u20131126, 2024."},{"key":"1531_CR30","doi-asserted-by":"publisher","first-page":"4471","DOI":"10.18653\/v1\/2023.acl-long.245","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"X Y Zhu","year":"2023","unstructured":"X. Y. Zhu, J. J. Wang, L. Zhang, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, K. Cobbe. Solving math word problems via cooperative reasoning induced language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada, pp. 4471\u20134485, 2023. DOI: https:\/\/doi.org\/10.18653\/v1\/2023.acl-long.245."},{"key":"1531_CR31","unstructured":"H. Lightman, V. Kosaraju, Y. Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, K. Cobbe. 
Let\u2019s verify step by step, [Online], Available: https:\/\/arxiv.org\/abs\/2305.20050, 2023."},{"key":"1531_CR32","volume-title":"Proceedings of International Conference on Machine Learning","author":"C Y Zheng","year":"2024","unstructured":"C. Y. Zheng, Z. Y. Liu, E. Z. Xie, Z. G. Li, Y. Li. Progressive-hint prompting improves reasoning in large language models. In Proceedings of International Conference on Machine Learning, Vienna, Austria, 2024."},{"key":"1531_CR33","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"D Zhou","year":"2023","unstructured":"D. Zhou, N. Sch\u00e4rli, L. Hou, J. Wei, N. Scales, X. Z. Wang, D. Schuurmans, C. Cui, O. Bousquet, Q. V. Le, E. D. H. Chi. Least-to-most prompting enables complex reasoning in large language models. In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 2023."},{"key":"1531_CR34","doi-asserted-by":"publisher","first-page":"2609","DOI":"10.18653\/v1\/2023.acl-long.147","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics","author":"L Wang","year":"2023","unstructured":"L. Wang, W. Y. Xu, Y. H. Lan, Z. Q. Hu, Y. S. Lan, R. K. W. Lee, E. P. Lim. Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, Canada, pp. 2609\u20132634, 2023. DOI: https:\/\/doi.org\/10.18653\/v1\/2023.acl-long.147."},{"key":"1531_CR35","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"Z Ling","year":"2023","unstructured":"Z. Ling, Y. H. Fang, X. L. Li, Z. A. Huang, M. Lee, R. Memisevic, H. Su. Deductive verification of chain-of-thought reasoning. 
In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, USA, Article number 1580, 2023."},{"key":"1531_CR36","unstructured":"Y. F. Li, Z. Q. Lin, S. Z. Zhang, Q. Fu, B. Chen, J. G. Lou, W. Z. Chen. Making large language models better reasoners with step-aware verifier, [Online], Available: https:\/\/arxiv.org\/abs\/2206.02336, 2022."},{"key":"1531_CR37","unstructured":"J. Uesato, N. Kushman, R. Kumar, F. Song, N. Siegel, L. S. Wang, A. Creswell, G. Irving, I. Higgins. Solving math word problems with process- and outcome-based feedback, [Online], Available: https:\/\/arxiv.org\/abs\/2211.14275, 2022."},{"key":"1531_CR38","unstructured":"A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever. Language models are unsupervised multitask learners. OpenAI Blog, vol. 1, no. 8, Article number 9, 2019."},{"key":"1531_CR39","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"A H Zeng","year":"2023","unstructured":"A. H. Zeng, X. Liu, Z. X. Du, Z. H. Wang, H. Y. Lai, M. Ding, Z. Y. Yang, Y. F. Xu, W. D. Zheng, X. Xia, W. L. Tam, Z. X. Ma, Y. F. Xue, J. D. Zhai, W. G. Chen, Z. Y. Liu, P. Zhang, Y. X. Dong, J. Tang. GLM-130B: An open bilingual pre-trained model. In Proceedings of the 11th International Conference on Learning Representations, Kigali, Rwanda, 2023."},{"key":"1531_CR40","unstructured":"H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. C. Ferrer, M. Y. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Y. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M. A. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. H. Lu, Y. N. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. X. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. 
M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. X. Xu, Z. Yan, I. Zarov, Y. C. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, T. Scialom. Llama 2: Open foundation and fine-tuned chat models, [Online], Available: https:\/\/arxiv.org\/abs\/2307.09288, 2023."},{"key":"1531_CR41","unstructured":"J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, R. Avila, I. Babuschkin, S. Balaji, V. Balcom, P. Baltescu, H. M. Bao, M. Bavarian, J. Belgum, I. Bello, J. Berdine, G. Bernadett-Shapiro, C. Berner, L. Bogdonoff, O. Boiko, M. Boyd, A. L. Brakman, G. Brockman, T. Brooks, M. Brundage, K. Button, T. Cai, R. Campbell, A. Cann, B. Carey, C. Carlson, R. Carmichael, Br. Chan, C. Chang, F. Chantzis, D. Chen, S. Chen, R. Chen, J. Chen, M. Chen, B. Chess, C. Cho, C. Chu, H. W. Chung, D. Cummings, J. Currier, Y. X. Dai, C. Decareaux, T. Degry, N. Deutsch, D. Deville, A. Dhar, D. Dohan, S. Dowling, S. Dunning, A. Ecoffet, A. Eleti, T. Eloundou, D. Farhi, L. Fedus, N. Felix, S. P. Fishman, J. Forte, I. Fulford, L. Gao, E. Georges, C. Gibson, V. Goel, T. Gogineni, G. Goh, R. Gontijo-Lopes, J. Gordon, M. Grafstein, S. Gray, R. Greene, J. Gross, S. X. Shane Gu, Y. F. Guo, C. Hallacy, J. Han, J. Harris, Y. C. He, M. Heaton, J. Heidecke, C. Hesse, A. Hickey, W. Hickey, P. Hoeschele, B. Houghton, K. Hsu, S. L. Hu, X. Hu, J. Huizinga, S. Jain, S. Jain, J. Jang, A. Jiang, R. Jiang, H. Z. Jin, D. Jin, S. Jomoto, B. Jonn, H. Jun, T. Kaftan, \u0141. Kaiser, A. Kamali, I. Kanitscheider, N. S. Keskar, T. Khan, L. Kilpatrick, J. W. Kim, C. Kim, Y. Kim, J. H. Kirchner, J. Kiros, M. Knight, D. Kokotajlo, \u0141. Kondraciuk, A. Kondrich, A. Konstantinidis, K. Kosic, G. Krueger, V. Kuo, M. Lampe, I. Lan, T. Lee, J. Leike, J. Leung, D. Levy, C. M. Li, R. Lim, M. Lin, S. Lin, M. Litwin, T. Lopez, R. Lowe, P. Lue, A. Makanju, K. Malfacini, S. Manning, T. Markov, Y. Markovski, B. 
Martin, K. Mayer, A. Mayne, B. McGrew, S. M. McKinney, C. McLeavey, P. McMillan, J. McNeil, D. Medina, A. Mehta, J. Menick, L. Metz, A. Mishchenko, P. Mishkin, V. Monaco, E. Morikawa, D. Mossing, T. Mu, M. Murati, O. Murk, D. M\u00e9ly, A. Nair, R. Nakano, R. Nayak, A. Neelakantan, R. Ngo, H. Noh, L. Ouyang, C. O\u2019Keefe, J. Pachocki, A. Paino, J. Palermo, A. Pantuliano, G. Parascandolo, J. Parish, E. Parparita, A. Passos, M. Pavlov, A. Peng, A. Perelman, F. de Avila Belbute Peres, M. Petrov, H. P. de Oliveira Pinto, M. Pokorny, M. Pokrass, V. H. Pong, T. Powell, A. Power, B. Power, E. Proehl, R. Puri, A. Radford, J. Rae, A. Ramesh, C. Raymond, F. Real, K. Rimbach, C. Ross, B. Rotsted, H. Roussez, N. Ryder, M. Saltarelli, T. Sanders, S. Santurkar, G. Sastry, H. Schmidt, D. Schnurr, J. Schulman, D. Selsam, K. Sheppard, T. Sherbakov, J. Shieh, S. Shoker, P. Shyam, S. Sidor, E. Sigler, M. Simens, J. Sitkin, K. Slama, I. Sohl, B. Sokolowsky, Y. Song, N. Staudacher, F. P. Such, N. Summers, I. Sutskever, J. Tang, N. Tezak, M. B. Thompson, P. Tillet, A. Tootoonchian, E. Tseng, P. Tuggle, N. Turley, J. Tworek, J. F. C. Uribe, A. Vallone, A. Vijayvergiya, C. Voss, C. Wainwright, J. J. Wang, A. Wang, B. Wang, J. Ward, J. Wei, C. J. Weinmann, A. Welihinda, P. Welinder, J. Y. Weng, L. L. Weng, M. Wiethoff, D. Willner, C. Winter, S. Wolrich, H. Wong, L. Workman, S. Wu, J. Wu, M. Wu, K. Xiao, T. Xu, S. Yoo, K. Yu, Q. M. Yuan, W. Zaremba, R. Zellers, C. Zhang, M. Zhang, S. J. Zhao, T. H. Zheng, J. T. Zhuang, W. Zhuk, B. Zoph. GPT-4 technical report, [Online], Available: https:\/\/arxiv.org\/abs\/2303.08774, 2023."},{"key":"1531_CR42","unstructured":"G. Team, R. Anil, S. Borgeaud, et al. Gemini: A family of highly capable multimodal models, [Online], Available: https:\/\/arxiv.org\/abs\/2312.11805, 2023."},{"key":"1531_CR43","doi-asserted-by":"crossref","unstructured":"X. Liu, K. X. Ji, Y. C. Fu, W. L. Tam, Z. X. Du, Z. L. Yang, J. Tang. 
P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks, [Online], Available: https:\/\/arxiv.org\/abs\/2110.07602, 2021.","DOI":"10.18653\/v1\/2022.acl-short.8"},{"key":"1531_CR44","doi-asserted-by":"publisher","first-page":"5784","DOI":"10.18653\/v1\/2024.naacl-long.323","volume-title":"Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)","author":"S Jiang","year":"2024","unstructured":"S. Jiang, Z. Shakeri, A. Chan, M. Sanjabi, H. Firooz, Y. L. Xia, B. Akyildiz, Y. Z. Sun, J. C. Li, Q. F. Wang, A. Celikyilmaz. RESPROMPT: Residual connection prompting advances multi-step reasoning in large language models. In Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Mexico City, Mexico, pp. 5784\u20135809, 2024. DOI: https:\/\/doi.org\/10.18653\/v1\/2024.naacl-long.323."},{"key":"1531_CR45","doi-asserted-by":"publisher","first-page":"1830","DOI":"10.18653\/v1\/2024.findings-acl.108","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics","author":"M Y Jin","year":"2024","unstructured":"M. Y. Jin, Q. K. Yu, D. Shu, H. Y. Zhao, W. Y. Hua, Y. D. Meng, Y. F. Zhang, M. N. Du. The impact of reasoning step length on large language models. In Proceedings of the Findings of the Association for Computational Linguistics, Bangkok, Thailand, pp. 1830\u20131842, 2024. DOI: https:\/\/doi.org\/10.18653\/v1\/2024.findings-acl.108."},{"key":"1531_CR46","unstructured":"M. Renze, E. Guven. 
Self-reflection in LLM Agents: Effects on problem-solving performance, [Online], Available: https:\/\/arxiv.org\/abs\/2405.06682, 2024."}],"container-title":["Machine Intelligence Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11633-024-1531-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11633-024-1531-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11633-024-1531-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T06:15:22Z","timestamp":1748326522000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11633-024-1531-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,21]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["1531"],"URL":"https:\/\/doi.org\/10.1007\/s11633-024-1531-3","relation":{},"ISSN":["2731-538X","2731-5398"],"issn-type":[{"type":"print","value":"2731-538X"},{"type":"electronic","value":"2731-5398"}],"subject":[],"published":{"date-parts":[[2025,4,21]]},"assertion":[{"value":"18 June 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 October 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 April 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declared that they have no conflicts of interest to this 
work.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations of conflict of interest"}}]}}