{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T21:04:08Z","timestamp":1778706248538,"version":"3.51.4"},"reference-count":41,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T00:00:00Z","timestamp":1772409600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"JST Moonshot R&D","award":["JPMJMS2236"],"award-info":[{"award-number":["JPMJMS2236"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Scientific research typically follows an iterative cycle where hypotheses are proposed, validated against experimental conclusions, and refined accordingly. While recent advances in large language models (LLMs) have enabled significant progress in automating individual stages of this process, existing systems are typically developed as standalone solutions, making it difficult to coordinate multiple research activities within a coherent research workflow. In this study, we present a modular framework for automated hypothesis validation and refinement in scientific research. Rather than introducing new task-specific models, the framework integrates established techniques, including natural language inference (NLI)-based hypothesis validation, attribution-guided hypothesis refinement, and retrieval-augmented generation (RAG)-based external evidence retrieval, into a unified and controllable workflow. We evaluate the proposed framework on scientific texts in the chemistry domain to assess its applicability in practical scientific research scenarios. Extensive experiments demonstrate the effectiveness of the proposed framework and suggest that it produces reliable intermediate signals that enhance transparency and traceability throughout hypothesis validation and refinement. 
Our work offers a modular solution for deploying LLM-based systems in scientific research workflows.<\/jats:p>","DOI":"10.3390\/info17030244","type":"journal-article","created":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T14:06:56Z","timestamp":1772460416000},"page":"244","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A Modular Framework for Automated Hypothesis Validation and Refinement in Scientific Research"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-0811-4667","authenticated-orcid":false,"given":"Chenhao","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Artificial Intelligence and Robotics, Chubu University, Kasugai-shi 487-8501, Aichi, Japan"}]},{"given":"Taiga","family":"Masuda","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence and Robotics, Chubu University, Kasugai-shi 487-8501, Aichi, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3851-5221","authenticated-orcid":false,"given":"Tsubasa","family":"Hirakawa","sequence":"additional","affiliation":[{"name":"Center for Mathematical Science and Artificial Intelligence, Chubu University, Kasugai-shi 487-8501, Aichi, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2631-9856","authenticated-orcid":false,"given":"Takayoshi","family":"Yamashita","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Chubu University, Kasugai-shi 487-8501, Aichi, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7391-4725","authenticated-orcid":false,"given":"Hironobu","family":"Fujiyoshi","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence and Robotics, Chubu University, Kasugai-shi 487-8501, Aichi, Japan"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,2]]},"reference":[{"key":"ref_1","unstructured":"Lu, C., Lu, C., Lange, R.T., Foerster, J.N., Clune, J., and Ha, D. 
(2024). The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. arXiv."},{"key":"ref_2","unstructured":"Lenat, D.B. (1977, January 22\u201325). Automated Theory Formation in Mathematics. Proceedings of the 5th International Joint Conference on Artificial Intelligence, Cambridge, MA, USA."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1016\/0004-3702(84)90016-X","article-title":"Why AM and EURISKO Appear to Work","volume":"23","author":"Lenat","year":"1984","journal-title":"Artif. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/0004-3702(78)90010-3","article-title":"Dendral and Meta-Dendral: Their Applications Dimension","volume":"11","author":"Buchanan","year":"1978","journal-title":"Artif. Intell."},{"key":"ref_5","first-page":"3","article-title":"Large Language Models for Data Discovery and Integration: Challenges and Opportunities","volume":"49","author":"Freire","year":"2025","journal-title":"IEEE Data Eng. Bull."},{"key":"ref_6","unstructured":"Luo, Z., Yang, Z., Xu, Z., Yang, W., and Du, X. (2025). LLM4SR: A Survey on Large Language Models for Scientific Research. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Qiao, S., Ou, Y., Deng, S., Zhang, N., Lyu, S., Shen, Y., Liang, L., Gu, J., and Chen, H. (2025). KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents. Findings of the Association for Computational Linguistics: NAACL 2025, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2025.findings-naacl.205"},{"key":"ref_8","unstructured":"Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Kuttler, H., Lewis, M., Yih, W., and Rockt\u00e4schel, T. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv."},{"key":"ref_9","unstructured":"Qi, B., Zhang, K., Tian, K., Li, H., Chen, Z., Zeng, S., Hua, E., Hu, J., and Zhou, B. (2024). 
Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation. arXiv."},{"key":"ref_10","unstructured":"Hu, X., Fu, H., Wang, J., Wang, Y., Li, Z., Xu, R., Lu, Y., Jin, Y., Pan, L., and Lan, Z. (2024). Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Liu, H., Srivastava, T., Mei, H., and Tan, C. (2024). Hypothesis Generation with Large Language Models. arXiv.","DOI":"10.18653\/v1\/2024.nlp4science-1.10"},{"key":"ref_12","unstructured":"Xiong, G., Xie, E., Shariatmadari, A.H., Guo, S., Bekiranov, S., and Zhang, A. (2024). Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Baek, J., Jauhar, S.K., Cucerzan, S., and Hwang, S.J. (2024). ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models. arXiv.","DOI":"10.18653\/v1\/2025.naacl-long.342"},{"key":"ref_14","unstructured":"Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Guo, Q., and Wang, M. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ma, X., Gong, Y., He, P., Zhao, H., and Duan, N. (2023). Query Rewriting for Retrieval-Augmented Large Language Models. arXiv.","DOI":"10.18653\/v1\/2023.emnlp-main.322"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Gao, L., Ma, X., Lin, J.J., and Callan, J. (2022). Precise Zero-Shot Dense Retrieval without Relevance Labels. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics.","DOI":"10.18653\/v1\/2023.acl-long.99"},{"key":"ref_17","unstructured":"Zheng, H.S., Mishra, S., Chen, X., Cheng, H., Chi, E.H., Le, Q.V., and Zhou, D. (2023). 
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models. arXiv."},{"key":"ref_18","unstructured":"Yu, W., Iter, D., Wang, S., Xu, Y., Ju, M., Sanyal, S., Zhu, C., Zeng, M., and Jiang, M. (2022). Generate rather than Retrieve: Large Language Models are Strong Context Generators. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Shao, Z., Gong, Y., Shen, Y., Huang, M., Duan, N., and Chen, W. (2023). Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy. arXiv.","DOI":"10.18653\/v1\/2023.findings-emnlp.620"},{"key":"ref_20","unstructured":"Chen, C., Hirakawa, T., Yamashita, T., and Fujiyoshi, H. (2025, January 24\u201326). Hypothesis Alignment via Clause-level Attribution-guided Span Masking and Infilling. Proceedings of the 5th International Conference on Communications, Networking and Machine Learning, Singapore."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chen, C., Masuda, T., Ushiku, Y., Tanaka, S., Saito, K., Hirakawa, T., Yamashita, T., and Fujiyoshi, H. (2025). CRNLI: A Textual Entailment Dataset in the Chemistry Domain. Text, Speech and Dialogue, Springer.","DOI":"10.1007\/978-3-032-02551-7_9"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Papineni, K., Roukos, S., Ward, T., and Zhu, W. (2002). Bleu: A Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics.","DOI":"10.3115\/1073083.1073135"},{"key":"ref_23","unstructured":"Lin, C. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out, Association for Computational Linguistics."},{"key":"ref_24","unstructured":"Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., and Artzi, Y. (2019). BERTScore: Evaluating Text Generation with BERT. 
arXiv."},{"key":"ref_25","unstructured":"Li, X., Du, M., Chen, J., Chai, Y., Lakkaraju, H., and Xiong, H. (2023). M4: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities and Models. NIPS\u201923: Proceedings of the 37th International Conference on Neural Information Processing Systems, Curran Associates Inc."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Beltagy, I., Lo, K., and Cohan, A. (2019). SciBERT: A Pretrained Language Model for Scientific Text. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics.","DOI":"10.18653\/v1\/D19-1371"},{"key":"ref_27","unstructured":"Abdin, M., Jacobs, S.A., Awan, A.A., Aneja, J., Awadallah, A., Awadalla, H.H., Bach, N., Bahree, A., Bakhtiari, A., and Behl, H.S. (2024). Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. arXiv."},{"key":"ref_28","unstructured":"Lundberg, S.M., and Lee, S. (2017). A Unified Approach to Interpreting Model Predictions. NIPS\u201917: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc."},{"key":"ref_29","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_30","unstructured":"Chung, H.W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, E., Wang, X., Dehghani, M., and Brahma, S. (2022). Scaling Instruction-Finetuned Language Models. arXiv."},{"key":"ref_31","unstructured":"Yang, Q.A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Li, C., Liu, D., Huang, F., and Dong, G. (2024). Qwen2.5 Technical Report. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. (2019). 
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv.","DOI":"10.18653\/v1\/D19-1410"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yin, C., and Zhang, Z. (2024). A Study of Sentence Similarity Based on the All-minilm-l6-v2 Model with \u201cSame Semantics, Different Structure\u201d After Fine Tuning. Proceedings of the 2024 2nd International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2024), Atlantis Press.","DOI":"10.2991\/978-94-6463-540-9_69"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazar\u00e9, P., Lomeli, M., Hosseini, L., and J\u00e9gou, H. (2024). The Faiss library. arXiv.","DOI":"10.1109\/TBDATA.2025.3618474"},{"key":"ref_35","unstructured":"Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M., Lacroix, T., Rozi\u00e8re, B., Goyal, N., Hambro, E., and Azhar, F. (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"633","DOI":"10.1038\/s41586-025-09422-z","article-title":"DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning","volume":"645","author":"Guo","year":"2025","journal-title":"Nature"},{"key":"ref_37","unstructured":"ChemRxiv (2026, February 09). Available online: https:\/\/chemrxiv.org\/."},{"key":"ref_38","unstructured":"Crossref (2026, February 09). Available online: https:\/\/www.crossref.org\/."},{"key":"ref_39","unstructured":"Blecher, L., Cucurull, G., Scialom, T., and Stojnic, R. (2023). Nougat: Neural Optical Understanding for Academic Documents. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Neumann, M., King, D., Beltagy, I., and Ammar, B.W. (2019). ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing. 
arXiv.","DOI":"10.18653\/v1\/W19-5034"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Bowman, S.R., Angeli, G., Potts, C., and Manning, C.D. (2015). A large annotated corpus for learning natural language inference. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics.","DOI":"10.18653\/v1\/D15-1075"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/3\/244\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T14:22:25Z","timestamp":1772461345000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/3\/244"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,2]]},"references-count":41,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["info17030244"],"URL":"https:\/\/doi.org\/10.3390\/info17030244","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,2]]}}}