{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T23:37:23Z","timestamp":1761176243104,"version":"build-2065373602"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686318","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,21]]},"abstract":"<jats:p>Multimodal large language models (MLLMs) have achieved significant progress in document understanding. However, complex layout reasoning, characterized by concise answers and cross-page integration, remains a challenge. Unlike conventional semantics-oriented tasks, this task demands accurate visual perception of fine-grained structural elements and logical reasoning across multi-page documents. Existing approaches primarily focus on information extraction and semantic understanding, limiting the capacity of fine-tuned autoregressive models to capture short-answer reasoning signals and generalize to complex layout structures. To address this, we propose the Layout-Aware Multi-Source Reasoning Decision Framework (LAMRD), which reframes complex layout reasoning as a decision-making task over multi-source reasoning paths. In the reasoning path construction stage, LAMRD generates layout-aware reasoning paths by integrating internal visual cues and external knowledge from three complementary perspectives: Visual Structural Awareness (VSA), Logical Reasoning Paths (LRP), and External Knowledge Augmentation (EKA). In the reasoning path decision stage, we employ Group Relative Policy Optimization (GRPO) to train a decision model that produces the final answer based on these paths. We conduct comprehensive evaluations using Qwen2.5-VL-7B-Instruct on the CEP-7K dataset, covering layout structure understanding, information extraction, and logical association. Experimental results demonstrate that LAMRD outperforms advanced MLLMs in accuracy, validating its effectiveness for complex document layout understanding.<\/jats:p>","DOI":"10.3233\/faia251212","type":"book-chapter","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:54:50Z","timestamp":1761126890000},"source":"Crossref","is-referenced-by-count":0,"title":["Reframing Multimodal Complex Document Layout Understanding: A Layout-Aware Multi-Source Reasoning Decision Framework"],"prefix":"10.3233","author":[{"given":"Ran","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Information and Computational Sciences, School of Mathematical Sciences, Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuyang","family":"Zhou","sequence":"additional","affiliation":[{"name":"Department of Information and Computational Sciences, School of Mathematical Sciences, Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingyang","family":"Deng","sequence":"additional","affiliation":[{"name":"Department of Information and Computational Sciences, School of Mathematical Sciences, Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zeren","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Information and Computational Sciences, School of Mathematical Sciences, Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuefei","family":"Tong","sequence":"additional","affiliation":[{"name":"Department of Information and Computational Sciences, School of Mathematical Sciences, Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinwen","family":"Ma","sequence":"additional","affiliation":[{"name":"Department of Information and Computational Sciences, School of Mathematical Sciences, Peking University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qinghui","family":"Shi","sequence":"additional","affiliation":[{"name":"Tongfang Knowledge Network Digital Technology Co., Ltd."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qian","family":"Xu","sequence":"additional","affiliation":[{"name":"Tongfang Knowledge Network Digital Technology Co., Ltd."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuanjun","family":"Li","sequence":"additional","affiliation":[{"name":"Tongfang Knowledge Network Digital Technology Co., Ltd."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2025"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA251212","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:54:50Z","timestamp":1761126890000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA251212"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"ISBN":["9781643686318"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia251212","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}