{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T00:14:54Z","timestamp":1758672894071,"version":"3.44.0"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:p>Existing handwriting recognition methods only focus on learning visual patterns by modeling low-level relationships of adjacent pixels, while overlooking the intrinsic geometric structures of characters. In this paper, we propose a novel graph-enhanced cross-modal mutual learning network GCM to fully process handwritten text images alongside their corresponding geometric graphs, which consists of one shared cross-modal encoder and two parallel inverse decoders. Specifically, the encoder simultaneously extracts visual and geometric information from the cross-modal inputs, and the decoders fuse the multi-modal features for prediction under the guidance of cross-modal fusion. Moreover, two parallel decoders sequentially aggregate cross-modal features in inverse orders (V\u2192G and G\u2192V) but are enhanced through mutual distillation at each time-step, which involves one-to-one knowledge transfer and fully leverages complementary cross-modal information from both directions. Notably, only one branch of GCM is activated in inference, thus avoiding the increase of the model parameters and computation costs for testing. 
Experiments show that our method outperforms previous state-of-the-art methods on public benchmarks such as IAM, RIMES, and ICDAR-2013 when no extra training data is utilized.<\/jats:p>","DOI":"10.24963\/ijcai.2025\/574","type":"proceedings-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T08:10:40Z","timestamp":1758269440000},"page":"5154-5162","source":"Crossref","is-referenced-by-count":0,"title":["Structure-Aware Handwritten Text Recognition via Graph-Enhanced Cross-Modal Mutual Learning"],"prefix":"10.24963","author":[{"given":"Ji","family":"Gan","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Chongqing University of Posts and Telecommunications"},{"name":"Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications"},{"name":"Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yupeng","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Chongqing University of Posts and Telecommunications"},{"name":"Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanming","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiaxu","family":"Leng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Chongqing University of Posts and Telecommunications"},{"name":"Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications"},{"name":"Chongqing Institute for Brain and Intelligence, Guangyang Bay 
Laboratory"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xinbo","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Chongqing University of Posts and Telecommunications"},{"name":"Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications"},{"name":"Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"number":"34","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"acronym":"IJCAI-2025","name":"Thirty-Fourth International Joint Conference on Artificial Intelligence {IJCAI-25}","start":{"date-parts":[[2025,8,16]]},"theme":"Artificial Intelligence","location":"Montreal, Canada","end":{"date-parts":[[2025,8,22]]}},"container-title":["Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T11:34:27Z","timestamp":1758627267000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2025\/574"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2025,9]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2025\/574","relation":{},"subject":[],"published":{"date-parts":[[2025,9]]}}}