{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T01:11:45Z","timestamp":1759972305161,"version":"build-2065373602"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"10","funder":[{"name":"National Social Science Fund of China","award":["21VJXG043"],"award-info":[{"award-number":["21VJXG043"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2025,10,31]]},"abstract":"<jats:p>Script identification is a key step in document analysis and recognition in multilingual environments. This study proposed a new dataset for script identification algorithms, containing images of ancient documents in 12 different ethnic scripts, including Chinese script, Naxi Dongba script, Yi script, Shui script, Tangut script, ancient Zhuang script, ancient Buyi script, Tibetan script, Dai script, Chagatai script, Mongolian script, and Manchu script. Focusing on the high accuracy required for ancient script identification, this study proposed a method named multi-attention ghost pyramid fusion network (MAGPNet). MAGPNet consists of a feature extraction network, a channel feature pyramid, and a Multi-Headed Self-Attention Bottleneck Block. The feature extraction network utilizes lightweight convolutional modules and parameter-free attention modules to enhance MAGPNet's feature extraction capability while maintaining a lighter structure. The channel feature pyramid increases the model's robustness in processing ancient documents of different scales. The Multi-Headed Self-Attention Bottleneck Block, by introducing a Multi-Headed Self-Attention, focuses on effective features. 
Experiments demonstrate that MAGPNet achieves a 99.97% accuracy rate on the multilingual ancient document image script identification dataset, maintaining excellent classification performance across multiple datasets.<\/jats:p>","DOI":"10.1145\/3748314","type":"journal-article","created":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T12:42:32Z","timestamp":1757335352000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Multi-attention Ghost Pyramid Fusion Network for Script Identification of Chinese Ancient Document Images"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6380-1484","authenticated-orcid":false,"given":"Hai","family":"Guo","sequence":"first","affiliation":[{"name":"School of Data Science and Artificial Intelligence, Wenzhou University of Technology","place":["Wenzhou, China"]},{"name":"College of Computer Science and Engineering, Dalian Minzu University","place":["Dalian, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6355-6628","authenticated-orcid":false,"given":"Dawei","family":"Zhu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Engineering, Dalian Minzu University","place":["Dalian, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-1396-2134","authenticated-orcid":false,"given":"Jingying","family":"Zhao","sequence":"additional","affiliation":[{"name":"College of Computer Science and Engineering, Dalian Minzu University","place":["Dalian, China"]},{"name":"College of Electronic Information and Electrical Engineering, Dalian University of Technology","place":["Dalian, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-3182-1040","authenticated-orcid":false,"given":"Lingling","family":"Tong","sequence":"additional","affiliation":[{"name":"College of Computer Science and Engineering, Dalian Minzu University","place":["Dalian, 
China"]}]}],"member":"320","published-online":{"date-parts":[[2025,10,8]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-022-00421-8"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-024-05487-x"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","unstructured":"Galal M. Binmakhashen and Sabri A. Mahmoud. 2019. Document layout analysis: A comprehensive survey. ACM Computing Surveys 52 6 Article 109 (November 2020) 36 pages. DOI:10.1145\/3355610","DOI":"10.1145\/3355610"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3421558.3421572"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/1568292.1568294"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/2505377.2505382"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3396167"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3490031"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-018-6764-0"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2034617.2034630"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3402891"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3506699"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2016.03.012"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3038884.3038900"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3406209"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.1973.4309314"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2019.00175"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSSE.2018.8519972"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2023.04.015"},{"key":"e_1_3_1_21_2","first-page":"11863","article-title":"Simam: A simple, parameter-free attention 
module for convolutional neural networks","volume":"139","author":"Yang Lingxiao","year":"2021","unstructured":"Lingxiao Yang, Ru-Yuan Zhang, Lida Li, and Xiaohua Xie. 2021. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (Proceedings of Machine Learning Research). PMLR, 139, 11863\u201311874.","journal-title":"Proceedings of the International Conference on Machine Learning (Proceedings of Machine Learning Research)"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01625"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP42928.2021.9506485"},{"key":"e_1_3_1_24_2","volume-title":"Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning","author":"Netzer Y.","year":"2011","unstructured":"Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. 2011. Reading digits in natural images with unsupervised feature learning. 
In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10044-023-01146-y"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.689305"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2005.206"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2008.01.012"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.5755\/j01.eee.21.4.12785"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.22266\/ijies2018.0831.12"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/iwssip.2019.8787267"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-23117-4_32"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2016.7900268"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2017.68"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-019-04235-4"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2018.07.034"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3578938"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00165"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.195"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00716"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_43_2","first-page":"6105","volume-title":"Proceedings of the 36th International Conference on Machine Learning. PMLR","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc V. Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. 
In Proceedings of the 36th International Conference on Machine Learning. PMLR, 6105\u20136114."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00140"},{"key":"e_1_3_1_45_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2020. An image is worth 16\u00d716 words: Transformers for image recognition at scale. arXiv:2010.11929. Retrieved from https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01044"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_1_48_2","unstructured":"S. Mehta and M. Rastegari 2021. Mobilevit: light-weight general-purpose and mobile-friendly vision transformer. arXiv:2110.02178. Retrieved from https:\/\/arxiv.org\/abs\/2110.02178"},{"key":"e_1_3_1_49_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang Ao","year":"2024","unstructured":"Wang, Ao et al. 2024. Repvit: Revisiting mobile cnn from vit perspective. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition."},{"volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Vasu","key":"e_1_3_1_50_2","unstructured":"Vasu, Pavan Kumar Anasosalu et al. Mobileone: An improved one millisecond mobile backbone. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"e_1_3_1_53_2","unstructured":"S. H. Hasanpour M. Rouhani M. Fayyaz and M. Sabokrou. 2016. 
Lets keep it simple using simple architectures to outperform deeper and more complex architectures. arXiv: 1608.06037. Retrieved from https:\/\/arxiv.org\/abs\/1608.06037"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_1_55_2","unstructured":"F. N. Iandola S. Han M. W. Moskewicz and K. Ashraf. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv: 1602.07360. Retrieved from https:\/\/arxiv.org\/abs\/1602.07360"},{"key":"e_1_3_1_56_2","unstructured":"J. Redmon and A. Farhadi. 2018. YOLOv3: An Incremental Improvement. arXiv: 1804.02767. Retrieved from https:\/\/arxiv.org\/abs\/1804.02767"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00584"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00203"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3748314","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T21:00:13Z","timestamp":1759957213000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3748314"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,8]]},"references-count":57,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10,31]]}},"alternative-id":["10.1145\/3748314"],"URL":"https:\/\/doi.org\/10.1145\/3748314","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2025,10,8]]},"assertion":[{"value":"2024-01-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-12-30","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}