{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T15:27:15Z","timestamp":1781018835678,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":57,"publisher":"ACM","license":[{"start":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T00:00:00Z","timestamp":1774224000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,3,23]]},"DOI":"10.1145\/3748522.3779780","type":"proceedings-article","created":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T14:17:49Z","timestamp":1781014669000},"page":"934-943","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Structured Extraction from Business Process Diagrams Using Vision-Language Models"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8655-4762","authenticated-orcid":false,"given":"Pritam","family":"Deka","sequence":"first","affiliation":[{"name":"Queen's University Belfast, Belfast, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2128-8632","authenticated-orcid":false,"given":"Barry","family":"Devereux","sequence":"additional","affiliation":[{"name":"Queen's University Belfast, Belfast, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,6,9]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Pravesh Agrawal et al. 2024. Pixtral 12b. arXiv preprint arXiv:2410.07073."},{"key":"e_1_3_2_1_2_1","volume-title":"https:\/\/huggingface.co\/meta-llama\/Llama-3.2-11B-Vision-Instruct","author":"Meta AI.","year":"2024","unstructured":"Meta AI. 2024. Llama 3.2 11b vision instruct model. (2024). https:\/\/huggingface.co\/meta-llama\/Llama-3.2-11B-Vision-Instruct."},{"key":"e_1_3_2_1_3_1","volume-title":"introduction to the standard for business process modeling","author":"Allweyer Thomas","unstructured":"Thomas Allweyer. 2016. BPMN 2.0: introduction to the standard for business process modeling. BoD-Books on Demand."},{"key":"e_1_3_2_1_4_1","unstructured":"Alessandro Antinori Riccardo Coltrinari Flavio Corradini Fabrizio Fornari Barbara Re Marco Scarpetta et al. 2022. Bpmn-redrawer: from images to bpmn models. In."},{"key":"e_1_3_2_1_5_1","unstructured":"Jinze Bai Shuai Bai Shusheng Yang Shijie Wang Sinan Tan Peng Wang Junyang Lin Chang Zhou and Jingren Zhou. 2023. Qwen-vl: a versatile vision-language model for understanding localization text reading and beyond. arXiv preprint arXiv:2308.12966."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-17604-3_11"},{"key":"e_1_3_2_1_7_1","first-page":"12","article-title":"How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites","volume":"67","author":"Zhe Chen","year":"2024","unstructured":"Zhe Chen et al. 2024. How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites. Science China Information Sciences, 67, 12, 220101.","journal-title":"Science China Information Sciences"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Jacob Cohen. 2013. Statistical power analysis for the behavioral sciences. routledge.","DOI":"10.4324\/9780203771587"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Flavio Corradini Sara Pettinari Barbara Re Lorenzo Rossi and Francesco Tiezzi. 2024. A technique for discovering bpmn collaboration diagrams. Software and Systems Modeling 1\u201321.","DOI":"10.1007\/s10270-024-01153-5"},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 4171\u20134186."},{"key":"e_1_3_2_1_11_1","unstructured":"Th\u00e9o Fagnoni Bellinda Mesbah Mahsun Altin and Phillip Kingston. 2024. Opus: a large work model for complex workflow generation. arXiv preprint arXiv:2412.00573."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1937.10503522"},{"key":"e_1_3_2_1_13_1","volume-title":"Advanced Information Systems Engineering: 23rd International Conference, CAiSE 2011, London, UK, June 20-24, 2011. Proceedings 23","author":"Friedrich Fabian","year":"2011","unstructured":"Fabian Friedrich, Jan Mendling, and Frank Puhlmann. 2011. Process model generation from natural language text. In Advanced Information Systems Engineering: 23rd International Conference, CAiSE 2011, London, UK, June 20-24, 2011. Proceedings 23. Springer, 482\u2013496."},{"key":"e_1_3_2_1_14_1","unstructured":"Chaoyou Fu et al. 2025. Vita-1.5: towards gpt-4o level real-time vision and speech interaction. arXiv preprint arXiv:2501.01957."},{"key":"e_1_3_2_1_15_1","unstructured":"Camunda Services GmbH. 2015. Bpmn for research. https:\/\/github.com\/camunda\/bpmn-for-research. Accessed: 2025-04-04. (2015)."},{"key":"e_1_3_2_1_16_1","unstructured":"Camunda Services GmbH. 2014. Bpmn-js: a bpmn 2.0 rendering toolkit and web modeler. https:\/\/github.com\/bpmn-io\/bpmn-js. Accessed: 2025-04-04. (2014)."},{"key":"e_1_3_2_1_17_1","unstructured":"2023. Gpt-4v(ision) system card. In https:\/\/api.semanticscholar.org\/CorpusID:263218031."},{"key":"e_1_3_2_1_18_1","volume-title":"International Conference on Business Process Management. Springer, 453\u2013465","author":"Grohs Michael","year":"2023","unstructured":"Michael Grohs, Luka Abb, Nourhan Elsayed, and Jana-Rebecca Rehse. 2023. Large language models can accomplish business process management tasks. In International Conference on Business Process Management. Springer, 453\u2013465."},{"key":"e_1_3_2_1_19_1","unstructured":"Object Management Group. 2014. Business process model and notation (bpmn) specification. (2014). https:\/\/www.omg.org\/spec\/BPMN\/2.0."},{"key":"e_1_3_2_1_20_1","unstructured":"Yucheng Han Chi Zhang Xin Chen Xu Yang Zhibin Wang Gang Yu Bin Fu and Hanwang Zhang. 2023. Chartllama: a multimodal llm for chart understanding and generation. arXiv preprint arXiv:2311.16483."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3680790"},{"key":"e_1_3_2_1_22_1","unstructured":"Aaron Hurst et al. 2024. Gpt-4o system card. arXiv preprint arXiv:2410.21276."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Maurice G Kendall and B Babington Smith. 1939. The problem of m rankings. The annals of mathematical statistics 10 3 275\u2013287.","DOI":"10.1214\/aoms\/1177732186"},{"key":"e_1_3_2_1_24_1","volume-title":"Wenhe Feng, Nicholas Yew Jin Tan, and Seung Ki Moon.","author":"Khan Muhammad Tayyab","year":"2024","unstructured":"Muhammad Tayyab Khan, Lequn Chen, Ye Han Ng, Wenhe Feng, Nicholas Yew Jin Tan, and Seung Ki Moon. 2024. Fine-tuning vision-language model for automated engineering drawing information extraction. arXiv preprint arXiv:2411.03707."},{"key":"e_1_3_2_1_25_1","volume-title":"Stephen SG Lee, and Eng Wah Lee","author":"Ko Ryan KL","year":"2009","unstructured":"Ryan KL Ko, Stephen SG Lee, and Eng Wah Lee. 2009. Business process management (bpm) standards: a survey. Business process management journal, 15, 5, 744\u2013791."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.2298\/CSIS140610006K"},{"key":"e_1_3_2_1_27_1","volume-title":"International Conference on Business Process Management. Springer, 259\u2013270","author":"K\u00f6pke Julius","year":"2024","unstructured":"Julius K\u00f6pke and Aya Safan. 2024. Efficient llm-based conversational process modeling. In International Conference on Business Process Management. Springer, 259\u2013270."},{"key":"e_1_3_2_1_28_1","unstructured":"Humam Kourani Alessandro Berti Daniel Schuster and Wil MP Van der Aalst. 2024. Promoai: process modeling with generative ai. arXiv preprint arXiv:2403.04327."},{"key":"e_1_3_2_1_29_1","volume-title":"Business process modeling, simulation and design","author":"Laguna Manuel","unstructured":"Manuel Laguna and Johan Marklund. 2018. Business process modeling, simulation and design. Chapman and Hall\/CRC."},{"key":"e_1_3_2_1_30_1","volume-title":"International Conference on Machine Learning. PMLR","author":"Kenton","unstructured":"Kenton Lee et al. 2023. Pix2struct: screenshot parsing as pretraining for visual language understanding. In International Conference on Machine Learning. PMLR, 18893\u201318912."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Josip Tomo Licardo Nikola Tankovi\u0107 and Darko Etinger. 2024. A method for extracting bpmn models from textual descriptions using natural language processing. Procedia computer science 239 483\u2013490.","DOI":"10.1016\/j.procs.2024.06.196"},{"key":"e_1_3_2_1_32_1","unstructured":"Leilei Lin Yumeng Jin Yingming Zhou Wenlong Chen and Chen Qian. 2024. Mao: a framework for process model generation with multi-agent orchestration. arXiv preprint arXiv:2408.01916."},{"key":"e_1_3_2_1_33_1","unstructured":"Pan Lu et al. 2023. Mathvista: evaluating mathematical reasoning of foundation models in visual contexts. arXiv preprint arXiv:2310.02255."},{"key":"e_1_3_2_1_34_1","volume-title":"Jia Qing Tan, Shafiq Joty, and Enamul Hoque.","author":"Masry Ahmed","year":"2022","unstructured":"Ahmed Masry, Do Xuan Long, Jia Qing Tan, Shafiq Joty, and Enamul Hoque. 2022. Chartqa: a benchmark for question answering about charts with visual and logical reasoning. arXiv preprint arXiv:2203.10244."},{"key":"e_1_3_2_1_35_1","volume-title":"Enamul Hoque, and Shafiq Joty.","author":"Masry Ahmed","year":"2024","unstructured":"Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, and Shafiq Joty. 2024. Chartinstruct: instruction tuning for chart comprehension and reasoning. arXiv preprint arXiv:2403.09028."},{"key":"e_1_3_2_1_36_1","unstructured":"Ahmed Masry Megh Thakkar Aayush Bajaj Aaryaman Kartha Enamul Hoque and Shafiq Joty. 2024. Chartgemma: visual instruction-tuning for chart reasoning in the wild. arXiv preprint arXiv:2407.04172."},{"key":"e_1_3_2_1_37_1","unstructured":"Carlos Matos and Reiko Heckel. 2009. Migrating legacy systems to service-oriented architectures. Electronic Communications of the EASST 16."},{"key":"e_1_3_2_1_38_1","unstructured":"Dipankar Medhi. 2024. Target prompting for information extraction with vision language model. arXiv preprint arXiv:2408.03834."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-63646-2_12"},{"key":"e_1_3_2_1_40_1","volume-title":"International Conference on Cooperative Information Systems. Springer, 398\u2013404","author":"Eldin Ali Nour","year":"2024","unstructured":"Ali Nour Eldin, Nour Assy, Olan Anesini, Benjamin Dalmas, and Walid Gaaloul. 2024. Nala2bpmn: automating bpmn model generation with large language models. In International Conference on Cooperative Information Systems. Springer, 398\u2013404."},{"key":"e_1_3_2_1_41_1","first-page":"115058","article-title":"Image2struct: benchmarking structure extraction for vision-language models","volume":"37","author":"Roberts Josselin","year":"2024","unstructured":"Josselin Roberts, Tony Lee, Chi Heem Wong, Michihiro Yasunaga, Yifan Mai, and Percy S Liang. 2024. Image2struct: benchmarking structure extraction for vision-language models. Advances in Neural Information Processing Systems, 37, 115058\u2013115097.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2022.3228308"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-78495-8_10"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jksuci.2022.10.007"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Shubhankar Singh Purvi Chaurasia Yerram Varun Pranshu Pandya Vatsal Gupta Vivek Gupta and Dan Roth. 2024. Flowvqa: mapping multimodal logic in visual question answering with flowcharts. arXiv preprint arXiv:2406.19237.","DOI":"10.18653\/v1\/2024.findings-acl.78"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2007.4376991"},{"key":"e_1_3_2_1_47_1","volume-title":"Proceedings of the 1st Workshop on Linguistic Insights from and for Multimodal Language Processing, 34\u201346","author":"Tannert Simon","year":"2023","unstructured":"Simon Tannert, Marcelo G Feighelstein, Jasmina Bogojeska, Joseph Shtok, Assaf Arbelle, Peter WJ Staar, Anika Schumann, Jonas Kuhn, and Leonid Karlinsky. 2023. Flowchartqa: the first large-scale benchmark for reasoning over flowcharts. In Proceedings of the 1st Workshop on Linguistic Insights from and for Multimodal Language Processing, 34\u201346."},{"key":"e_1_3_2_1_48_1","unstructured":"Gemma Team et al. 2025. Gemma 3 technical report. arXiv preprint arXiv:2503.19786."},{"key":"e_1_3_2_1_49_1","unstructured":"Qwen Team. 2025. Qwen2.5-vl. (Jan. 2025). https:\/\/qwenlm.github.io\/blog\/qwen2.5-vl\/."},{"key":"e_1_3_2_1_50_1","unstructured":"RapidAI Team. 2021. Rapid OCR: ocr toolbox. https:\/\/github.com\/RapidAI\/RapidOCR. (2021)."},{"key":"e_1_3_2_1_51_1","article-title":"Chartgpt: leveraging llms to generate charts from abstract natural language","author":"Tian Yuan","year":"2024","unstructured":"Yuan Tian, Weiwei Cui, Dazhen Deng, Xinjing Yi, Yurun Yang, Haidong Zhang, and Yingcai Wu. 2024. Chartgpt: leveraging llms to generate charts from abstract natural language. IEEE Transactions on Visualization and Computer Graphics.","journal-title":"IEEE Transactions on Visualization and Computer Graphics."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2229156.2229157"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometrics bulletin 1 6 80\u201383.","DOI":"10.2307\/3001968"},{"key":"e_1_3_2_1_54_1","volume-title":"Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, 38\u201345","author":"Thomas","unstructured":"Thomas Wolf et al. 2020. Transformers: state-of-the-art natural language processing. In Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, 38\u201345."},{"key":"e_1_3_2_1_55_1","unstructured":"Renqiu Xia et al. 2024. Chartx & chartvlm: a versatile benchmark and foundation model for complicated chart reasoning. arXiv preprint arXiv:2402.12185."},{"key":"e_1_3_2_1_56_1","unstructured":"Zhengzhuo Xu Sinan Du Yiyan Qi Chengjin Xu Chun Yuan and Jian Guo. 2023. Chartbench: a benchmark for complex visual reasoning in charts. arXiv preprint arXiv:2312.15915."},{"key":"e_1_3_2_1_57_1","volume-title":"European Conference on Computer Vision. Springer, 169\u2013186","author":"Renrui","unstructured":"Renrui Zhang et al. 2024. Mathverse: does your multi-modal llm truly see the diagrams in visual math problems? In European Conference on Computer Vision. Springer, 169\u2013186."}],"event":{"name":"SAC '26: 41st ACM\/SIGAPP Symposium on Applied Computing","location":"Grand Hotel Palace Thessaloniki Greece","acronym":"SAC '26","sponsor":["SIGAPP ACM Special Interest Group on Applied Computing"]},"container-title":["Proceedings of the 41st ACM\/SIGAPP Symposium on Applied Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3748522.3779780","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T14:37:03Z","timestamp":1781015823000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3748522.3779780"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,23]]},"references-count":57,"alternative-id":["10.1145\/3748522.3779780","10.1145\/3748522"],"URL":"https:\/\/doi.org\/10.1145\/3748522.3779780","relation":{},"subject":[],"published":{"date-parts":[[2026,3,23]]},"assertion":[{"value":"2026-06-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}