{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T03:16:28Z","timestamp":1758078988470,"version":"3.44.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>Most enterprise graph data derives from relational databases, yet transforming relational tables into query-optimized graph schemas remains challenging. Existing approaches have notable limitations: (1) transformations based on primary and foreign keys often fail to generate schemas optimized for query performance; (2) manual schema design, although flexible, is costly and requires domain expertise; and (3) machine learning methods predict graph structures based on data patterns but heavily depend on large, high-quality training datasets. To address these challenges, we propose GalaxyWeaver, a framework to automate query-aware graph schema generation. GalaxyWeaver utilizes the reasoning power of Large Language Models (LLMs) to align graph schema designs with specific query requirements, effectively integrating domain knowledge with optimization strategies. The framework employs prompt-guided analysis to enhance the decision-making accuracy of LLM agents, facilitating iterative schema refinement. 
Experiments across diverse domains show that GalaxyWeaver simplifies transformation while improving query performance and reducing storage costs.<\/jats:p>","DOI":"10.14778\/3750601.3750630","type":"journal-article","created":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:05Z","timestamp":1758029885000},"page":"5100-5112","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["GalaxyWeaver: Autonomous Table-to-Graph Conversion and Schema Optimization with Large Language Models"],"prefix":"10.14778","volume":"18","author":[{"given":"Bing","family":"Tong","sequence":"first","affiliation":[{"name":"CreateLink & HKUST(GZ)"}]},{"given":"Yan","family":"Zhou","sequence":"additional","affiliation":[{"name":"CreateLink"}]},{"given":"Chen","family":"Zhang","sequence":"additional","affiliation":[{"name":"CreateLink"}]},{"given":"Jianheng","family":"Tang","sequence":"additional","affiliation":[{"name":"HKUST(GZ)"}]},{"given":"Jia","family":"Li","sequence":"additional","affiliation":[{"name":"HKUST(GZ)"}]},{"given":"Lei","family":"Chen","sequence":"additional","affiliation":[{"name":"HKUST(GZ)"}]}],"member":"320","published-online":{"date-parts":[[2025,9,16]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2024. Neo4j. https:\/\/neo4j.com\/"},{"key":"e_1_2_1_2_1","unstructured":"Renzo Angles. 2018. The Property Graph Database Model. In AMW."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589778"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 2021 International Conference on Management of Data. 2423\u20132436","author":"Angles Renzo","year":"2021","unstructured":"Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Keith W Hare, Jan Hidders, Victor E Lee, Bei Li, Leonid Libkin, Wim Martens, et al. 2021. Pg-keys: Keys for property graphs. In Proceedings of the 2021 International Conference on Management of Data. 
2423\u20132436."},{"key":"e_1_2_1_5_1","unstructured":"Anthropic. [n.d.]. The Claude 3 Model Family: Opus Sonnet Haiku. https:\/\/api.semanticscholar.org\/CorpusID:268232499. Accessed: 2 6."},{"key":"e_1_2_1_6_1","volume-title":"Systems for Graph Extraction from Tabular Data. Master's thesis","author":"Anzum Nafisa","unstructured":"Nafisa Anzum. 2020. Systems for Graph Extraction from Tabular Data. Master's thesis. University of Waterloo."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 2021 International Conference on Management of Data. 2821\u20132828","author":"Arenas Marcelo","year":"2021","unstructured":"Marcelo Arenas, Claudio Guti\u00e9rrez, and Juan F Sequeda. 2021. Querying in the age of graph databases and knowledge graphs. In Proceedings of the 2021 International Conference on Management of Data. 2821\u20132828."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465296"},{"key":"e_1_2_1_9_1","volume-title":"Relational database theory","author":"Atzeni Paolo","unstructured":"Paolo Atzeni and Valeria De Antonellis. 1993. Relational database theory. Benjamin-Cummings Publishing Co., Inc."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3584372.3588654"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/3681954.3681972"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 2013 USENIX Conference on Annual Technical Conference (San Jose, CA) (USENIX ATC'13). USENIX Association, USA, 49\u201360","author":"Bronson Nathan","year":"2013","unstructured":"Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, and Venkat Venkataramani. 2013. TAO: Facebook's distributed data store for the social graph. In Proceedings of the 2013 USENIX Conference on Annual Technical Conference (San Jose, CA) (USENIX ATC'13). 
USENIX Association, USA, 49\u201360."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 30th Brazilian Symposium on Databases.","author":"Cavoto Patr\u00edcia","year":"2015","unstructured":"Patr\u00edcia Cavoto and Andr\u00e9 Santanch\u00e8. 2015. ReGraph: bridging relational and graph databases. In Proceedings of the 30th Brazilian Symposium on Databases."},{"key":"e_1_2_1_14_1","volume-title":"ACM Turing award lectures.","author":"Codd Edgar F","year":"1981","unstructured":"Edgar F Codd. 2007. Relational database: A practical foundation for productivity. In ACM Turing award lectures. 1981."},{"key":"e_1_2_1_15_1","volume-title":"Tigergraph: A native MPP graph database. arXiv preprint arXiv:1901.08248","author":"Deutsch Alin","year":"2019","unstructured":"Alin Deutsch, Yu Xu, Mingxi Wu, and Victor Lee. 2019. Tigergraph: A native MPP graph database. arXiv preprint arXiv:1901.08248 (2019)."},{"key":"e_1_2_1_16_1","unstructured":"DMDave Todd B. and Will Cukierski. 2014. Acquire Valued Shoppers Challenge. https:\/\/kaggle.com\/competitions\/acquire-valued-shoppers-challenge."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742786"},{"key":"e_1_2_1_18_1","volume-title":"2022 IEEE 2nd International Conference on Power, Electronics and Computer Applications (ICPECA). IEEE, 674\u2013680","author":"Feng Hui","year":"2022","unstructured":"Hui Feng and Meigen Huang. 2022. An approach to converting relational database to graph database: From MySQL to Neo4j. In 2022 IEEE 2nd International Conference on Power, Electronics and Computer Applications (ICPECA). IEEE, 674\u2013680."},{"key":"e_1_2_1_19_1","unstructured":"Daya Guo Dejian Yang Haowei Zhang Junxiao Song Ruoyu Zhang Runxin Xu Qihao Zhu Shirong Ma Peiyi Wang Xiao Bi et al. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. 
arXiv preprint arXiv:2501.12948 (2025)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Jan L Harrington. 2016. Relational database design and implementation. Morgan Kaufmann.","DOI":"10.1016\/B978-0-12-804399-8.00006-5"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1361-3723(20)30073-7"},{"key":"e_1_2_1_22_1","unstructured":"Aaron Hurst Adam Lerer Adam P Goucher Adam Perelman Aditya Ramesh Aidan Clark AJ Ostrow Akila Welihinda Alan Hayes Alec Radford et al. 2024. Gpt-4o system card. arXiv preprint arXiv:2410.21276 (2024)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1018"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems. 1\u20137.","author":"Pacaci Anil","year":"2017","unstructured":"Anil Pacaci, Alice Zhou, Jimmy Lin, and M Tamer \u00d6zsu. 2017. Do we need specialized graph databases? Benchmarking real-time social networking applications. In Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems. 1\u20137."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13369-021-05682-9"},{"key":"e_1_2_1_26_1","volume-title":"Communicative agents for software development. arXiv preprint arXiv:2307.07924 6, 3","author":"Qian Chen","year":"2023","unstructured":"Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu, and Maosong Sun. 2023. Communicative agents for software development. arXiv preprint arXiv:2307.07924 6, 3 (2023)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Kashif Rabbani Matteo Lissandrini Angela Bonifati and Katja Hose. 2025. Transforming RDF Graphs to Property Graphs using Standardized Schemas. 
(2025).","DOI":"10.1145\/3698817"},{"key":"e_1_2_1_28_1","volume-title":"Graph databases: new opportunities for connected data","author":"Robinson Ian","unstructured":"Ian Robinson, Jim Webber, and Emil Eifrem. 2015. Graph databases: new opportunities for connected data. O'Reilly Media, Inc."},{"key":"e_1_2_1_29_1","first-page":"21330","article-title":"Relbench: A benchmark for deep learning on relational databases","volume":"37","author":"Robinson Joshua","year":"2024","unstructured":"Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan Eric Lenssen, Yiwen Yuan, Zecheng Zhang, et al. 2024. Relbench: A benchmark for deep learning on relational databases. Advances in Neural Information Processing Systems 37 (2024), 21330\u201321341.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2740908.2742839"},{"key":"e_1_2_1_31_1","unstructured":"StackExchange. n.d.. StackExchange Data Explorer. https:\/\/data.stackexchange.com\/."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242667"},{"key":"e_1_2_1_33_1","volume-title":"International Conference on Machine Learning. PMLR, 21076\u201321089","author":"Tang Jianheng","year":"2022","unstructured":"Jianheng Tang, Jiajin Li, Ziqi Gao, and Jia Li. 2022. Rethinking graph neural networks for anomaly detection. In International Conference on Machine Learning. PMLR, 21076\u201321089."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685814"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3200842.3200852"},{"key":"e_1_2_1_36_1","volume-title":"Attention is all you need. Advances in Neural Information Processing Systems","author":"Vaswani A","year":"2017","unstructured":"A Vaswani. 2017. Attention is all you need. 
Advances in Neural Information Processing Systems (2017)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","first-page":"1292","DOI":"10.35940\/ijitee.I7805.0881019","article-title":"An efficient graph database model","volume":"88","author":"Vyawahare Harsha R","year":"2019","unstructured":"Harsha R Vyawahare, Pravin P Karde, and Vilas M Thakare. 2019. An efficient graph database model. Int. J. Innov. Technol. Explor. Eng 88, 10 (2019), 1292\u20131295.","journal-title":"Int. J. Innov. Technol. Explor. Eng"},{"key":"e_1_2_1_38_1","volume-title":"The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track.","author":"Wang Minjie","unstructured":"Minjie Wang, Quan Gan, David Wipf, Zheng Zhang, Christos Faloutsos, Weinan Zhang, Muhan Zhang, Zhenkun Cai, Jiahang Li, Zunyao Mao, et al. [n.d.]. 4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on RDBs. In The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track."},{"key":"e_1_2_1_39_1","volume-title":"Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le.","author":"Wei Jason","year":"2021","unstructured":"Jason Wei, Maarten Bosma, Vincent Y Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. 2021. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652 (2021)."},{"key":"e_1_2_1_40_1","volume-title":"Autogen: Enabling next-gen llm applications via multi-agent conversation. arXiv preprint arXiv:2308.08155","author":"Wu Qingyun","year":"2023","unstructured":"Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. 2023. Autogen: Enabling next-gen llm applications via multi-agent conversation. 
arXiv preprint arXiv:2308.08155 (2023)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/JAS.2023.123618"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-016-5228-9"},{"key":"e_1_2_1_43_1","volume-title":"Graph Databases: Theory and Practice","author":"Zhang Chen","year":"2024","unstructured":"Chen Zhang, Jing Wu, and Yan Zhou. 2024. Graph Databases: Theory and Practice. Electronics Industry Press."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1088\/1755-1315\/659\/1\/012108"},{"key":"e_1_2_1_45_1","volume-title":"Long Li, Jialong Wu, Tiannan Wang, Shi Qiu, Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, et al.","author":"Zhou Wangchunshu","year":"2023","unstructured":"Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shi Qiu, Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, et al. 2023. Agents: An open-source framework for autonomous language agents. arXiv preprint arXiv:2309.07870 (2023)."},{"key":"e_1_2_1_46_1","volume-title":"D-bot: Database diagnosis system using large language models. arXiv preprint arXiv:2312.01454","author":"Zhou Xuanhe","year":"2023","unstructured":"Xuanhe Zhou, Guoliang Li, Zhaoyan Sun, Zhiyuan Liu, Weize Chen, Jianming Wu, Jiesi Liu, Ruohang Feng, and Guoyang Zeng. 2023. D-bot: Database diagnosis system using large language models. 
arXiv preprint arXiv:2312.01454 (2023)."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685816"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3750601.3750630","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:12Z","timestamp":1758029892000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3750601.3750630"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":47,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.14778\/3750601.3750630"],"URL":"https:\/\/doi.org\/10.14778\/3750601.3750630","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,8]]},"assertion":[{"value":"2025-09-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}