{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T05:31:45Z","timestamp":1774935105182,"version":"3.50.1"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,6]]},"abstract":"<jats:p>The requirement for specialization in data management systems has evolved faster than our software development practices. After decades of organic growth, this situation has created a siloed landscape composed of hundreds of products developed and maintained as monoliths, with limited reuse between systems. This fragmentation has resulted in developers often reinventing the wheel, increased maintenance costs, and slowed down innovation. It has also affected the end users, who are often required to learn the idiosyncrasies of dozens of incompatible SQL and non-SQL API dialects, and settle for systems with incomplete functionality and inconsistent semantics. In this vision paper, considering the recent popularity of open source projects aimed at standardizing different aspects of the data stack, we advocate for a paradigm shift in how data management systems are designed. We believe that by decomposing these into a modular stack of reusable components, development can be streamlined while creating a more consistent experience for users. Towards that goal, we describe the state-of-the-art, principal open source technologies, and highlight open questions and areas where additional research is needed. 
We hope this work will foster collaboration, motivate further research, and promote a more composable future for data management.<\/jats:p>","DOI":"10.14778\/3603581.3603604","type":"journal-article","created":{"date-parts":[[2023,8,8]],"date-time":"2023-08-08T19:06:48Z","timestamp":1691521608000},"page":"2679-2685","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":27,"title":["The Composable Data Management System Manifesto"],"prefix":"10.14778","volume":"16","author":[{"given":"Pedro","family":"Pedreira","sequence":"first","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Orri","family":"Erling","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Konstantinos","family":"Karanasos","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Scott","family":"Schneider","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Wes","family":"McKinney","sequence":"additional","affiliation":[{"name":"Voltron Data"}]},{"given":"Satya R","family":"Valluri","sequence":"additional","affiliation":[{"name":"Databricks Inc."}]},{"given":"Mohamed","family":"Zait","sequence":"additional","affiliation":[{"name":"Databricks Inc."}]},{"given":"Jacques","family":"Nadeau","sequence":"additional","affiliation":[{"name":"Sundeck"}]}],"member":"320","published-online":{"date-parts":[[2023,8,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3190662"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526054"},{"key":"e_1_2_1_3_1","unstructured":"BlazingSQL. [n.d.]. A lightweight GPU accelerated SQL engine built on the RAPIDS.ai ecosystem. 
https:\/\/github.com\/BlazingDB\/blazingsql."},{"key":"e_1_2_1_4_1","volume-title":"Conference on Innovative Data Systems Research (CIDR)","author":"Chattopadhyay Biswapesh","year":"2023","unstructured":"Biswapesh Chattopadhyay, Pedro Pedreira, Sameer Agarwal, Yutian James Sun, Suketu Vakharia, Peng Li, Weiran Liu, and Sundaram Narayanan. 2023. Shared Foundations: Modernizing Meta's Data Lakehouse. Conference on Innovative Data Systems Research (CIDR) (2023)."},{"key":"e_1_2_1_5_1","volume-title":"The Challenge of Cross-Language Interoperability. 56, 12","author":"Chisnall David","year":"2013","unstructured":"David Chisnall. 2013. The Challenge of Cross-Language Interoperability. 56, 12 (2013)."},{"key":"e_1_2_1_6_1","volume-title":"Apache Arrow: A cross-language development platform for in-memory analytics. Retrieved","author":"Software Foundation The Apache","year":"2023","unstructured":"The Apache Software Foundation. 2023. Apache Arrow: A cross-language development platform for in-memory analytics. Retrieved March 1, 2023 from https:\/\/arrow.apache.org\/"},{"key":"e_1_2_1_7_1","unstructured":"The PostgreSQL Global Development Group. [n.d.]. PostgreSQL: The World's Most Advanced Open Source Relational Database. 
https:\/\/www.postgresql.org\/."},{"key":"e_1_2_1_8_1","unstructured":"Oracle Inc. [n.d.]. MySQL. https:\/\/www.mysql.com\/."},{"key":"e_1_2_1_9_1","unstructured":"Snowflake Inc. [n.d.]. Snowpark for Python Java and Scala. https:\/\/www.snowflake.com\/en\/data-cloud\/snowpark\/."},{"key":"e_1_2_1_10_1","volume-title":"Gluten: A Spark Plugin to Offload SQL Engine to Native Library. Retrieved","year":"2023","unstructured":"Intel. 2023. Gluten: A Spark Plugin to Offload SQL Engine to Native Library. Retrieved March 1, 2023 from https:\/\/github.com\/oap-project\/gluten"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476378"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 25th International Conference on Extending Database Technology, EDBT 2022","author":"Marathe Arunprasad P.","year":"2022","unstructured":"Arunprasad P. Marathe, Shu Lin, Weidong Yu, Kareem El Gebaly, Per-\u00c5ke Larson, and Calvin Sun. 2022. Integrating the Orca Optimizer into MySQL. In Proceedings of the 25th International Conference on Extending Database Technology, EDBT 2022, Edinburgh, UK, March 29 - April 1, 2022. OpenProceedings.org, 2:511--2:523."},{"key":"e_1_2_1_13_1","unstructured":"Wes McKinney. [n.d.]. Adding new columnar memory layouts to Arrow. 
https:\/\/lists.apache.org\/thread\/49qzofswg1r5z7zh39pjvd1m2ggz2kdq."},{"key":"e_1_2_1_14_1","volume-title":"Building Query Compilers. Retrieved","author":"Moerkotte Guido","year":"2023","unstructured":"Guido Moerkotte. 2023. Building Query Compilers. Retrieved March 1, 2023 from https:\/\/pi3.informatik.uni-mannheim.de\/~moer\/querycompiler.pdf"},{"key":"e_1_2_1_15_1","unstructured":"OmniSci. [n.d.]. OmniSci. https:\/\/docs.omnisci.com\/."},{"key":"e_1_2_1_16_1","unstructured":"Pandas. [n.d.]. Pandas - Python Data Analysis Library. https:\/\/pandas.pydata.org\/."},{"key":"e_1_2_1_17_1","volume-title":"1st International Workshop on Composable Data Management Systems, CDMS@VLDB 2022","author":"Pasumansky Mosha","year":"2022","unstructured":"Mosha Pasumansky and Benjamin Wagner. 2022. Assembling a Query Engine From Spare Parts. In 1st International Workshop on Composable Data Management Systems, CDMS@VLDB 2022, Sydney, Australia, September 9, 2022, Satyanarayana R. Valluri and Mohamed Zait (Eds.). 
https:\/\/cdmsworkshop.github.io\/2022\/Proceedings\/ShortPapers\/Paper1_MoshaPasumansky.pdf"},{"key":"e_1_2_1_18_1","first-page":"12","article-title":"Velox: Meta's Unified Execution Engine","volume":"15","author":"Pedreira Pedro","year":"2022","unstructured":"Pedro Pedreira, Orri Erling, Masha Basmanova, Kevin Wilfong, Laith Sakka, Krishna Pai, Wei He, and Biswapesh Chattopadhyay. 2022. Velox: Meta's Unified Execution Engine. Proc. VLDB Endow. 15, 12 (aug 2022), 13.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380609"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407807"},{"key":"e_1_2_1_21_1","unstructured":"The Ibis Project. [n.d.]. The flexibility of Python analytics with the scale and performance of modern SQL. https:\/\/ibis-project.org\/."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3320212"},{"key":"e_1_2_1_23_1","volume-title":"Presto: SQL on Everything. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). 1802--1813","author":"Sethi Raghav","year":"2019","unstructured":"Raghav Sethi, Martin Traverso, Dain Sundstrom, David Phillips, Wenlei Xie, Yutian Sun, Nezih Yegitbasi, Haozhun Jin, Eric Hwang, Nileema Shingte, and Christopher Berner. 2019. 
Presto: SQL on Everything. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). 1802--1813."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595637"},{"key":"e_1_2_1_25_1","unstructured":"Apache Spark. [n.d.]. Apache Spark - Unified Engine for large-scale data analytics. https:\/\/spark.apache.org\/."},{"key":"e_1_2_1_26_1","unstructured":"Apache Spark. [n.d.]. Spark SQL And DataFrames. https:\/\/spark.apache.org\/sql\/."},{"key":"e_1_2_1_27_1","volume-title":"Substrait: Cross-Language Serialization for Relational Algebra. Retrieved","year":"2023","unstructured":"Substrait. 2023. Substrait: Cross-Language Serialization for Relational Algebra. Retrieved March 1, 2023 from https:\/\/substrait.io\/"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523633"},{"key":"e_1_2_1_29_1","volume-title":"Velox - Github Repository. Retrieved","year":"2023","unstructured":"Velox. 2023. Velox - Github Repository. Retrieved March 1, 2023 from https:\/\/github.com\/facebookincubator\/velox"},{"key":"e_1_2_1_30_1","volume-title":"1st International Workshop on Composable Data Management Systems, CDMS@VLDB 2022","author":"Wilhite David","year":"2022","unstructured":"David Wilhite. 2022. GoogleSQL: A SQL Language as a Component. In 1st International Workshop on Composable Data Management Systems, CDMS@VLDB 2022, Sydney, Australia, September 9, 2022, Satyanarayana R. Valluri and Mohamed Zait (Eds.). https:\/\/cdmsworkshop.github.io\/2022\/Slides\/Fri_C2.5_DavidWilhite.pptx"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3603581.3603604","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,8]],"date-time":"2023-08-08T19:13:40Z","timestamp":1691522020000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3603581.3603604"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6]]},"references-count":30,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["10.14778\/3603581.3603604"],"URL":"https:\/\/doi.org\/10.14778\/3603581.3603604","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,6]]},"assertion":[{"value":"2023-08-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}