{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:19:16Z","timestamp":1757618356386,"version":"3.44.0"},"reference-count":103,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2025,6,29]],"date-time":"2025-06-29T00:00:00Z","timestamp":1751155200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,29]],"date-time":"2025-06-29T00:00:00Z","timestamp":1751155200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Swiss Federal Institute of Technology Zurich"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["The VLDB Journal"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Nested data is valuable and ubiquitous. It is being generated in ever-increasing volumes across industrial and research environments and frequently contains valuable information that is extracted through analytical workloads. Despite its popularity and value, there is no clear-cut understanding of the status quo in analytical workloads for nested data in high-energy physics (HEP). In this paper, we seek to define the landscape of nested data processing in HEP by evaluating 10 systems and their query languages on the IRIS HEP ADL benchmark, a popular and representative HEP benchmark. We attempt not only to understand how well these systems perform from a query latency and scalability point of view but also from a query language usability perspective. The result of our evaluation paints an interesting and rather complex picture of existing solutions. 
Many of the evaluated systems are between one and two orders of magnitude slower than the domain-specific system used in HEP today, while a few of the commodity systems provide on-par performance at greater costs. Moreover, the evaluated query languages and dialects vary greatly in how naturally and concisely they can express nested query patterns. These observations suggest that while commodity data management systems and their query languages are viable tools for nested data processing, significant work remains to make them competitive with domain-specific solutions like those used by the HEP community.\n<\/jats:p>","DOI":"10.1007\/s00778-025-00924-w","type":"journal-article","created":{"date-parts":[[2025,6,28]],"date-time":"2025-06-28T22:29:40Z","timestamp":1751149780000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["The Status-Quo in nested data processing for high-energy physics"],"prefix":"10.1007","volume":"34","author":[{"given":"Dan","family":"Graur","sequence":"first","affiliation":[]},{"given":"Ingo","family":"M\u00fcller","sequence":"additional","affiliation":[]},{"given":"Mason","family":"Proffitt","sequence":"additional","affiliation":[]},{"given":"Ghislain","family":"Fourny","sequence":"additional","affiliation":[]},{"given":"Gordon T.","family":"Watts","sequence":"additional","affiliation":[]},{"given":"Gustavo","family":"Alonso","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,6,29]]},"reference":[{"key":"924_CR1","unstructured":"Actian Corporation: Columnar Database for Big Data | Vector Analytic Database. 
(2021) https:\/\/www.actian.com\/analytic-database\/vector-analytic-database\/"},{"key":"924_CR2","doi-asserted-by":"publisher","unstructured":"Alsubaiee, S., Altowim, Y., Altwaijry, H., Behm, A., Borkar, V., Bu, Y., Carey, M., Cetindil, I., Cheelangi, M., Faraaz, K., Gabrielova, E., Grover, R., Heilbron, Z., Kim, Y.S., Li, C., Li, G., Ok, J.M., Onose N, Pirzadeh, P., Tsotras, V., Vernica, R., Wen, J., Westmann, T.: Asterixdb: a scalable, open source bdms. Proc VLDB Endow 7(14):1905\u20131916, (2014) https:\/\/doi.org\/10.14778\/2733085.2733096","DOI":"10.14778\/2733085.2733096"},{"key":"924_CR3","doi-asserted-by":"publisher","unstructured":"Amaral, V., Helmer, S., Moerkotte, G.: A Visual Query Language for HEP Analysis. In: Nuclear Science Symposium, (2003). https:\/\/doi.org\/10.1109\/NSSMIC.2003.1351826","DOI":"10.1109\/NSSMIC.2003.1351826"},{"key":"924_CR4","doi-asserted-by":"publisher","DOI":"10.1016\/J.CPC.2009.08.005","author":"I Antcheva","year":"2009","unstructured":"Antcheva, I., Ballintijn, M., Bellenot, B., Biskup, M., Brun, R., Buncic, N., Canal, P., Casadei, D., Couet, O., Fine, V., Franco, L., Ganis, G., Gheata, A., Maline, D.G., Goto, M., Iwaszkiewicz, J., Kreshuk, A., Segura, D.M., Maunder, R., Moneta, L., Naumann, A., Offermann, E., Onuchin, V., Panacek, S., Rademakers, F., Russo, P., Tadel, M.: ROOT\u2014A C++ framework for petabyte data storage, statistical analysis and visualization. Comput. Phys. Commun. (2009). https:\/\/doi.org\/10.1016\/J.CPC.2009.08.005","journal-title":"Comput. Phys. Commun."},{"key":"924_CR5","doi-asserted-by":"publisher","unstructured":"Armbrust, M., Ghodsi, A., Zaharia, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, MJ.: Spark SQL: Relational Data Processing in Spark. In: SIGMOD, (2015). 
https:\/\/doi.org\/10.1145\/2723372.2742797","DOI":"10.1145\/2723372.2742797"},{"key":"924_CR6","doi-asserted-by":"crossref","unstructured":"Audibert, A., Chen, Y., Graur, D., Klimovic, A., \u0160im\u0161a, J., Thekkath, C.A.: tf.data service: A case for disaggregating ml input data processing. In: Proceedings of the 2023 ACM Symposium on Cloud Computing, pp. 358\u2013375 (2023)","DOI":"10.1145\/3620678.3624666"},{"key":"924_CR7","unstructured":"Baden, A., Day, C., Grossman, R., Lifka, D., Lusk, E., May, E., Price, L.: Analyzing high energy physics data using database computing: Preliminary report (1991)"},{"key":"924_CR8","doi-asserted-by":"publisher","first-page":"04","DOI":"10.1051\/EPJCONF\/201921404058","volume":"214","author":"Z Baranowski","year":"2019","unstructured":"Baranowski, Z., Kleszcz, E., Kothuri, P., Canali, L., Castellotti, R., Marquez, M.M., de Barros, N.G.M., Motesnitsalis, E., Mrowczynski, P., Duran, J.C.L.: Evolution of the Hadoop platform and ecosystem for high energy physics. EPJ Web Conf. 214, 04\u2013058 (2019). https:\/\/doi.org\/10.1051\/EPJCONF\/201921404058","journal-title":"EPJ Web Conf."},{"key":"924_CR9","doi-asserted-by":"publisher","unstructured":"Binko, P., Duellmann, D., Shiers, J.: CERN RD45 Status Report - A Persistent Object Manager for HEP. In: CHEP, (1996). https:\/\/doi.org\/10.1142\/9789814447188_0061","DOI":"10.1142\/9789814447188_0061"},{"key":"924_CR10","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/1085\/3\/032020","author":"J Blomer","year":"2018","unstructured":"Blomer, J.: A quantitative review of data formats for HEP analyses. J. Phys. Conf. Series (2018). https:\/\/doi.org\/10.1088\/1742-6596\/1085\/3\/032020","journal-title":"J. Phys. Conf. 
Series"},{"issue":"12","key":"924_CR11","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1145\/1409360.1409380","volume":"51","author":"PA Boncz","year":"2008","unstructured":"Boncz, P.A., Kersten, M.L., Manegold, S.: Breaking the memory wall in MonetDB. CACM 51(12), 77\u201385 (2008). https:\/\/doi.org\/10.1145\/1409360.1409380","journal-title":"CACM"},{"key":"924_CR12","unstructured":"Bowen, M., Landsberg, GL., Partridge, R.: The physics analysis server project. In: Computing in High-Energy and Nuclear Physics (2000)"},{"issue":"1\u20132","key":"924_CR13","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/S0168-9002(97)00048-X","volume":"389","author":"R Brun","year":"1997","unstructured":"Brun, R., Rademakers, F.: ROOT\u2014an object oriented data analysis framework. Nucl. Instrum. Methods Phys. Res., Sect. A 389(1\u20132), 81\u201386 (1997). https:\/\/doi.org\/10.1016\/S0168-9002(97)00048-X","journal-title":"Nucl. Instrum. Methods Phys. Res., Sect. A"},{"key":"924_CR14","doi-asserted-by":"publisher","unstructured":"Brun, R., Rademakers, F., Canal, P., Naumann, A., Couet, O., Moneta, L., Vassilev, V., Linev, S., Piparo, D., GANIS, G., Bellenot, B., Guiraud, E., Amadio, G., wverkerke, Mato, P., TimurP, Tadel, M., wlav, Tejedor, E., Blomer, J., Gheata, A., Hageboeck, S., Roiser, S., marsupial, Wunsch, S., Shadura, O., Bose, A., CristinaCristescu, Valls, X., Isemann, R.: root-project\/root: v6.18\/02. (2019). https:\/\/doi.org\/10.5281\/zenodo.3895860","DOI":"10.5281\/zenodo.3895860"},{"key":"924_CR15","doi-asserted-by":"publisher","unstructured":"Buneman, P.: Semistructured data. In: Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Association for Computing Machinery, New York, NY, USA, PODS \u201997, p 117\u2013121, (1997). 
https:\/\/doi.org\/10.1145\/263661.263675","DOI":"10.1145\/263661.263675"},{"key":"924_CR16","doi-asserted-by":"crossref","unstructured":"Ceccarelli, A., Cioni, A., Garzelli, M.V., Lenzi, P., Redapi, L.: (2019) Towards enhanced databases for High Energy Physics. In: Deep-Inelastic Scattering and Related Subjects, 2019. ADS Bibcode: 2019disr.confE.223C","DOI":"10.22323\/1.352.0223"},{"key":"924_CR17","doi-asserted-by":"publisher","unstructured":"Chang, J., Gutsche, O., Mandrichenko, I., Pivarski, J.: Striped Data Server for Scalable Parallel Data Analysis. Journal of Physics: Conference Series 1085(4):042,035, (2018). https:\/\/doi.org\/10.1088\/1742-6596\/1085\/4\/042035","DOI":"10.1088\/1742-6596\/1085\/4\/042035"},{"issue":"5","key":"924_CR18","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1016\/S0306-4379(98)00013-1","volume":"23","author":"S Cluet","year":"1998","unstructured":"Cluet, S.: Designing oql: allowing objects to be queried. Inf. Syst. 23(5), 279\u2013305 (1998)","journal-title":"Inf. Syst."},{"key":"924_CR19","doi-asserted-by":"publisher","unstructured":"collaboration, C.: SingleMu primary dataset in AOD format from Run of 2012 (\/SingleMu\/Run2012B-22Jan2013-v1\/AOD). (2017). https:\/\/doi.org\/10.7483\/OPENDATA.CMS.IYVQ.1J0W, cERN Open Data Portal","DOI":"10.7483\/OPENDATA.CMS.IYVQ.1J0W"},{"key":"924_CR20","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/219\/4\/042036","author":"J Cranshaw","year":"2010","unstructured":"Cranshaw, J., Malon, D., Vaniachine, A., Fine, V., Lauret, J., Hamill, P.: Petaminer: Using ROOT for efficient data storage in MySQL database. J. Phys. Conf. Series (2010). https:\/\/doi.org\/10.1088\/1742-6596\/219\/4\/042036","journal-title":"J. Phys. Conf. Series"},{"key":"924_CR21","doi-asserted-by":"crossref","unstructured":"Dageville, B., Cruanes, T., Zukowski, M., Antonov, V., Avanes, A., Bock, J., Claybaugh, J., Engovatov, D., Hentschel, M., Huang, J., et\u00a0al.: The snowflake elastic data warehouse. 
In: Proceedings of the 2016 International Conference on Management of Data, pp. 215\u2013226 (2016)","DOI":"10.1145\/2882903.2903741"},{"key":"924_CR22","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-5910-1_2","author":"S Dai","year":"2018","unstructured":"Dai, S., Gao, W., Xie, B., Yu, M., Chen, J., Kong, D., Han, R., Li, J.: Evaluating index systems of high energy physics. Commun. Comput. Inform. Sci. (2018). https:\/\/doi.org\/10.1007\/978-981-13-5910-1_2","journal-title":"Commun. Comput. Inform. Sci."},{"key":"924_CR23","doi-asserted-by":"crossref","unstructured":"Deutsch, A., Tannen, V.: Mars: A system for publishing xml from mixed and redundant storage. In: Proceedings 2003 VLDB Conference, Elsevier, pp. 201\u2013212 (2003)","DOI":"10.1016\/B978-012722442-8\/50026-4"},{"key":"924_CR24","doi-asserted-by":"crossref","unstructured":"Deutsch, A., Fernandez, M., Florescu, D., Levy, A., Suciu, D.: (1998) Xml-ql: a query language for xml","DOI":"10.1016\/S1389-1286(99)00020-1"},{"key":"924_CR25","doi-asserted-by":"crossref","unstructured":"Deutsch, A., Fernandez, M., Suciu, D.: Storing semistructured data with stored. In: Proceedings of the 1999 ACM SIGMOD international conference on Management of data, pp. 431\u2013442 (1999)","DOI":"10.1145\/304182.304220"},{"key":"924_CR26","unstructured":"Deutsch, A., Popa, L., Tannen, V.: Chase & backchase: A method for query optimization with materialized views and integrity constraints. Technical Reports (CIS) p.\u00a011 (2001)"},{"issue":"2","key":"924_CR27","doi-asserted-by":"publisher","first-page":"506","DOI":"10.1145\/304181.304229","volume":"28","author":"D D\u00fcllmann","year":"1999","unstructured":"D\u00fcllmann, D.: Petabyte databases. SIGMOD Rec 28(2), 506 (1999). 
https:\/\/doi.org\/10.1145\/304181.304229","journal-title":"SIGMOD Rec"},{"issue":"1","key":"924_CR28","first-page":"28","volume":"35","author":"F F\u00e4rber","year":"2012","unstructured":"F\u00e4rber, F., May, N., Lehner, W., Gro\u00dfe, P., M\u00fcller, I., Rauhe, H., Dees, J.: The SAP HANA database\u2014an architecture overview. IEEE Data Eng Bull 35(1), 28\u201333 (2012)","journal-title":"IEEE Data Eng Bull"},{"issue":"1\u20136","key":"924_CR29","doi-asserted-by":"publisher","first-page":"723","DOI":"10.1016\/S1389-1286(00)00061-X","volume":"33","author":"M Fernandez","year":"2000","unstructured":"Fernandez, M., Tan, W.C., Suciu, D.: Silkroute: trading between relations and xml. Comput. Netw. 33(1\u20136), 723\u2013745 (2000)","journal-title":"Comput. Netw."},{"key":"924_CR30","doi-asserted-by":"crossref","unstructured":"Fernandez, M., Morishima, A., Suciu, D.: Efficient evaluation of xml middle-ware queries. In: Proceedings of the 2001 ACM SIGMOD international conference on Management of data, pp. 103\u2013114 (2001)","DOI":"10.1145\/375663.375674"},{"key":"924_CR31","first-page":"3","volume":"22","author":"D Florescu","year":"1999","unstructured":"Florescu, D., Kossmann, D.: Storing and querying xml data using an RDMBS. IEEE Data Eng. Bullet. 22, 3 (1999)","journal-title":"IEEE Data Eng. Bullet."},{"key":"924_CR32","unstructured":"Fry, A., Chow, I.: Integrating PAW, a graphical analysis interface to Sybase (1993)"},{"key":"924_CR33","unstructured":"Google: (20214) GCS Documentation. https:\/\/cloud.google.com\/storage\/docs"},{"key":"924_CR34","doi-asserted-by":"crossref","unstructured":"Graur, D., Bruno, R., Alonso, G.: Specializing generic java data structures. In: Proceedings of the 18th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, pp. 
45\u201353 (2021a)","DOI":"10.1145\/3475738.3480718"},{"key":"924_CR35","doi-asserted-by":"publisher","first-page":"635","DOI":"10.1016\/j.procs.2021.03.079","volume":"184","author":"D Graur","year":"2021","unstructured":"Graur, D., Bruno, R., Bischoff, J., Rieser, M., Scherr, W., Hoefler, T., Alonso, G.: Hermes: enabling efficient large-scale simulation in matsim. Proc. Comput. Sci. 184, 635\u2013641 (2021)","journal-title":"Proc. Comput. Sci."},{"key":"924_CR36","doi-asserted-by":"publisher","unstructured":"Graur, D., M\u00fcller, I., Proffitt, M., Fourny, G., Watts, G.T., Alonso, G.: Benchmark Scripts for Evaluating Query Languages and Systems for High-Energy Physics Data. (2021). https:\/\/doi.org\/10.5281\/zenodo.5569049","DOI":"10.5281\/zenodo.5569049"},{"issue":"2","key":"924_CR37","doi-asserted-by":"publisher","first-page":"154","DOI":"10.14778\/3489496.3489498","volume":"15","author":"D Graur","year":"2021","unstructured":"Graur, D., M\u00fcller, I., Proffitt, M., Fourny, G., Watts, G.T., Alonso, G.: Evaluating query languages and systems for high-energy physics data. Proc VLDB Endow 15(2), 154\u2013168 (2021)","journal-title":"Proc VLDB Endow"},{"key":"924_CR38","unstructured":"Graur, D., Aymon, D., Kluser, D., Albrici, T., Thekkath, C.A., Klimovic, A.: Cachew: Machine learning input data processing as a service. In: 2022 USENIX Annual Technical Conference (USENIX ATC 22), pp 689\u2013706 (2022)"},{"key":"924_CR39","doi-asserted-by":"crossref","unstructured":"Graur, D., R\u00f6thlisberger, R., Jenny, A., Fourny, G., Drozdowski, F., Konigsmark, C., M\u00fcller, I., Alonso, G.: Addressing the nested data processing gap: Jsoniq queries on snowflake through snowpark. In: 40th IEEE International Conference on Data Engineering (ICDE 2024) (2024)","DOI":"10.1109\/ICDE60146.2024.00395"},{"key":"924_CR40","unstructured":"Graur, DO.: Abstractions for efficient machine learning and nested data processing. 
PhD thesis, ETH Zurich (2024)"},{"key":"924_CR41","doi-asserted-by":"publisher","unstructured":"Grossman, R., Qin, X., Valsamis, D., Xu, W., Day, C.T., Loken, S., MacFarlane, J.F., Quarrie, D., May, E., Lifka, D., Malon, D., Price, L.E., Baden, A., Cormell, L., Leibold, P., al, E.: Analyzing high energy physics data using databases: a case study. Scientific and Statistical Database Management - Proceedings of the International Working Conference (1994). https:\/\/doi.org\/10.1109\/SSDM.1994.336938","DOI":"10.1109\/SSDM.1994.336938"},{"key":"924_CR42","doi-asserted-by":"crossref","unstructured":"Grust, T., Sakr, S., Teubner, J.: Xquery on sql hosts. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, VLDB Endowment, VLDB \u201904, p. 252\u2013263 (2004)","DOI":"10.1016\/B978-012088469-8.50025-5"},{"key":"924_CR43","doi-asserted-by":"publisher","unstructured":"Guiraud, E., Naumann, A., Piparo, D.: TDataFrame: functional chains for ROOT data analyses. (2017). https:\/\/doi.org\/10.5281\/zenodo.260230","DOI":"10.5281\/zenodo.260230"},{"key":"924_CR44","doi-asserted-by":"publisher","unstructured":"Gutsche, O., Mandrichenko, I.: Striped Data Analysis Framework. EPJ Web of Conferences 245:06,042, (2020). https:\/\/doi.org\/10.1051\/EPJCONF\/202024506042","DOI":"10.1051\/EPJCONF\/202024506042"},{"key":"924_CR45","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/898\/7\/072012","author":"O Gutsche","year":"2017","unstructured":"Gutsche, O., Cremonesi, M., Elmer, P., Jayatilaka, B., Kowalkowski, J., Pivarski, J., Sehrish, S., Surez, C.M., Svyatkovskiy, A., Tran, N.: Big data in HEP: a comprehensive use case study. J. Phys. Conf. Series (2017). https:\/\/doi.org\/10.1088\/1742-6596\/898\/7\/072012","journal-title":"J. Phys. Conf. 
Series"},{"issue":"2","key":"924_CR46","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1089\/BIG.2013.0011","volume":"1","author":"M Hausenblas","year":"2013","unstructured":"Hausenblas, M., Nadeau, J.: Apache drill: interactive Ad-Hoc analysis at scale. Big Data 1(2), 100\u2013104 (2013). https:\/\/doi.org\/10.1089\/BIG.2013.0011","journal-title":"Big Data"},{"issue":"12","key":"924_CR47","doi-asserted-by":"publisher","first-page":"1119","DOI":"10.14778\/2732977.2732986","volume":"7","author":"M Karpathiotakis","year":"2014","unstructured":"Karpathiotakis, M., Branco, M., Alagiannis, I., Ailamaki, A.: Adaptive query processing on RAW data. Proc VLDB Endow 7(12), 1119\u20131130 (2014). https:\/\/doi.org\/10.14778\/2732977.2732986","journal-title":"Proc VLDB Endow"},{"key":"924_CR48","doi-asserted-by":"crossref","unstructured":"Kernert, D., May, N., Hladik, M., Werner, K., Lehner, W.: From static to agile-interactive particle physics analysis in the sap hana db. In: DATA, pp 16\u201325 (2015)","DOI":"10.5220\/0005503700160025"},{"key":"924_CR49","doi-asserted-by":"publisher","unstructured":"Khristenko, V., Pivarski, J.: diana-hep\/spark-root: Apache Spark Data Source for ROOT File Format. (2017). https:\/\/doi.org\/10.5281\/zenodo.1034230","DOI":"10.5281\/zenodo.1034230"},{"key":"924_CR50","doi-asserted-by":"crossref","unstructured":"Krishnamurthy, R., Kaushik, R., Naughton, JF.: Xml-to-sql query translation literature: The state of the art and open problems. In: Bellahs\u00e8ne Z, Chaudhri AB, Rahm E, Rys M, Unland, R.: (eds) Database and XML Technologies, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 1\u201318, (2003)","DOI":"10.1007\/978-3-540-39429-7_1"},{"key":"924_CR51","doi-asserted-by":"crossref","unstructured":"Krishnamurthy, R., Kaushik, R., Naughton, JF.: Efficient xml-to-sql query translation: Where to add the intelligence? In: VLDB, pp. 
144\u2013155, (2004)","DOI":"10.1016\/B978-012088469-8\/50016-4"},{"issue":"2","key":"924_CR52","doi-asserted-by":"publisher","first-page":"022","DOI":"10.1088\/1742-6596\/513\/2\/022022","volume":"513","author":"DM Limper","year":"2014","unstructured":"Limper, D.M.: An SQL-based approach to physics analysis. J. Phys. Conf. Series 513(2), 022\u2013022 (2014). https:\/\/doi.org\/10.1088\/1742-6596\/513\/2\/022022","journal-title":"J. Phys. Conf. Series"},{"key":"924_CR53","unstructured":"Malcolm, G.: Programming Microsoft SQL Server 2000 with XML. Microsoft Press, (2002)"},{"key":"924_CR54","doi-asserted-by":"publisher","unstructured":"Malon, D., Cranshaw, J., van Gemmeren, P., Zhang, Q.: Emerging Database Technologies and Their Applicability to High Energy Physics: A First Look at SciDB. Journal of Physics: Conference Series 331(4):042,016, (2011). https:\/\/doi.org\/10.1088\/1742-6596\/331\/4\/042016","DOI":"10.1088\/1742-6596\/331\/4\/042016"},{"key":"924_CR55","unstructured":"Malon, D.M., May, EN.: Critical Database Technologies for High Energy Physics. In: VLDB (1997)"},{"key":"924_CR56","doi-asserted-by":"publisher","unstructured":"Malon, D.M., May, E.N., Grossman, R.L., Day, C.T., Quarrie, DR.: Object Database Standards, Persistence Specifications, and Physics Data. In: CHEP, (1995). https:\/\/doi.org\/10.1142\/9789814447188_0058","DOI":"10.1142\/9789814447188_0058"},{"key":"924_CR57","doi-asserted-by":"publisher","unstructured":"Mami, M.N., Graux, D., Thakkar, H., Scerri, S., Auer, S., Lehmann, J.: (2019) The query translation landscape: a survey. https:\/\/doi.org\/10.48550\/ARXIV.1910.03118, https:\/\/arxiv.org\/abs\/1910.03118","DOI":"10.48550\/ARXIV.1910.03118"},{"key":"924_CR58","unstructured":"Manolescu, I., Florescu, D., Kossmann, D.: Answering xml queries on heterogeneous data sources. In: Vldb, vol\u00a01, pp. 
241\u2013250 (2001)"},{"key":"924_CR59","doi-asserted-by":"crossref","unstructured":"Marstaller, J.: Comparative performance measures of relational and object-oriented databases using High Energy Physics data (1993)","DOI":"10.2172\/94015"},{"issue":"1\u20132","key":"924_CR60","doi-asserted-by":"publisher","first-page":"330","DOI":"10.14778\/1920841.1920886","volume":"3","author":"S Melnik","year":"2010","unstructured":"Melnik, S., Gubarev, A., Long, J.J., Romer, G., Shivakumar, S., Tolton, M., Vassilakis, T.: Dremel: interactive analysis of web-scale datasets. Proc VLDB Endow 3(1\u20132), 330\u2013339 (2010)","journal-title":"Proc VLDB Endow"},{"key":"924_CR61","unstructured":"Melo, A., Pivarski, J.: spark-root\/laurelin: Allows reading ROOT TTrees into Apache Spark as DataFrames. (2020). https:\/\/github.com\/spark-root\/laurelin"},{"issue":"3","key":"924_CR62","doi-asserted-by":"publisher","first-page":"032","DOI":"10.1088\/1742-6596\/1085\/3\/032055","volume":"1085","author":"M Meoni","year":"2018","unstructured":"Meoni, M., Kuznetsov, V., Menichetti, L., Rum\u0161evi\u010dius, J., Boccali, T., Bonacorsi, D.: Exploiting apache spark platform for CMS computing analytics. J. Phys. Conf. Series 1085(3), 032\u2013055 (2018). https:\/\/doi.org\/10.1088\/1742-6596\/1085\/3\/032055","journal-title":"J. Phys. Conf. Series"},{"key":"924_CR63","unstructured":"Microsoft: (2021). SQL Server technical documentation - SQL Server | Microsoft Docs. https:\/\/docs.microsoft.com\/en-us\/sql\/sql-server"},{"key":"924_CR64","unstructured":"Microsoft Azure: (2021). Azure Synapse Analytics. https:\/\/azure.microsoft.com\/en-us\/services\/synapse-analytics\/"},{"issue":"4","key":"924_CR65","doi-asserted-by":"publisher","first-page":"498","DOI":"10.14778\/3436905.3436910","volume":"14","author":"I M\u00fcller","year":"2020","unstructured":"M\u00fcller, I., Fourny, G., Irimescu, S., Cikis, C.B., Alonso, G.: Rumble: data independence for large messy data sets. 
Proc VLDB Endow 14(4), 498\u2013506 (2020). https:\/\/doi.org\/10.14778\/3436905.3436910","journal-title":"Proc VLDB Endow"},{"key":"924_CR66","doi-asserted-by":"crossref","unstructured":"Murthy, R., Banerjee, S.: Xml schemas in oracle xml db. In: Proceedings 2003 VLDB Conference, Elsevier, pp. 1009\u20131018, (2003)","DOI":"10.1016\/B978-012722442-8\/50094-X"},{"key":"924_CR67","unstructured":"Musin, I.: adjust\/parquet_fdw: Parquet foreign data wrapper for PostgreSQL. (2021). https:\/\/github.com\/adjust\/parquet_fdw"},{"key":"924_CR68","unstructured":"Nowak, M., Kunszt, Z., Geppert, D., Paoli, S., D\u00fcllmann, D.: Object Persistency for HEP data using an Object-Relational Database. (2001). http:\/\/cds.cern.ch\/record\/518801"},{"key":"924_CR69","unstructured":"Ong, KW., Papakonstantinou, Y., Vernoux, R.: The SQL++ Unifying Semi-structured Query Language, and an Expressiveness Benchmark of SQL-on-Hadoop, NoSQL and NewSQL Databases. (2014). arXiv:1405.3631v4"},{"key":"924_CR70","unstructured":"Oracle Corporation: (2021). MySQL. https:\/\/www.mysql.com\/"},{"key":"924_CR71","unstructured":"Oracle: (2021). HeatWave. https:\/\/www.oracle.com\/mysql\/heatwave\/"},{"key":"924_CR72","unstructured":"Pivarski, J.: Survey of data formats, conversion tools. In: HEP Analysis Ecosystem Workshop, (2013). https:\/\/indico.cern.ch\/event\/613842\/contributions\/2585787\/"},{"key":"924_CR73","doi-asserted-by":"publisher","unstructured":"Pivarski, J., Lange, D., Jatuphattharachat, T.: Toward real-time data query systems in HEP. Journal of Physics: Conference Series 1085(3):032,044, (2018). https:\/\/doi.org\/10.1088\/1742-6596\/1085\/3\/032044","DOI":"10.1088\/1742-6596\/1085\/3\/032044"},{"key":"924_CR74","doi-asserted-by":"publisher","unstructured":"Pompili, A., ADF.: GPUs for statistical data analysis in HEP: a performance study of goofit on GPUs vs. RooFit on CPUs. Journal of Physics: Conference Series (2016). 
https:\/\/doi.org\/10.1088\/1742-6596\/762\/1\/012044","DOI":"10.1088\/1742-6596\/762\/1\/012044"},{"key":"924_CR75","doi-asserted-by":"crossref","unstructured":"Popa, L., Deutsch, A., Sahuguet, A., Tannen, V.: A chase too far? In: Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pp. 273\u2013284, (2000)","DOI":"10.1145\/342009.335421"},{"key":"924_CR76","unstructured":"Postgres: (2025). Postgres JSON Types. https:\/\/www.postgresql.org\/docs\/current\/datatype-json.html"},{"key":"924_CR77","doi-asserted-by":"publisher","unstructured":"Proffitt, M., M\u00fcller, I., Graur, D., Adamec, M., David, P., Guiraud, E., Binet, S.: iris-hep\/adl-benchmarks-index: ADL Functionality Benchmarks Index. (2021). https:\/\/doi.org\/10.5281\/zenodo.5131287, https:\/\/github.com\/iris-hep\/adl-benchmarks-index\/","DOI":"10.5281\/zenodo.5131287"},{"key":"924_CR78","doi-asserted-by":"crossref","unstructured":"Rellermeyer, JS., Khorasani, S.O., Graur, D., Parthasarathy, A.: The coming age of pervasive data processing. In: 2019 18th International Symposium on Parallel and Distributed Computing (ISPDC), IEEE, pp. 58\u201365, (2019)","DOI":"10.1109\/ISPDC.2019.00011"},{"key":"924_CR79","unstructured":"Robie, J., Brantner, M., Florescu, D., Fourny, G., Westmann, T.: JSONiq: XQuery for JSON. JSON for XQuery pp. 63\u201372, (2012)"},{"key":"924_CR80","doi-asserted-by":"publisher","unstructured":"Rodrigues, E.: The Scikit-HEP Project. EPJ Web of Conferences 214, (2019). https:\/\/doi.org\/10.1051\/epjconf\/201921406005","DOI":"10.1051\/epjconf\/201921406005"},{"key":"924_CR81","unstructured":"Sato, K.: An inside look at Google BigQuery. (2012). https:\/\/cloud.google.com\/files\/BigQueryTechnicalWP.pdf, white paper"},{"key":"924_CR82","doi-asserted-by":"crossref","unstructured":"Schmidt, A., Kersten, M., Windhouwer, M., Waas, F.: Efficient relational storage and retrieval of xml documents. 
In: The World Wide Web and Databases: Third International Workshop WebDB 2000 Dallas, TX, USA, May 18\u201319, 2000 Selected Papers 3, Springer, pp. 137\u2013150, (2001)","DOI":"10.1007\/3-540-45271-0_9"},{"key":"924_CR83","doi-asserted-by":"publisher","unstructured":"Sehrish, S., Kowalkowski, J., Paterno, M.: Spark and HPC for high energy physics data analyses. International Parallel and Distributed Processing Symposium Workshops (2017). https:\/\/doi.org\/10.1109\/IPDPSW.2017.112","DOI":"10.1109\/IPDPSW.2017.112"},{"key":"924_CR84","unstructured":"Services, AW.: Amazon Athena. (2018). https:\/\/aws.amazon.com\/athena\/"},{"key":"924_CR85","unstructured":"Services, AW.: Amazon redshift - cloud data warehouse. (2021). https:\/\/aws.amazon.com\/redshift\/"},{"key":"924_CR86","doi-asserted-by":"crossref","unstructured":"Sethi, R., Traverso, M., Sundstrom, D., Phillips, D., Xie, W., Sun, Y., Yegitbasi, N., Jin, H., Hwang, E., Shingte, N., et\u00a0al.: Presto: SQL on everything. In: ICDE (2019)","DOI":"10.1109\/ICDE.2019.00196"},{"issue":"3","key":"924_CR87","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1145\/1084805.1084808","volume":"34","author":"S Shankar","year":"2005","unstructured":"Shankar, S., Kini, A., DeWitt, D.J., Naughton, J.: Integrating databases and workflow systems. SIGMOD Rec 34(3), 5\u201311 (2005). https:\/\/doi.org\/10.1145\/1084805.1084808","journal-title":"SIGMOD Rec"},{"key":"924_CR88","unstructured":"Shanmugasundaram, J., Kiernan, J., Shekita, E.J., Fan, C., Funderburk, J.: Querying xml views of relational data. In: VLDB, vol\u00a01, pp. 261\u2013270, (2001)"},{"key":"924_CR89","doi-asserted-by":"crossref","unstructured":"Shiers, J.: Databases in high energy physics: A critical review. From the Web to the Grid and Beyond: Computing Paradigms Driven by High-Energy Physics pp. 225\u2013266, (2012)","DOI":"10.1007\/978-3-642-23157-5_9"},{"key":"924_CR90","unstructured":"Snowflake: Querying Semi-structured Data. (2024). 
https:\/\/docs.snowflake.com\/en\/user-guide\/querying-semistructured"},{"key":"924_CR91","doi-asserted-by":"publisher","unstructured":"Stewart, R.J., Trinder, P.W., Loidl, HW.: Comparing High Level MapReduce Query Languages. In: APPT, vol 6965 LNCS, (2011). https:\/\/doi.org\/10.1007\/978-3-642-24151-2_5","DOI":"10.1007\/978-3-642-24151-2_5"},{"issue":"2","key":"924_CR92","doi-asserted-by":"publisher","first-page":"340","DOI":"10.1145\/16856.16888","volume":"15","author":"M Stonebraker","year":"1986","unstructured":"Stonebraker, M., Rowe, L.A.: The design of POSTGRES. ACM SIGMOD Rec. 15(2), 340\u2013355 (1986). https:\/\/doi.org\/10.1145\/16856.16888","journal-title":"ACM SIGMOD Rec."},{"issue":"2","key":"924_CR93","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1145\/1379387.1379407","volume":"37","author":"AS Szalay","year":"2008","unstructured":"Szalay, A.S.: The sloan digital sky survey and beyond. SIGMOD Record 37(2), 61\u201366 (2008). https:\/\/doi.org\/10.1145\/1379387.1379407","journal-title":"SIGMOD Record"},{"key":"924_CR94","doi-asserted-by":"crossref","unstructured":"Tatarinov, I., Viglas, S.D., Beyer, K., Shanmugasundaram, J., Shekita, E., Zhang, C.: Storing and querying ordered xml using a relational database system. In: Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pp. 204\u2013215, (2002)","DOI":"10.1145\/564691.564715"},{"key":"924_CR95","unstructured":"The PartiQL Specification Committee: (2019) PartiQL Specification. https:\/\/partiql.org\/assets\/PartiQL-Specification.pdf"},{"key":"924_CR96","unstructured":"The PostgreSQL Global Development Group: (2021) PostgreSQL: Documentation: 8.16. Composite Types. https:\/\/www.postgresql.org\/docs\/13\/rowtypes.html#ROWTYPES-ACCESSING"},{"key":"924_CR97","unstructured":"Trino: (2024) Trino User Defined Functions. 
https:\/\/trino.io\/docs\/current\/sql\/create-function.html"},{"issue":"1","key":"924_CR98","doi-asserted-by":"publisher","first-page":"012","DOI":"10.1088\/1742-6596\/608\/1\/012030","volume":"608","author":"V Vassilev","year":"2015","unstructured":"Vassilev, V.: Native language integrated queries with CppLINQ in C++. J. Phys. Conf. Series 608(1), 012\u2013030 (2015). https:\/\/doi.org\/10.1088\/1742-6596\/608\/1\/012030","journal-title":"J. Phys. Conf. Series"},{"key":"924_CR99","doi-asserted-by":"publisher","unstructured":"Verbitski, A., Gupta, A., Saha, D., Brahmadesam, M., Gupta, K., Mittal, R., Krishnamurthy, S., Maurice, S., Kharatishvili, T., Bao, X.: Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. In: SIGMOD, (2017) https:\/\/doi.org\/10.1145\/3035918.3056101","DOI":"10.1145\/3035918.3056101"},{"issue":"1","key":"924_CR100","doi-asserted-by":"publisher","first-page":"110","DOI":"10.1145\/383034.383038","volume":"1","author":"M Yoshikawa","year":"2001","unstructured":"Yoshikawa, M., Amagasa, T., Shimura, T., Uemura, S.: Xrel: a path-based approach to storage and retrieval of xml documents using relational databases. ACM Trans. Internet Technol. (TOIT) 1(1), 110\u2013141 (2001)","journal-title":"ACM Trans. Internet Technol. (TOIT)"},{"key":"924_CR101","unstructured":"Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: Cluster Computing with Working Sets. In: HotCloud, p.\u00a010 (2010)"},{"key":"924_CR102","unstructured":"van\u00a0der Zander, B.: Xidel repository. (2021). https:\/\/github.com\/benibela\/xidel"},{"key":"924_CR103","unstructured":"Zorba, (2021). Zorba documentation. 
http:\/\/www.zorba.io\/documentation\/latest"}],"container-title":["The VLDB Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00778-025-00924-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00778-025-00924-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00778-025-00924-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,7]],"date-time":"2025-09-07T00:35:14Z","timestamp":1757205314000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00778-025-00924-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,29]]},"references-count":103,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["924"],"URL":"https:\/\/doi.org\/10.1007\/s00778-025-00924-w","relation":{},"ISSN":["1066-8888","0949-877X"],"issn-type":[{"type":"print","value":"1066-8888"},{"type":"electronic","value":"0949-877X"}],"subject":[],"published":{"date-parts":[[2025,6,29]]},"assertion":[{"value":"30 June 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 January 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 February 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 June 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"55"}}