{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:13:18Z","timestamp":1750219998601,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":53,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,8,22]],"date-time":"2022-08-22T00:00:00Z","timestamp":1661126400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,8,22]]},"DOI":"10.1145\/3548785.3548797","type":"proceedings-article","created":{"date-parts":[[2022,9,13]],"date-time":"2022-09-13T16:08:13Z","timestamp":1663085293000},"page":"75-83","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["A Formal Framework for Data Lakes Based on Category Theory"],"prefix":"10.1145","author":[{"given":"Alexis","family":"Guyot","sequence":"first","affiliation":[{"name":"Laboratoire d'Informatique de Bourgogne, Universite de Bourgogne, France"}]},{"given":"Annabelle","family":"Gillet","sequence":"additional","affiliation":[{"name":"Laboratoire d'Informatique de Bourgogne, Universite de Bourgogne, France"}]},{"given":"Eric","family":"Leclercq","sequence":"additional","affiliation":[{"name":"Laboratoire d'Informatique de Bourgogne, Universite de Bourgogne, France"}]},{"given":"Nadine","family":"Cullot","sequence":"additional","affiliation":[{"name":"Laboratoire d'Informatique de Bourgogne, Universite de Bourgogne, France"}]}],"member":"320","published-online":{"date-parts":[[2022,9,13]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2016.0033"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415560"},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of CIDR.","author":"Armbrust Michael","year":"2021","unstructured":"Michael Armbrust, Ali Ghodsi, Reynold Xin, and Matei Zaharia. 2021. Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics. In Proceedings of CIDR."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3236230"},{"key":"e_1_3_2_1_5_1","volume-title":"Can practitioners neglect theory and theoreticians neglect practice?Computer 44, 10","author":"Broy Manfred","year":"2011","unstructured":"Manfred Broy. 2011. Can practitioners neglect theory and theoreticians neglect practice?Computer 44, 10 (2011), 19\u201324."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10270-011-0207-y"},{"key":"e_1_3_2_1_7_1","unstructured":"Isabel Cafezeiro and Edward\u00a0Hermann Haeusler. 2007. Semantic Interoperability via Category Theory.. In ER (Tutorials Posters Panels & Industrial Contributions). Citeseer 197\u2013202."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/357980.358007"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Julia Couto Olimar\u00a0Teixeira Borges Duncan\u00a0D Ruiz Sabrina Marczak and Rafael Prikladnicki. 2019. A Mapping Study about Data Lakes: An Improved Definition and Possible Architectures.. In SEKE. 453\u2013578.","DOI":"10.18293\/SEKE2019-129"},{"key":"e_1_3_2_1_10_1","unstructured":"Zhamak Dehghani. 2019. How to move beyond a monolithic data lake to a distributed data mesh. Martin Fowler\u2019s Blog(2019)."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3487664.3487783"},{"key":"e_1_3_2_1_12_1","unstructured":"Zinovy Diskin. 1997. The Arrow Logic of Metadata Environment: A Formalised Graph-Based Framework for Structuring Metadata Repositories. (1997)."},{"key":"e_1_3_2_1_13_1","volume-title":"Oct","author":"Dixon James","year":"2010","unstructured":"James Dixon. 2010. Pentaho, Hadoop, and data lakes. blog, Oct (2010)."},{"key":"e_1_3_2_1_14_1","volume-title":"Applications of category theory to the area of algebraic specification in computer science. Applied categorical structures 6, 1","author":"Ehrig Hartmut","year":"1998","unstructured":"Hartmut Ehrig, Martin Gro\u00dfe-Rhode, and Uwe Wolter. 1998. Applications of category theory to the area of algebraic specification in computer science. Applied categorical structures 6, 1 (1998), 1\u201335."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0002-9947-1945-0013131-6"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2899391"},{"key":"e_1_3_2_1_17_1","volume-title":"2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 1001\u20131012","author":"Fernandez Raul\u00a0Castro","year":"2018","unstructured":"Raul\u00a0Castro Fernandez, Ziawasch Abedjan, Famien Koko, Gina Yuan, Samuel Madden, and Michael Stonebraker. 2018. Aurum: A data discovery system. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 1001\u20131012."},{"key":"e_1_3_2_1_18_1","volume-title":"Metamodels for data quality description. Data Quality in Geographic Information-From Error to Uncertainty 192","author":"Frank U","year":"1998","unstructured":"Andrew\u00a0U Frank. 1998. Metamodels for data quality description. Data Quality in Geographic Information-From Error to Uncertainty 192 (1998)."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183746"},{"key":"e_1_3_2_1_20_1","unstructured":"I. Gartner. 2014. Gartner Says Beware of the Data Lake Fallacy.https:\/\/www.gartner.com\/newsroom\/id\/2809117."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-79382-1_23"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/645683.664569"},{"volume-title":"The Functional Approach to Data Management","author":"Grust Torsten","key":"e_1_3_2_1_23_1","unstructured":"Torsten Grust. 2004. Monad comprehensions: a versatile representation for queries. In The Functional Approach to Data Management. Springer, 288\u2013311."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2899389"},{"key":"e_1_3_2_1_25_1","unstructured":"Rihan Hai Christoph Quix and Matthias Jarke. 2021. Data lake concept and systems: a survey. arXiv preprint arXiv:2106.09592(2021)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-33223-5_19"},{"key":"e_1_3_2_1_27_1","first-page":"5","article-title":"Managing Google\u2019s data lake: an overview of the Goods system.IEEE Data","volume":"39","author":"Halevy Y","year":"2016","unstructured":"Alon\u00a0Y Halevy, Flip Korn, Natalya\u00a0Fridman Noy, Christopher Olston, Neoklis Polyzotis, Sudip Roy, and Steven\u00a0Euijong Whang. 2016. Managing Google\u2019s data lake: an overview of the Goods system.IEEE Data Eng. Bull. 39, 3 (2016), 5\u201314.","journal-title":"Eng. Bull."},{"key":"e_1_3_2_1_28_1","volume-title":"Categorical Management of Multi-Model Data. In 25th International Database Engineering & Applications Symposium. 134\u2013140","author":"Holubova Irena","year":"2021","unstructured":"Irena Holubova, Pavel Contos, and Martin Svoboda. 2021. Categorical Management of Multi-Model Data. In 25th International Database Engineering & Applications Symposium. 134\u2013140."},{"key":"e_1_3_2_1_29_1","volume-title":"Where\u2019s the theory for software engineering?IEEE software 29, 5","author":"Johnson Pontus","year":"2012","unstructured":"Pontus Johnson, Mathias Ekstedt, and Ivar Jacobson. 2012. Where\u2019s the theory for software engineering?IEEE software 29, 5 (2012), 96\u201396."},{"key":"e_1_3_2_1_31_1","volume-title":"ArchaeoDAL: A Data Lake for Archaeological Data Management and Analytics. In 25th International Database Engineering & Applications Symposium. 252\u2013262","author":"Liu Pengfei","year":"2021","unstructured":"Pengfei Liu, Sabine Loudcher, J\u00e9r\u00f4me Darmont, and Camille No\u00fbs. 2021. ArchaeoDAL: A Data Lake for Archaeological Data Management and Analytics. In 25th International Database Engineering & Applications Symposium. 252\u2013262."},{"volume-title":"Heterogeneous Data Management, Polystores, and Analytics for Healthcare","author":"Liu Zhen\u00a0Hua","key":"e_1_3_2_1_32_1","unstructured":"Zhen\u00a0Hua Liu, Jiaheng Lu, Dieter Gawlick, Heli Helskyaho, Gregory Pogossiants, and Zhe Wu. 2018. Multi-model database management systems-a look forward. In Heterogeneous Data Management, Polystores, and Analytics for Healthcare. Springer, 16\u201329."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-91563-0_29"},{"key":"e_1_3_2_1_34_1","unstructured":"Jacob McPadden Thomas\u00a0JS Durant Dustin\u00a0R Bunch Andreas Coppi Nathan Price Kris Rodgerson Charles\u00a0J Torre\u00a0Jr William Byron H\u00a0Patrick Young Allen\u00a0L Hsiao 2018. A scalable data science platform for healthcare and precision medicine research. arXiv preprint arXiv:1808.04849(2018)."},{"key":"e_1_3_2_1_35_1","volume-title":"Category Theory Framework for Variability Models with Non-functional Requirements. In International Conference on Advanced Information Systems Engineering. Springer, 397\u2013413","author":"Munoz Daniel-Jesus","year":"2021","unstructured":"Daniel-Jesus Munoz, Dilian Gurov, Monica Pinto, and Lidia Fuentes. 2021. Category Theory Framework for Variability Models with Non-functional Requirements. In International Conference on Advanced Information Systems Engineering. Springer, 397\u2013413."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2858256"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3216122.3216130"},{"key":"e_1_3_2_1_38_1","volume-title":"Metadata extraction and management in data lakes with GEMMS. Complex Systems Informatics and Modeling Quarterly9","author":"Quix Christoph","year":"2016","unstructured":"Christoph Quix, Rihan Hai, and Ivan Vatov. 2016. Metadata extraction and management in data lakes with GEMMS. Complex Systems Informatics and Modeling Quarterly9 (2016), 67\u201383."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056100"},{"volume-title":"Service research and innovation","author":"Rangarajan Sarathkumar","key":"e_1_3_2_1_40_1","unstructured":"Sarathkumar Rangarajan, Huai Liu, Hua Wang, and Chuan-Long Wang. 2015. Scalable architecture for personalized healthcare service recommendation using big data lake. In Service research and innovation. Springer, 65\u201379."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-27615-7_23"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-30278-8_5"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.3390\/s22072733"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10844-020-00608-7"},{"key":"e_1_3_2_1_45_1","unstructured":"Etienne Scholly Pegdwend\u00e9 Sawadogo Pengfei Liu Javier\u00a0Alfonso Espinosa-Oviedo C\u00e9cile Favre Sabine Loudcher J\u00e9r\u00f4me Darmont and Camille No\u00fbs. 2021. Coining goldMEDAL: a new contribution to data lake generic metadata modeling. arXiv preprint arXiv:2103.13155(2021)."},{"key":"e_1_3_2_1_46_1","unstructured":"Dan Shiebler Bruno Gavranovi\u0107 and Paul Wilson. 2021. Category theory in machine learning. arXiv preprint arXiv:2106.07032(2021)."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ic.2012.05.001"},{"key":"e_1_3_2_1_49_1","volume-title":"Database queries and constraints via lifting problems. Mathematical structures in computer science 24, 6","author":"Spivak I","year":"2014","unstructured":"David\u00a0I Spivak. 2014. Database queries and constraints via lifting problems. Mathematical structures in computer science 24, 6 (2014)."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-018-0132-9"},{"key":"e_1_3_2_1_51_1","volume-title":"Texts","author":"Toth David","year":"2008","unstructured":"David Toth. 2008. Database engineering from the category theory viewpoint. Databases, Texts (2008), 37."},{"volume-title":"Heterogeneous Data Management, Polystores, and Analytics for Healthcare","author":"Uotila Valter","key":"e_1_3_2_1_52_1","unstructured":"Valter Uotila and Jiaheng Lu. 2021. A Formal Category Theoretical Framework for Multi-model Data Transformations. In Heterogeneous Data Management, Polystores, and Analytics for Healthcare. Springer, 14\u201328."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/11784180_30"},{"key":"e_1_3_2_1_54_1","volume-title":"Understanding visualization: A formal approach using category theory and semiotics","author":"Vickers Paul","year":"2012","unstructured":"Paul Vickers, Joe Faith, and Nick Rossiter. 2012. Understanding visualization: A formal approach using category theory and semiotics. IEEE transactions on visualization and computer graphics 19, 6(2012), 1048\u20131061."},{"key":"e_1_3_2_1_55_1","volume-title":"Small and Big Data. In 25th International Database Engineering & Applications Symposium. 94\u2013102","author":"Zhao Yan","year":"2021","unstructured":"Yan Zhao, Imen Megdiche, Franck Ravat, and Vincent-nam Dang. 2021. A Zone-Based Data Lake Architecture for IoT, Small and Big Data. In 25th International Database Engineering & Applications Symposium. 94\u2013102."}],"event":{"name":"IDEAS'22: International Database Engineered Applications Symposium","acronym":"IDEAS'22","location":"Budapest Hungary"},"container-title":["Proceedings of the 26th International Database Engineered Applications Symposium"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3548785.3548797","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3548785.3548797","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:50:53Z","timestamp":1750182653000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3548785.3548797"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,22]]},"references-count":53,"alternative-id":["10.1145\/3548785.3548797","10.1145\/3548785"],"URL":"https:\/\/doi.org\/10.1145\/3548785.3548797","relation":{},"subject":[],"published":{"date-parts":[[2022,8,22]]},"assertion":[{"value":"2022-09-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}