{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T03:22:00Z","timestamp":1768533720711,"version":"3.49.0"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,7,27]],"date-time":"2022-07-27T00:00:00Z","timestamp":1658880000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,27]],"date-time":"2022-07-27T00:00:00Z","timestamp":1658880000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["682315"],"award-info":[{"award-number":["682315"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100011403","name":"GCHQ","doi-asserted-by":"publisher","award":["VeTSS grant"],"award-info":[{"award-number":["VeTSS grant"]}],"id":[{"id":"10.13039\/100011403","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Autom Reasoning"],"published-print":{"date-parts":[[2022,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>SQL is the world\u2019s most popular declarative language, forming the basis of the multi-billion-dollar database industry. Although SQL has been standardized, the full standard is based on ambiguous natural language rather than formal specification. Commercial SQL implementations interpret the standard in different ways, so that, given the same input data, the same query can yield different results depending on the SQL system it is run on. Even for a particular system, mechanically checked formalization of all widely-used features of SQL remains an open problem. 
The lack of a well-understood formal semantics makes it very difficult to validate the soundness of database implementations. Although formal semantics for fragments of SQL were designed in the past, they usually did not support set and bag operations, lateral joins, nested subqueries, and, crucially, null values. Null values complicate SQL\u2019s semantics in profound ways analogous to null pointers or side-effects in other programming languages. Since certain SQL queries are equivalent in the absence of null values, but produce different results when applied to tables containing incomplete data, semantics which ignore null values are able to prove query equivalences that are unsound in realistic databases. A formal semantics of SQL supporting all the aforementioned features was only proposed recently. In this paper, we report about our mechanization of SQL semantics covering set\/bag operations, lateral joins, nested subqueries, and nulls, written in the Coq proof assistant, and describe the validation of key metatheoretic properties. 
Additionally, we are able to use the same framework to formalize the semantics of a flat relational calculus (with null values), and show a certified translation of its normal forms into SQL.<\/jats:p>","DOI":"10.1007\/s10817-022-09632-4","type":"journal-article","created":{"date-parts":[[2022,7,27]],"date-time":"2022-07-27T17:16:02Z","timestamp":1658942162000},"page":"989-1030","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["A Formalization of SQL with Nulls"],"prefix":"10.1007","volume":"66","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2361-8538","authenticated-orcid":false,"given":"Wilmer","family":"Ricciotti","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1307-9286","authenticated-orcid":false,"given":"James","family":"Cheney","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,7,27]]},"reference":[{"key":"9632_CR1","doi-asserted-by":"publisher","unstructured":"Auerbach, J.S., Hirzel, M., Mandel, L., Shinnar, A., Sim\u00e9on, J.: Prototyping a query compiler using Coq (experience report). Proc. ACM Program. Lang. 1(ICFP), 9:1\u20139:15 (2017). https:\/\/doi.org\/10.1145\/3110253","DOI":"10.1145\/3110253"},{"key":"9632_CR2","doi-asserted-by":"publisher","unstructured":"Benzaken, V., Contejean, E.: A Coq mechanised formal semantics for realistic SQL queries: formally reconciling SQL and bag relational algebra. In: A.\u00a0Mahboubi, M.O. Myreen (eds.) Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2019, Cascais, Portugal, January 14\u201315, 2019, pp. 249\u2013261. ACM (2019). https:\/\/doi.org\/10.1145\/3293880.3294107","DOI":"10.1145\/3293880.3294107"},{"key":"9632_CR3","doi-asserted-by":"publisher","unstructured":"Benzaken, V., Contejean, E., Dumbrava, S.: A Coq formalization of the relational data model. 
In: Programming Languages and Systems\u201423rd European Symposium on Programming, ESOP 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5\u201313, 2014, Proceedings, pp. 189\u2013208 (2014). https:\/\/doi.org\/10.1007\/978-3-642-54833-8_11","DOI":"10.1007\/978-3-642-54833-8_11"},{"issue":"1","key":"9632_CR4","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1145\/181550.181564","volume":"23","author":"P Buneman","year":"1994","unstructured":"Buneman, P., Libkin, L., Suciu, D., Tannen, V., Wong, L.: Comprehension syntax. SIGMOD Rec. 23(1), 87\u201396 (1994). https:\/\/doi.org\/10.1145\/181550.181564","journal-title":"SIGMOD Rec."},{"key":"9632_CR5","doi-asserted-by":"publisher","unstructured":"Buneman, P., Naqvi, S., Tannen, V., Wong, L.: Principles of programming with complex objects and collection types. Theor. Comput. Sci. 149(1) (1995). https:\/\/doi.org\/10.1016\/0304-3975(95)00024-Q","DOI":"10.1016\/0304-3975(95)00024-Q"},{"key":"9632_CR6","doi-asserted-by":"publisher","unstructured":"Chu, S., Weitz, K., Cheung, A., Suciu, D.: HoTTSQL: Proving query rewrites with univalent SQL semantics. In: PLDI, pp. 510\u2013524. ACM (2017). https:\/\/doi.org\/10.1145\/3062341.3062348","DOI":"10.1145\/3062341.3062348"},{"issue":"4","key":"9632_CR7","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1145\/320107.320109","volume":"4","author":"EF Codd","year":"1979","unstructured":"Codd, E.F.: Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4(4), 397\u2013434 (1979). https:\/\/doi.org\/10.1145\/320107.320109","journal-title":"ACM Trans. Database Syst."},{"key":"9632_CR8","doi-asserted-by":"publisher","unstructured":"Cooper, E., Lindley, S., Wadler, P., Yallop, J.: Links: web programming without tiers. In: FMCO (2007). 
https:\/\/doi.org\/10.1007\/978-3-540-74792-5_12","DOI":"10.1007\/978-3-540-74792-5_12"},{"key":"9632_CR9","unstructured":"Franconi, E., Tessaris, S.: On the logic of SQL nulls. In: Proceedings of the 6th Alberto Mendelzon International Workshop on Foundations of Data Management, Ouro Preto, Brazil, June 27\u201330, 2012, pp. 114\u2013128 (2012). http:\/\/ceur-ws.org\/Vol-866\/paper8.pdf"},{"key":"9632_CR10","doi-asserted-by":"publisher","unstructured":"Ganski, R.A., Wong, H.K.T.: Optimization of nested SQL queries revisited. In: SIGMOD, pp. 23\u201333. ACM, New York, NY, USA (1987). https:\/\/doi.org\/10.1145\/38713.38723","DOI":"10.1145\/38713.38723"},{"issue":"3","key":"9632_CR11","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1145\/1462571.1462575","volume":"37","author":"J Grant","year":"2008","unstructured":"Grant, J.: Null values in SQL. SIGMOD Rec. 37(3), 23\u201325 (2008). https:\/\/doi.org\/10.1145\/1462571.1462575","journal-title":"SIGMOD Rec."},{"key":"9632_CR12","doi-asserted-by":"publisher","unstructured":"Green, T.J., Karvounarakis, G., Tannen, V.: Provenance semirings. In: PODS, pp. 31\u201340. ACM (2007). https:\/\/doi.org\/10.1145\/1265530.1265535","DOI":"10.1145\/1265530.1265535"},{"issue":"1","key":"9632_CR13","doi-asserted-by":"publisher","first-page":"27","DOI":"10.14778\/3151113.3151116","volume":"11","author":"Paolo Guagliardo","year":"2017","unstructured":"Guagliardo, Paolo, Libkin, Leonid: A formal semantics of SQL queries, its validation, and applications. Proc. VLDB Endow. 11(1), 27\u201339 (2017). https:\/\/doi.org\/10.14778\/3151113.3151116","journal-title":"Proc. VLDB Endow."},{"issue":"3","key":"9632_CR14","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1145\/319732.319745","volume":"7","author":"W Kim","year":"1982","unstructured":"Kim, W.: On optimizing an SQL-like nested query. ACM Trans. Database Syst. 7(3), 443\u2013469 (1982). https:\/\/doi.org\/10.1145\/319732.319745","journal-title":"ACM Trans. 
Database Syst."},{"key":"9632_CR15","doi-asserted-by":"crossref","unstructured":"Leroy, X.: Formal certification of a compiler back-end, or: programming a compiler with a proof assistant. In: 33rd ACM symposium on Principles of Programming Languages, pp. 42\u201354. ACM Press (2006). http:\/\/xavierleroy.org\/publi\/compiler-certif.pdf","DOI":"10.1145\/1111320.1111042"},{"key":"9632_CR16","doi-asserted-by":"publisher","unstructured":"Libkin, L.: Incomplete data: what went wrong, and how to fix it. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS\u201914, Snowbird, UT, USA, June 22\u201327, 2014, pp. 1\u201313 (2014). https:\/\/doi.org\/10.1145\/2594538.2594561. http:\/\/doi.acm.org\/10.1145\/2594538.2594561","DOI":"10.1145\/2594538.2594561"},{"key":"9632_CR17","doi-asserted-by":"publisher","unstructured":"Libkin, L.: SQL\u2019s three-valued logic and certain answers. ACM Trans. Database Syst. 41(1), 1:1\u20131:28 (2016). https:\/\/doi.org\/10.1145\/2877206","DOI":"10.1145\/2877206"},{"key":"9632_CR18","doi-asserted-by":"crossref","unstructured":"Malecha, J.G., Morrisett, G., Shinnar, A., Wisnesky, R.: Toward a verified relational database management system. In: POPL, pp. 237\u2013248 (2010)","DOI":"10.1145\/1707801.1706329"},{"key":"9632_CR19","doi-asserted-by":"publisher","unstructured":"Ricciotti, W.: Binding structures as an abstract data type. In: Programming Languages and Systems\u201424th European Symposium on Programming, ESOP 2015, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2015, London, UK, April 11\u201318, 2015. Proceedings, pp. 762\u2013786 (2015). https:\/\/doi.org\/10.1007\/978-3-662-46669-8_31","DOI":"10.1007\/978-3-662-46669-8_31"},{"key":"9632_CR20","doi-asserted-by":"publisher","unstructured":"Ricciotti, W., Cheney, J.: Mixing set and bag semantics. In: DBPL, pp. 70\u201373 (2019). 
https:\/\/doi.org\/10.1145\/3315507.3330202","DOI":"10.1145\/3315507.3330202"},{"key":"9632_CR21","doi-asserted-by":"publisher","unstructured":"Ricciotti, W., Cheney, J.: Strongly Normalizing Higher-Order Relational Queries. In: Z.M. Ariola (ed.) 5th International Conference on Formal Structures for Computation and Deduction (FSCD 2020), Leibniz International Proceedings in Informatics (LIPIcs), vol. 167, pp. 28:1\u201328:22. Schloss Dagstuhl\u2013Leibniz-Zentrum f\u00fcr Informatik, Dagstuhl, Germany (2020). https:\/\/doi.org\/10.4230\/LIPIcs.FSCD.2020.28. https:\/\/drops.dagstuhl.de\/opus\/volltexte\/2020\/12350","DOI":"10.4230\/LIPIcs.FSCD.2020.28"},{"key":"9632_CR22","doi-asserted-by":"publisher","unstructured":"Ricciotti, W., Cheney, J.: Query lifting: Language-integrated query for heterogeneous nested collections. In: Programming Languages and Systems (ESOP 2021). Lecture Notes in Computer Science, pp. 579\u2013606. Springer International Publishing (2021). https:\/\/doi.org\/10.1007\/978-3-030-72019-3_21","DOI":"10.1007\/978-3-030-72019-3_21"},{"issue":"4","key":"9632_CR23","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1145\/1361348.1361350","volume":"36","author":"C Rubinson","year":"2007","unstructured":"Rubinson, C.: Nulls, three-valued logic, and ambiguity in SQL: critiquing Date\u2019s critique. SIGMOD Rec. 36(4), 13\u201317 (2007). https:\/\/doi.org\/10.1145\/1361348.1361350","journal-title":"SIGMOD Rec."},{"key":"9632_CR24","doi-asserted-by":"crossref","unstructured":"van\u00a0der Meyden, R.: Logical approaches to incomplete information: a survey. In: J.\u00a0Chomicki, G.\u00a0Saake (eds.) Logics for Databases and Information Systems, pp. 307\u2013356. Kluwer (1998)","DOI":"10.1007\/978-1-4615-5643-5_10"},{"key":"9632_CR25","doi-asserted-by":"publisher","unstructured":"Wong, L.: Normal forms and conservative extension properties for query languages over collection types. J. Comput. Syst. Sci. 52(3) (1996). 
https:\/\/doi.org\/10.1006\/jcss.1996.0037","DOI":"10.1006\/jcss.1996.0037"},{"key":"9632_CR26","doi-asserted-by":"publisher","unstructured":"Wong, L.: Kleisli, a functional query system. J. Funct. Program. 10(1) (2000). https:\/\/doi.org\/10.1017\/S0956796899003585","DOI":"10.1017\/S0956796899003585"}],"container-title":["Journal of Automated Reasoning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10817-022-09632-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10817-022-09632-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10817-022-09632-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,5]],"date-time":"2022-11-05T11:19:32Z","timestamp":1667647172000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10817-022-09632-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,27]]},"references-count":26,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,11]]}},"alternative-id":["9632"],"URL":"https:\/\/doi.org\/10.1007\/s10817-022-09632-4","relation":{},"ISSN":["0168-7433","1573-0670"],"issn-type":[{"value":"0168-7433","type":"print"},{"value":"1573-0670","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,27]]},"assertion":[{"value":"19 March 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 April 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 July 2022","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}