{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T15:42:05Z","timestamp":1774021325781,"version":"3.50.1"},"publisher-location":"Cham","reference-count":51,"publisher":"Springer International Publishing","isbn-type":[{"value":"9783030993351","type":"print"},{"value":"9783030993368","type":"electronic"}],"license":[{"start":{"date-parts":[[2022,1,1]],"date-time":"2022-01-01T00:00:00Z","timestamp":1640995200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T00:00:00Z","timestamp":1648512000000},"content-version":"vor","delay-in-days":87,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We propose a categorical semantics of gradient-based machine learning algorithms in terms of lenses, parametric maps, and reverse derivative categories. This foundation provides a powerful explanatory and unifying framework: it encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well as a variety of loss functions such as MSE and Softmax cross-entropy, shedding new light on their similarities and differences. Our approach to gradient-based learning has examples generalising beyond the familiar continuous domains (modelled in categories of smooth maps) and can be realized in the discrete setting of boolean circuits. Finally, we demonstrate the practical significance of our framework with an implementation in Python.<\/jats:p>","DOI":"10.1007\/978-3-030-99336-8_1","type":"book-chapter","created":{"date-parts":[[2022,3,28]],"date-time":"2022-03-28T20:02:48Z","timestamp":1648497768000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":32,"title":["Categorical Foundations of Gradient-Based Learning"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8742-6263","authenticated-orcid":false,"given":"Geoffrey S. H.","family":"Cruttwell","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6069-5727","authenticated-orcid":false,"given":"Bruno","family":"Gavranovi\u0107","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3988-2560","authenticated-orcid":false,"given":"Neil","family":"Ghani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3575-135X","authenticated-orcid":false,"given":"Paul","family":"Wilson","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6457-1345","authenticated-orcid":false,"given":"Fabio","family":"Zanasi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,3,29]]},"reference":[{"key":"1_CR1","unstructured":"Inceptionism: Going deeper into neural networks (2015), https:\/\/ai.googleblog.com\/2015\/06\/inceptionism-going-deeper-into-neural.html"},{"key":"1_CR2","unstructured":"Explainable AI: the basics - policy briefing (2019), royalsociety.org\/ai-interpretability"},{"key":"1_CR3","doi-asserted-by":"publisher","unstructured":"Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004. pp. 415\u2013425 (2004). https:\/\/doi.org\/10.1109\/LICS.2004.1319636","DOI":"10.1109\/LICS.2004.1319636"},{"key":"1_CR4","unstructured":"Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M.W., Pfau, D., Schaul, T., Shillingford, B., de Freitas, N.: Learning to learn by gradient descent by gradient descent. In: 30th Conference on Neural Information Processings Systems (NIPS) (2016)"},{"key":"1_CR5","unstructured":"Baez, J.C., Erbele, J.: Categories in Control. Theory and Applications of Categories 30(24), 836\u2013881 (2015)"},{"key":"1_CR6","doi-asserted-by":"publisher","unstructured":"Bohannon, A., Foster, J.N., Pierce, B.C., Pilkiewicz, A., Schmitt, A.: Boomerang: Resourceful lenses for string data. SIGPLAN Not. 43(1), 407\u2013419 (Jan 2008). https:\/\/doi.org\/10.1145\/1328897.1328487","DOI":"10.1145\/1328897.1328487"},{"key":"1_CR7","unstructured":"Boisseau, G.: String Diagrams for Optics. arXiv:2002.11480 (2020)"},{"key":"1_CR8","doi-asserted-by":"publisher","unstructured":"Bonchi, F., Sobocinski, P., Zanasi, F.: The calculus of signal flow diagrams I: linear relations on streams. Inf. Comput. 252, 2\u201329 (2017). https:\/\/doi.org\/10.1016\/j.ic.2016.03.002, https:\/\/doi.org\/10.1016\/j.ic.2016.03.002","DOI":"10.1016\/j.ic.2016.03.002 10.1016\/j.ic.2016.03.002"},{"key":"1_CR9","doi-asserted-by":"crossref","unstructured":"Capucci, M., Gavranovi\u2019c, B., Hedges, J., Rischel, E.F.: Towards foundations of categorical cybernetics. arXiv:2105.06332 (2021)","DOI":"10.4204\/EPTCS.372.17"},{"key":"1_CR10","doi-asserted-by":"crossref","unstructured":"Capucci, M., Ghani, N., Ledent, J., Nordvall Forsberg, F.: Translating Extensive Form Games to Open Games with Agency. arXiv:2105.06763 (2021)","DOI":"10.4204\/EPTCS.372.16"},{"key":"1_CR11","unstructured":"Chollet, F., et\u00a0al.: Keras (2015), https:\/\/github.com\/fchollet\/keras"},{"key":"1_CR12","unstructured":"Clarke, B., Elkins, D., Gibbons, J., Loregian, F., Milewski, B., Pillmore, E., Rom\u00e1n, M.: Profunctor optics, a categorical update. arXiv:2001.07488 (2020)"},{"key":"1_CR13","unstructured":"Cockett, J.R.B., Cruttwell, G.S.H., Gallagher, J., Lemay, J.S.P., MacAdam, B., Plotkin, G.D., Pronk, D.: Reverse derivative categories. In: Proceedings of the 28th Computer Science Logic (CSL) conference (2020)"},{"key":"1_CR14","doi-asserted-by":"publisher","unstructured":"Coecke, B., Kissinger, A.: Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press (2017). https:\/\/doi.org\/10.1017\/9781316219317","DOI":"10.1017\/9781316219317"},{"key":"1_CR15","doi-asserted-by":"crossref","unstructured":"Cortes, C., Vapnik, V.: Support-vector networks. Machine learning 20(3), 273\u2013297 (1995)","DOI":"10.1007\/BF00994018"},{"key":"1_CR16","unstructured":"Courbariaux, M., Bengio, Y., David, J.P.: BinaryConnect: Training Deep Neural Networks with binary weights during propagations. arXiv:1511.00363"},{"key":"1_CR17","unstructured":"CRCoauthors, A.: Numeric Optics: A python library for constructing and training neural networks based on lenses and reverse derivatives. https:\/\/github.com\/anonymous-c0de\/esop-2022"},{"key":"1_CR18","unstructured":"Dalrymple, D.: Dioptics: a common generalization of open games and gradient-based learners. SYCO7 (2019), https:\/\/research.protocol.ai\/publications\/dioptics-a-common-generalization-of-open-games-and-gradient-based-learners\/dalrymple2019.pdf"},{"key":"1_CR19","doi-asserted-by":"crossref","unstructured":"Dosovitskiy, A., Brox, T.: Inverting convolutional networks with convolutional networks. arXiv:1506.02753 (2015)","DOI":"10.1109\/CVPR.2016.522"},{"key":"1_CR20","unstructured":"Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(Jul), 2121\u20132159 (2011)"},{"key":"1_CR21","doi-asserted-by":"crossref","unstructured":"Elliott, C.: The simple essence of automatic differentiation (differentiable functional programming made easy). arXiv:1804.00746 (2018)","DOI":"10.1145\/3236765"},{"key":"1_CR22","unstructured":"Fong, B., Johnson, M.: Lenses and learners. In: Proceedings of the 8th International Workshop on Bidirectional transformations (Bx@PLW) (2019)"},{"key":"1_CR23","doi-asserted-by":"crossref","unstructured":"Fong, B., Spivak, D.I., Tuy\u00e9ras, R.: Backprop as functor: A compositional perspective on supervised learning. In: Proceedings of the Thirty fourth Annual IEEE Symposium on Logic in Computer Science (LICS 2019). pp. 1\u201313. IEEE Computer Society Press (June 2019)","DOI":"10.1109\/LICS.2019.8785665"},{"key":"1_CR24","unstructured":"Gavranovic, B.: Compositional deep learning. arXiv:1907.08292 (2019)"},{"key":"1_CR25","doi-asserted-by":"publisher","unstructured":"Ghani, N., Hedges, J., Winschel, V., Zahn, P.: Compositional game theory. In: Proceedings of the 33rd Annual ACM\/IEEE Symposium on Logic in Computer Science. p. 472\u2013481. LICS \u201918 (2018). https:\/\/doi.org\/10.1145\/3209108.3209165","DOI":"10.1145\/3209108.3209165"},{"key":"1_CR26","doi-asserted-by":"crossref","unstructured":"Ghica, D.R., Jung, A., Lopez, A.: Diagrammatic Semantics for Digital Circuits. arXiv:1703.10247 (2017)","DOI":"10.1109\/FMCAD.2016.7886659"},{"key":"1_CR27","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 2672\u20132680 (2014), http:\/\/papers.nips.cc\/paper\/5423-generative-adversarial-nets.pdf"},{"key":"1_CR28","doi-asserted-by":"crossref","unstructured":"Griewank, A., Walther, A.: Evaluating derivatives: principles and techniques of algorithmic differentiation. Society for Industrial and Applied Mathematics (2008)","DOI":"10.1137\/1.9780898717761"},{"key":"1_CR29","unstructured":"Hedges, J.: Limits of bimorphic lenses. arXiv:1808.05545 (2018)"},{"key":"1_CR30","doi-asserted-by":"publisher","unstructured":"Hermida, C., Tennent, R.D.: Monoidal indeterminates and categories of possible worlds. Theor. Comput. Sci. 430, 3\u201322 (Apr 2012). https:\/\/doi.org\/10.1016\/j.tcs.2012.01.001","DOI":"10.1016\/j.tcs.2012.01.001"},{"key":"1_CR31","doi-asserted-by":"crossref","unstructured":"Johnson, M., Rosebrugh, R., Wood, R.: Lenses, fibrations and universal translations. Mathematical structures in computer science 22, 25\u201342 (2012)","DOI":"10.1017\/S0960129511000442"},{"key":"1_CR32","unstructured":"Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015), http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"1_CR33","doi-asserted-by":"publisher","unstructured":"Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. In: Proceedings of the IEEE. pp. 2278\u20132324 (1998). https:\/\/doi.org\/10.1109\/5.726791","DOI":"10.1109\/5.726791"},{"key":"1_CR34","doi-asserted-by":"crossref","unstructured":"Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. arXiv:1412.0035 (2014)","DOI":"10.1109\/CVPR.2015.7299155"},{"key":"1_CR35","doi-asserted-by":"crossref","unstructured":"Nguyen, A.M., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. arXiv:1412.1897 (2014)","DOI":"10.1109\/CVPR.2015.7298640"},{"key":"1_CR36","unstructured":"Olah, C.: Neural networks, types, and functional programming (2015), http:\/\/colah.github.io\/posts\/2015-09-NN-Types-FP\/"},{"key":"1_CR37","doi-asserted-by":"publisher","unstructured":"Polyak, B.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1 \u2013 17 (1964). https:\/\/doi.org\/10.1016\/0041-5553(64)90137-5, http:\/\/www.sciencedirect.com\/science\/article\/pii\/0041555364901375","DOI":"10.1016\/0041-5553(64)90137-5"},{"key":"1_CR38","unstructured":"Riley, M.: Categories of optics. arXiv:1809.00738 (2018)"},{"key":"1_CR39","doi-asserted-by":"crossref","unstructured":"Selinger, P.: A survey of graphical languages for monoidal categories. Lecture Notes in Physics p. 289\u2013355 (2010)","DOI":"10.1007\/978-3-642-12821-9_4"},{"key":"1_CR40","doi-asserted-by":"crossref","unstructured":"Selinger, P.: Control categories and duality: on the categorical semantics of the lambda-mu calculus. Mathematical Structures in Computer Science 11(02), 207\u2013260 (4 2001). https:\/\/doi.org\/null, http:\/\/journals.cambridge.org\/article_S096012950000311X","DOI":"10.1017\/S096012950000311X"},{"key":"1_CR41","unstructured":"Seshia, S.A., Sadigh, D.: Towards verified artificial intelligence. CoRR abs\/1606.08514 (2016), http:\/\/arxiv.org\/abs\/1606.08514"},{"key":"1_CR42","doi-asserted-by":"crossref","unstructured":"Shiebler, D.: Categorical Stochastic Processes and Likelihood. Compositionality 3(1) (2021)","DOI":"10.32408\/compositionality-3-1"},{"key":"1_CR43","unstructured":"Shiebler, D., Gavranovi\u0107, B., Wilson, P.: Category Theory in Machine Learning. arXiv:2106.07032 (2021)"},{"key":"1_CR44","unstructured":"Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv:1312.6034 (2014)"},{"key":"1_CR45","unstructured":"Spivak, D.I.: Functorial data migration. arXiv:1009.1166 (2010)"},{"key":"1_CR46","doi-asserted-by":"crossref","unstructured":"Sprunger, D., Katsumata, S.y.: Differentiable causal computations via delayed trace. In: Proceedings of the 34th Annual ACM\/IEEE Symposium on Logic in Computer Science. LICS \u201919, IEEE Press (2019)","DOI":"10.1109\/LICS.2019.8785670"},{"key":"1_CR47","unstructured":"Steckermeier, A.: Lenses in functional programming. Preprint, available at https:\/\/sinusoid.es\/misc\/lager\/lenses.pdf (2015)"},{"key":"1_CR48","unstructured":"Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning. vol.\u00a028, pp. 1139\u20131147 (2013), http:\/\/proceedings.mlr.press\/v28\/sutskever13.html"},{"key":"1_CR49","doi-asserted-by":"publisher","unstructured":"Turi, D., Plotkin, G.: Towards a mathematical operational semantics. In: Proceedings of Twelfth Annual IEEE Symposium on Logic in Computer Science. pp. 280\u2013291 (1997). https:\/\/doi.org\/10.1109\/LICS.1997.614955","DOI":"10.1109\/LICS.1997.614955"},{"key":"1_CR50","unstructured":"Wilson, P., Zanasi, F.: Reverse derivative ascent: A categorical approach to learning boolean circuits. In: Proceedings of Applied Category Theory (ACT) (2020), https:\/\/cgi.cse.unsw.edu.au\/~eptcs\/paper.cgi?ACT2020:31"},{"key":"1_CR51","doi-asserted-by":"crossref","unstructured":"Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv:1703.10593 (2017)","DOI":"10.1109\/ICCV.2017.244"}],"container-title":["Lecture Notes in Computer Science","Programming Languages and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-99336-8_1","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,15]],"date-time":"2024-11-15T05:06:37Z","timestamp":1731647197000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-030-99336-8_1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"ISBN":["9783030993351","9783030993368"],"references-count":51,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-99336-8_1","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022]]},"assertion":[{"value":"29 March 2022","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ESOP","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"European Symposium on Programming","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Munich","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Germany","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2022","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"5 April 2022","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"7 April 2022","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"31","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"esop2022","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/etaps.org\/2022\/esop","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"HotCRP","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"64","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"21","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"33% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3.5","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"7","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}