{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T18:05:59Z","timestamp":1775239559092,"version":"3.50.1"},"reference-count":230,"publisher":"Emerald","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>Many types of data from fields including natural language processing, computer vision, and bioinformatics are well represented by discrete, compositional structures such as trees, sequences, or matchings. Latent structure models are a powerful tool for learning to extract such representations, offering a way to incorporate structural bias, discover insight about the data, and interpret decisions. However, effective training is challenging as neural networks are typically designed for continuous computation.<\/jats:p>\n                  <jats:p>This text explores three broad strategies for learning with discrete latent structure: continuous relaxation, surrogate gradients, and probabilistic estimation. Our presentation relies on consistent notations for a wide range of models. 
As such, we reveal many new connections between latent structure learning strategies, showing how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.<\/jats:p>","DOI":"10.1561\/2000000134","type":"journal-article","created":{"date-parts":[[2025,6,2]],"date-time":"2025-06-02T06:42:01Z","timestamp":1748846521000},"page":"99-211","source":"Crossref","is-referenced-by-count":2,"title":["Discrete Latent Structure in Neural Networks"],"prefix":"10.1561","volume":"19","author":[{"given":"Vlad","family":"Niculae","sequence":"first","affiliation":[{"name":"1Language Technology Lab, Informatics Institute, Faculty of Science, University of Amsterdam ,","place":["Netherlands"]}]},{"given":"Caio","family":"Corro","sequence":"additional","affiliation":[{"name":"INSA Rennes, IRISA, Inria, CNRS, Universit\u00e9 de Rennes ,","place":["France"]}]},{"given":"Nikita","family":"Nangia","sequence":"additional","affiliation":[{"name":"Amazon ,","place":["USA"]}]},{"given":"Tsvetomila","family":"Mihaylova","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Automation, Aalto University ,","place":["Finland"]}]},{"given":"Andr\u00e9 F. 
T.","family":"Martins","sequence":"additional","affiliation":[{"name":"Instituto Superior T\u00e9cnico ,","place":["Portugal"]},{"name":"Instituto de Telecomunica\u00e7\u00f5es ,","place":["Portugal"]},{"name":"Unbabel ,","place":["Portugal"]}]}],"member":"140","published-online":{"date-parts":[[2025,6,30]]},"reference":[{"issue":"1","key":"2026040313322600300_ref001","first-page":"147","article-title":"A learning algorithm for Boltzmann machines","volume":"9","author":"Ackley","year":"1985","journal-title":"Cognitive Science,"},{"key":"2026040313322600300_ref002","volume-title":"preprint arXiv:1106.1925,","author":"Adams","year":"2011"},{"key":"2026040313322600300_ref003","volume-title":"Advances in Neural Information Processing Systems,","author":"Agrawal","year":"2019"},{"key":"2026040313322600300_ref004","first-page":"38546","article-title":"Exploring length generalization in large language models","volume":"35","author":"Anil","year":"2022","journal-title":"Advances in Neural Information Processing Systems,"},{"issue":"S1","key":"2026040313322600300_ref005","doi-asserted-by":"publisher","DOI":"10.1121\/1.2017061","article-title":"Trainable grammars for speech recognition","volume":"65","author":"Baker","year":"1979","journal-title":"The Journal of the Acoustical Society of America,"},{"key":"2026040313322600300_ref006","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511804779","volume-title":"Bayesian reasoning and machine learning.","author":"Barber","year":"2012"},{"key":"2026040313322600300_ref007","doi-asserted-by":"publisher","first-page":"2963","DOI":"10.18653\/v1\/P19-1284","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics,","author":"Bastings"},{"key":"2026040313322600300_ref008","volume-title":"An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov 
process","author":"Baum","year":"1972"},{"key":"2026040313322600300_ref009","doi-asserted-by":"publisher","first-page":"673","DOI":"10.18653\/v1\/N19-1071","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers),","author":"Baziotis"},{"key":"2026040313322600300_ref010","article-title":"The infinite hidden Markov model","volume":"14","author":"Beal","year":"2001","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"6","key":"2026040313322600300_ref011","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1090\/S0002-9904-1954-09848-8","volume":"60","author":"Bellman","year":"1954","journal-title":"Bulletin of the American Mathematical Society"},{"issue":"8","key":"2026040313322600300_ref012","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: A review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence,"},{"key":"2026040313322600300_ref013","volume-title":"Proc. 
of NIPS,","author":"Bengio","year":"2013"},{"key":"2026040313322600300_ref014","first-page":"9508","article-title":"Learning with differentiable perturbed optimizers","volume":"33","author":"Berthet","year":"2020","journal-title":"Advances in Neural Information Processing Systems,"},{"issue":"1","key":"2026040313322600300_ref015","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1007\/BF02186476","article-title":"The auction algorithm: A distributed relaxation method for the assignment problem","volume":"14","author":"Bertsekas","year":"1988","journal-title":"Annals of Operations Research,"},{"key":"2026040313322600300_ref016","volume-title":"Nonlinear Programming.","author":"Bertsekas","year":"1999"},{"issue":"3","key":"2026040313322600300_ref017","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1137\/S1052623497331063","article-title":"Gradient convergence in gradient methods with errors","volume":"10","author":"Bertsekas","year":"2000","journal-title":"SIAM Journal on Optimization,"},{"key":"2026040313322600300_ref018","first-page":"147","article-title":"Tres observaciones sobre el \u00e1lgebra lineal","volume":"5","author":"Birkhoff","year":"1946","journal-title":"Univ. Nac. Tucum\u00e1n Rev. Ser. 
A,"},{"key":"2026040313322600300_ref019","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1007\/978-94-011-5014-9_13","volume-title":"Learning in Graphical Models,","author":"Bishop","year":"1998"},{"key":"2026040313322600300_ref020","volume-title":"arXiv preprint arXiv:2105.15183,","author":"Blondel","year":"2021"},{"issue":"35","key":"2026040313322600300_ref021","first-page":"1","article-title":"Learning with Fenchel-Young losses","volume":"21","author":"Blondel","year":"2020","journal-title":"Journal of Machine Learning Research,"},{"key":"2026040313322600300_ref022","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1162\/tacl_a_00361","article-title":"Latent compositional representations improve systematic generalization in grounded question answering","volume":"9","author":"Bogin","year":"2021","journal-title":"Transactions of the Association for Computational Linguistics,"},{"key":"2026040313322600300_ref023","volume-title":"Convex analysis and nonlinear optimization: theory and examples.","author":"Borwein","year":"2010"},{"key":"2026040313322600300_ref024","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1023\/A:1007541817488","article-title":"An efficient, probabilistically sound algorithm for segmentation and word discovery","volume":"34","author":"Brent","year":"1999","journal-title":"Machine Learning,"},{"key":"2026040313322600300_ref025","article-title":"Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimation of parameters","volume":"2","author":"Bridle","year":"1989","journal-title":"Advances in Neural Information Processing Systems,"},{"issue":"3","key":"2026040313322600300_ref026","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1016\/0167-6377(84)90010-5","article-title":"An O(n) algorithm for quadratic knapsack problems","volume":"3","author":"Brucker","year":"1984","journal-title":"Operations Research 
Letters,"},{"issue":"1","key":"2026040313322600300_ref027","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1023\/A:1007379606734","article-title":"Multitask learning","volume":"28","author":"Caruana","year":"1997","journal-title":"Machine Learning,"},{"issue":"1","key":"2026040313322600300_ref028","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1093\/biomet\/83.1.81","article-title":"Rao-Blackwellisation of sampling schemes","volume":"83","author":"Casella","year":"1996","journal-title":"Biometrika,"},{"key":"2026040313322600300_ref029","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1613\/jair.2830","article-title":"Content modeling using latent permutations","volume":"36","author":"Chen","year":"2009","journal-title":"Journal of Artificial Intelligence Research,"},{"key":"2026040313322600300_ref030","doi-asserted-by":"publisher","first-page":"1724","DOI":"10.3115\/v1\/D14-1179","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP),","author":"Cho","year":"2014"},{"key":"2026040313322600300_ref031","first-page":"1396","article-title":"On the shortest arborescence of a directed graph","volume":"14","author":"Chu","year":"1965","journal-title":"Science Sinica,"},{"key":"2026040313322600300_ref032","volume-title":"Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (CoNLL),","author":"Clark","year":"2001"},{"key":"2026040313322600300_ref033","author":"Cocke","year":"1970"},{"key":"2026040313322600300_ref034","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-02170-1","volume-title":"Bayesian Analysis in Natural Language Processing,","author":"Cohen","year":"2019"},{"key":"2026040313322600300_ref035","volume-title":"preprint arXiv:1206.6735,","author":"Cohen","year":"2012"},{"issue":"1-2","key":"2026040313322600300_ref036","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1007\/s10107-015-0946-6","article-title":"Fast projection onto the 
simplex and the \u2113\u2081 ball","volume":"158","author":"Condat","year":"2016","journal-title":"Mathematical Programming,"},{"key":"2026040313322600300_ref037","volume-title":"Proceedings of EMNLP-IJCNLP,","author":"Correia","year":"2019"},{"key":"2026040313322600300_ref038","volume-title":"Proceedings of NeurIPS,","author":"Correia","year":"2020"},{"key":"2026040313322600300_ref039","volume-title":"Proc. of ICLR,","author":"Corro","year":"2019"},{"key":"2026040313322600300_ref040","volume-title":"Proc. of ACL,","author":"Corro","year":"2019"},{"key":"2026040313322600300_ref041","volume-title":"Proceedings of NeurIPS,","author":"Cuturi","year":"2013"},{"key":"2026040313322600300_ref042","article-title":"ser. Proceedings of Machine Learning Research","volume":"70","author":"Cuturi","year":"2017","journal-title":"Proceedings of ICML,"},{"issue":"1-2","key":"2026040313322600300_ref043","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/2400000036","article-title":"Acceleration methods","volume":"5","author":"d\u2019Aspremont","year":"2021","journal-title":"Foundations and Trends\u00ae in Optimization,"},{"issue":"2","key":"2026040313322600300_ref044","doi-asserted-by":"crossref","first-page":"183","DOI":"10.2140\/pjm.1955.5.183","article-title":"The generalized simplex method for minimizing a linear form under linear inequality restraints","volume":"5","author":"Dantzig","year":"1955","journal-title":"Pacific Journal of Mathematics,"},{"key":"2026040313322600300_ref045","doi-asserted-by":"crossref","DOI":"10.1017\/9781108679930","volume-title":"Mathematics for Machine Learning.","author":"Deisenroth","year":"2020"},{"issue":"1","key":"2026040313322600300_ref046","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"Journal of the Royal Statistical Society: Series B 
(Methodological),"},{"key":"2026040313322600300_ref047","article-title":"Latent alignment and variational attention","volume":"31","author":"Deng","year":"2018","journal-title":"Advances in Neural Information Processing Systems,"},{"issue":"10","key":"2026040313322600300_ref048","doi-asserted-by":"crossref","first-page":"2454","DOI":"10.1109\/TPAMI.2013.31","article-title":"Learning graphical model parameters with approximate marginal inference","volume":"35","author":"Domke","year":"2013","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence,"},{"key":"2026040313322600300_ref049","volume-title":"Proc. of RepL4NLP,","author":"Drozdov","year":"2017"},{"key":"2026040313322600300_ref050","first-page":"190","volume-title":"Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies,","author":"Du","year":"2013"},{"key":"2026040313322600300_ref051","volume-title":"Proc. of ICML,","author":"Duchi","year":"2008"},{"issue":"2","key":"2026040313322600300_ref052","doi-asserted-by":"publisher","first-page":"233","DOI":"10.6028\/jres.071b.032","article-title":"Optimum branchings","volume":"71B","author":"Edmonds","year":"1967","journal-title":"J. Res. Nat. Bur. 
Stand.,"},{"key":"2026040313322600300_ref053","first-page":"334","volume-title":"Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing,","author":"Eisenstein","year":"2008"},{"key":"2026040313322600300_ref054","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/W16-5901","volume-title":"Proceedings of the Workshop on Structured Prediction for NLP,","author":"Eisner","year":"2016"},{"key":"2026040313322600300_ref055","doi-asserted-by":"crossref","first-page":"28940","DOI":"10.52202\/068431-2098","article-title":"SAVi++: Towards end-to-end object-centric learning from real-world videos","volume":"35","author":"Elsayed","year":"2022","journal-title":"Advances in Neural Information Processing Systems,"},{"key":"2026040313322600300_ref056","volume-title":"International Conference on Learning Representations,","author":"Farinhas","year":"2022"},{"issue":"3","key":"2026040313322600300_ref057","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1109\/PROC.1973.9030","article-title":"The Viterbi algorithm","volume":"61","author":"Forney","year":"1973","journal-title":"Proceedings of the IEEE,"},{"issue":"1-2","key":"2026040313322600300_ref058","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1002\/nav.3800030109","article-title":"An algorithm for quadratic programming","volume":"3","author":"Frank","year":"1956","journal-title":"Nav. Res. 
Log.,"},{"issue":"2","key":"2026040313322600300_ref059","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1111\/j.1467-9892.1994.tb00184.x","article-title":"Data augmentation and dynamic linear models","volume":"15","author":"Fr\u00fchwirth-Schnatter","year":"1994","journal-title":"Journal of Time Series Analysis,"},{"key":"2026040313322600300_ref060","first-page":"20259","article-title":"Latent template induction with Gumbel-CRFs","volume":"33","author":"Fu","journal-title":"Advances in Neural Information Processing Systems,"},{"key":"2026040313322600300_ref061","volume":"174","author":"Garey","year":"1979","journal-title":"Computers and Intractability,"},{"key":"2026040313322600300_ref062","volume-title":"Low-Power Computer Vision,","author":"Gholami","year":"2022"},{"key":"2026040313322600300_ref063","volume":"116","author":"Glasserman","year":"1990","journal-title":"Gradient estimation via perturbation analysis,"},{"issue":"10","key":"2026040313322600300_ref064","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1145\/84537.84552","article-title":"Likelihood ratio gradient estimation for stochastic systems","volume":"33","author":"Glynn","year":"1990","journal-title":"Communications of the ACM,"},{"key":"2026040313322600300_ref065","first-page":"709","volume-title":"Bulletin of the American Mathematical Society,","author":"Goldstein","year":"1964"},{"key":"2026040313322600300_ref066","doi-asserted-by":"publisher","first-page":"673","DOI":"10.3115\/1220175.1220260","volume-title":"Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics,","author":"Goldwater","year":"2006"},{"key":"2026040313322600300_ref067","first-page":"744","volume-title":"Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics,","author":"Goldwater","year":"2007"},{"key":"2026040313322600300_ref068","volume-title":"Proc. 
ICLR,","author":"Grathwohl","year":"2018"},{"key":"2026040313322600300_ref069","first-page":"2424","article-title":"Multi-object representation learning with iterative variational inference","volume":"97","author":"Greff","year":"2019","journal-title":"Proceedings of the 36th International Conference on Machine Learning"},{"key":"2026040313322600300_ref070","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898717761","volume-title":"Evaluating Derivatives,","author":"Griewank","year":"2008"},{"key":"2026040313322600300_ref071","volume-title":"Proc. ICLR,","author":"Gu","year":"2016"},{"key":"2026040313322600300_ref072","volume":"33","author":"Gumbel","year":"1954","journal-title":"Statistical theory of extreme values and some practical applications: a series of lectures,"},{"key":"2026040313322600300_ref073","volume-title":"Proc. NeurIPS,","author":"Havrylov","year":"2017"},{"key":"2026040313322600300_ref074","volume-title":"Proceedings of ICML,","author":"Hazan","year":"2012"},{"key":"2026040313322600300_ref075","article-title":"On sampling from the Gibbs distribution with random maximum a-posteriori perturbations","volume":"26","author":"Hazan","year":"2013","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"2026040313322600300_ref076","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1007\/BF01580223","article-title":"Validation of subgradient optimization","volume":"6","author":"Held","year":"1974","journal-title":"Mathematical Programming,"},{"key":"2026040313322600300_ref077","volume-title":"Neural networks for machine learning,","author":"Hinton","year":"2012"},{"issue":"1358","key":"2026040313322600300_ref078","doi-asserted-by":"crossref","first-page":"1177","DOI":"10.1098\/rstb.1997.0101","article-title":"Generative models for discovering sparse distributed representations","volume":"352","author":"Hinton","year":"1997","journal-title":"Philosophical Transactions of the Royal Society of London. 
Series B: Biological Sciences,"},{"issue":"8","key":"2026040313322600300_ref079","doi-asserted-by":"publisher","first-page":"1771","DOI":"10.1162\/089976602760128018","article-title":"Training Products of Experts by Minimizing Contrastive Divergence","volume":"14","author":"Hinton","year":"2002","journal-title":"Neural Computation,"},{"issue":"7","key":"2026040313322600300_ref080","doi-asserted-by":"publisher","first-page":"1527","DOI":"10.1162\/neco.2006.18.7.1527","article-title":"A Fast Learning Algorithm for Deep Belief Nets","volume":"18","author":"Hinton","year":"2006","journal-title":"Neural Computation,"},{"issue":"1\u20134","key":"2026040313322600300_ref081","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1002\/sapm1941201224","article-title":"The distribution of a product from several sources to numerous localities","volume":"20","author":"Hitchcock","year":"1941","journal-title":"Journal of Mathematics and Physics,"},{"issue":"4","key":"2026040313322600300_ref082","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1007\/BF00933971","article-title":"Optimization and perturbation analysis of queueing networks","volume":"40","author":"Ho","year":"1983","journal-title":"Journal of Optimization Theory and Applications,"},{"key":"2026040313322600300_ref083","volume-title":"Advances in Neural Information Processing Systems","author":"Hoogeboom","year":"2019"},{"key":"2026040313322600300_ref084","first-page":"1","volume-title":"Coling 2008: Advanced Dynamic Programming in Computational Linguistics: Theory, Algorithms and Applications - Tutorial notes,","author":"Huang","year":"2008"},{"key":"2026040313322600300_ref085","volume-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence,","author":"Huijben","year":"2022"},{"issue":"2","key":"2026040313322600300_ref086","first-page":"730","article-title":"Spike and slab variable selection: Frequentist and Bayesian 
strategies","volume":"33","author":"Ishwaran","year":"2005","journal-title":"The Annals of Statistics,"},{"key":"2026040313322600300_ref087","volume-title":"Proc. of ICLR,","author":"Jang","year":"2017"},{"key":"2026040313322600300_ref088","first-page":"139","volume-title":"Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference,","author":"Johnson","year":"2007"},{"key":"2026040313322600300_ref089","article-title":"Composing graphical models with neural networks for structured representations and fast inference","volume":"29","author":"Johnson","year":"2016","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"4","key":"2026040313322600300_ref090","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1007\/BF02278710","article-title":"A shortest augmenting path algorithm for dense and sparse linear assignment problems","volume":"38","author":"Jonker","year":"1987","journal-title":"Computing,"},{"key":"2026040313322600300_ref091","volume-title":"Speech and Language Processing (3rd ed.)","author":"Jurafsky","year":"2018"},{"key":"2026040313322600300_ref092","first-page":"1865","article-title":"Regularization techniques for learning with matrices","volume":"13","author":"Kakade","year":"2012","journal-title":"Journal of Machine Learning Research,"},{"issue":"7\u20138","key":"2026040313322600300_ref093","first-page":"227","article-title":"On the translocation of masses","volume":"37","author":"Kantorovich","year":"1942","journal-title":"Dokl. Akad. Nauk SSSR,"},{"key":"2026040313322600300_ref094","doi-asserted-by":"crossref","first-page":"77","DOI":"10.63317\/2c235hpjhnk8","volume-title":"Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 
2024),","author":"Kasai","year":"2024"},{"key":"2026040313322600300_ref095","author":"Kasami","year":"1966"},{"key":"2026040313322600300_ref096","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1084","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers),","author":"Keith","year":"2018"},{"key":"2026040313322600300_ref097","volume-title":"Proc. of ICLR,","author":"Kim","year":"2017"},{"key":"2026040313322600300_ref098","volume-title":"arXiv preprint arXiv:1812.06834,","author":"Kim","year":"2018"},{"key":"2026040313322600300_ref099","volume-title":"Proc. of ICLR,","author":"Kingma","year":"2015"},{"key":"2026040313322600300_ref100","volume-title":"Proceedings of ICLR,","author":"Kingma","year":"2014"},{"key":"2026040313322600300_ref101","article-title":"Semi-supervised learning with deep generative models","volume":"27","author":"Kingma","year":"2014","journal-title":"Advances in Neural Information Processing Systems,"},{"key":"2026040313322600300_ref102","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1162\/tacl_a_00101","article-title":"Simple and accurate dependency parsing using bidirectional LSTM feature representations","volume":"4","author":"Kiperwasser","year":"2016","journal-title":"TACL,"},{"issue":"12","key":"2026040313322600300_ref103","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1002\/andp.18471481202","article-title":"Ueber die Aufl\u00f6sung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Str\u00f6me gef\u00fchrt wird","volume":"148","author":"Kirchhoff","year":"1847","journal-title":"Annalen der Physik,"},{"key":"2026040313322600300_ref104","doi-asserted-by":"publisher","first-page":"478","DOI":"10.3115\/1218955.1219016","volume-title":"Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics 
(ACL-04),","author":"Klein","year":"2004"},{"key":"2026040313322600300_ref105","first-page":"5","volume-title":"ICCV,","author":"Komodakis","year":"2007"},{"issue":"47","key":"2026040313322600300_ref106","first-page":"1","article-title":"Ancestral Gumbel-top-k sampling for sampling without replacement","volume":"21","author":"Kool","year":"2020","journal-title":"Journal of Machine Learning Research,"},{"key":"2026040313322600300_ref107","volume-title":"Proc. ICLR,","author":"Kool","year":"2020"},{"key":"2026040313322600300_ref108","first-page":"3499","article-title":"Stochastic beams and where to find them: The Gumbel-top-k trick for sampling sequences without replacement","volume":"97","author":"Kool","year":"2019","journal-title":"Proceedings of the 36th International Conference on Machine Learning,"},{"key":"2026040313322600300_ref109","first-page":"7574","article-title":"Storchastic: A framework for general stochastic automatic differentiation","volume":"34","author":"van Krieken","year":"2021"},{"issue":"1-2","key":"2026040313322600300_ref110","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1002\/nav.3800020109","article-title":"The Hungarian method for the assignment problem","volume":"2","author":"Kuhn","year":"1955","journal-title":"Nav. Res. Log.,"},{"issue":"1-2","key":"2026040313322600300_ref111","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/0022-1694(80)90036-0","article-title":"A generalized probability density function for double-bounded random processes","volume":"46","author":"Kumaraswamy","year":"1980","journal-title":"Journal of Hydrology,"},{"key":"2026040313322600300_ref112","volume-title":"Proc. 
ICML,","author":"Kyrillidis","year":"2013"},{"issue":"4","key":"2026040313322600300_ref113","doi-asserted-by":"crossref","first-page":"738","DOI":"10.1287\/mnsc.41.4.738","article-title":"Note: On the interchange of derivative and expectation for likelihood ratio derivative estimators","volume":"41","author":"L\u2019Ecuyer","year":"1995","journal-title":"Management Science,"},{"key":"2026040313322600300_ref114","volume-title":"Proc. of NIPS,","author":"Lacoste-Julien","year":"2015"},{"key":"2026040313322600300_ref115","first-page":"3","volume":"1","author":"Lafferty","year":"2001","journal-title":"et al.,"},{"key":"2026040313322600300_ref116","volume-title":"Proceedings of ICLR,","author":"Lan","year":"2020"},{"issue":"1","key":"2026040313322600300_ref117","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1016\/0885-2308(90)90022-X","article-title":"The estimation of stochastic context-free grammars using the inside-outside algorithm","volume":"4","author":"Lari","year":"1990","journal-title":"Computer Speech & Language,"},{"key":"2026040313322600300_ref118","volume-title":"Proc. ICLR,","author":"Lazaridou","year":"2017"},{"key":"2026040313322600300_ref119","first-page":"1","volume-title":"USSR Computational Mathematics and Mathematical Physics,","author":"Levitin","year":"1966"},{"key":"2026040313322600300_ref120","volume-title":"Proc. of EMNLP,","author":"Li","year":"2009"},{"key":"2026040313322600300_ref121","volume-title":"Proceedings of ICML","author":"Liang","year":"2010"},{"key":"2026040313322600300_ref122","volume-title":"International Conference on Learning Representations,","author":"Lin","year":"2018"},{"key":"2026040313322600300_ref123","volume-title":"Master\u2019s thesis (in Finnish)","author":"Linnainmaa","year":"1970"},{"key":"2026040313322600300_ref124","doi-asserted-by":"publisher","first-page":"1129","DOI":"10.18653\/v1\/2021.findings-acl.97","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021,","author":"Liu","year":"2021"},{"key":"2026040313322600300_ref125","doi-asserted-by":"publisher","first-page":"5800","DOI":"10.18653\/v1\/2023.findings-emnlp.386","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2023,","author":"Liu","year":"2023"},{"key":"2026040313322600300_ref126","volume-title":"Proc. ICML,","author":"Liu","year":"2019"},{"key":"2026040313322600300_ref127","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1184","volume-title":"Proceedings of EMNLP,","author":"Liu","year":"2018"},{"key":"2026040313322600300_ref128","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1162\/tacl_a_00005","article-title":"Learning structured text representations","volume":"6","author":"Liu","year":"2018","journal-title":"TACL,"},{"key":"2026040313322600300_ref129","first-page":"11525","article-title":"Object-centric learning with slot attention","volume":"33","author":"Locatello","year":"2020","journal-title":"Advances in Neural Information Processing Systems,"},{"key":"2026040313322600300_ref130","volume-title":"International Conference on Learning Representations,","author":"Loshchilov","year":"2019"},{"key":"2026040313322600300_ref131","volume-title":"International Conference on Learning Representations,","author":"Louizos","year":"2018"},{"key":"2026040313322600300_ref132","author":"Lowerre","year":"1976"},{"key":"2026040313322600300_ref133","volume-title":"Proc. 
of ICLR,","author":"Maddison","year":"2017"},{"key":"2026040313322600300_ref134","first-page":"342","volume-title":"Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP,","author":"Martins","year":"2009"},{"key":"2026040313322600300_ref135","volume-title":"Proc. of ICML,","author":"Martins","year":"2016"},{"issue":"1","key":"2026040313322600300_ref136","first-page":"495","article-title":"AD3: Alternating directions dual decomposition for MAP inference in graphical models","volume":"16","author":"Martins","year":"2015","journal-title":"JMLR,"},{"key":"2026040313322600300_ref137","volume-title":"Proc. of ACL-IJCNLP,","author":"Martins","year":"2009"},{"key":"2026040313322600300_ref138","doi-asserted-by":"publisher","first-page":"664","DOI":"10.18653\/v1\/2021.emnlp-main.52","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Meister","year":"2021"},{"key":"2026040313322600300_ref139","doi-asserted-by":"publisher","first-page":"2173","DOI":"10.18653\/v1\/2020.emnlp-main.170","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP),","author":"Meister","year":"2020"},{"key":"2026040313322600300_ref140","doi-asserted-by":"publisher","first-page":"795","DOI":"10.1162\/tacl_a_00346","article-title":"Best-first beam search","volume":"8","author":"Meister","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics,"},{"key":"2026040313322600300_ref141","volume-title":"Proc. of ICLR,","author":"Mena","year":"2018"},{"key":"2026040313322600300_ref142","volume-title":"Proc. of ICML,","author":"Mensch","year":"2018"},{"key":"2026040313322600300_ref143","volume-title":"Proc. 
of NIPS,","author":"Meshi","year":"2015"},{"issue":"247","key":"2026040313322600300_ref144","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1080\/01621459.1949.10483310","article-title":"The monte carlo method","volume":"44","author":"Metropolis","year":"1949","journal-title":"Journal of the American Statistical Association,"},{"key":"2026040313322600300_ref145","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.171","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP),","author":"Mihaylova","year":"2020"},{"issue":"404","key":"2026040313322600300_ref146","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1080\/01621459.1988.10478694","article-title":"Bayesian variable selection in linear regression","volume":"83","author":"Mitchell","year":"1988","journal-title":"Journal of the American Statistical Association,"},{"key":"2026040313322600300_ref147","volume-title":"Proceedings of ICML,","author":"Mnih","year":"2014"},{"issue":"132","key":"2026040313322600300_ref148","first-page":"1","article-title":"Monte carlo gradient estimation in machine learning","volume":"21","author":"Mohamed","year":"2020","journal-title":"J. Mach. Learn. Res.,"},{"issue":"3","key":"2026040313322600300_ref149","first-page":"321","article-title":"Semiring frameworks and algorithms for shortest-distance problems","volume":"7","author":"Mohri","year":"2002","journal-title":"J. Autom. Lang. 
Comb.,"},{"key":"2026040313322600300_ref150","volume-title":"Probabilistic Machine Learning: An introduction.","author":"Murphy","year":"2022"},{"key":"2026040313322600300_ref151","doi-asserted-by":"publisher","first-page":"974","DOI":"10.1162\/tacl_a_00583","article-title":"Conditional generation with a question-answering blueprint","volume":"11","author":"Narayan","year":"2023","journal-title":"Transactions of the Association for Computational Linguistics,"},{"issue":"1","key":"2026040313322600300_ref152","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1016\/0004-3702(92)90065-6","article-title":"Connectionist learning of belief networks","volume":"56","author":"Neal","year":"1992","journal-title":"Artificial Intelligence,"},{"issue":"3","key":"2026040313322600300_ref153","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"Journal of Molecular Biology,"},{"key":"2026040313322600300_ref154","first-page":"543","article-title":"A method for solving the convex programming problem with convergence rate O(1\/k\u00b2)","volume":"269","author":"Nesterov","year":"1983","journal-title":"Dokl. Akad. Nauk SSSR,"},{"key":"2026040313322600300_ref155","volume-title":"Proc. of NIPS,","author":"Niculae","year":"2017"},{"key":"2026040313322600300_ref156","article-title":"LP-SparseMAP: Differentiable relaxed optimization for sparse structured prediction","volume":"119","author":"Niculae","year":"2020","journal-title":"Proceedings of ICML,"},{"key":"2026040313322600300_ref157","volume-title":"Proc. 
of ICML,","author":"Niculae","year":"2018"},{"key":"2026040313322600300_ref158","first-page":"14 567","article-title":"Implicit mle: Back-propagating through discrete exponential family distributions","volume":"34","author":"Niepert","year":"2021","journal-title":"Advances in Neural Information Processing Systems,"},{"key":"2026040313322600300_ref159","doi-asserted-by":"publisher","DOI":"10.1007\/b98874","volume-title":"Numerical Optimization.","author":"Nocedal","year":"1999"},{"key":"2026040313322600300_ref160","article-title":"Neural discrete representation learning","volume":"30","author":"van den Oord","year":"2017","journal-title":"Advances in Neural Information Processing Systems,"},{"key":"2026040313322600300_ref161","volume-title":"Proc. ICML,","author":"Paisley","year":"2012"},{"key":"2026040313322600300_ref162","first-page":"51","volume-title":"Robotics and Autonomous Systems,","author":"Palmer","year":"2017"},{"key":"2026040313322600300_ref163","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1109\/ICCV.2011.6126242","volume-title":"2011 International Conference on Computer Vision,","author":"Papandreou","year":"2011"},{"key":"2026040313322600300_ref164","volume-title":"Statistical field theory","author":"Parisi","year":"1988"},{"key":"2026040313322600300_ref165","first-page":"5691","article-title":"Gradient estimation with stochastic softmax tricks","volume":"33","author":"Paulus","year":"2020","journal-title":"Advances in Neural Information Processing Systems,"},{"key":"2026040313322600300_ref166","first-page":"1","volume-title":"Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference,","author":"Pearce","year":"2020"},{"key":"2026040313322600300_ref167","doi-asserted-by":"publisher","first-page":"1863","DOI":"10.18653\/v1\/P18-1173","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long 
Papers),","author":"Peng","year":"2018"},{"issue":"473","key":"2026040313322600300_ref168","doi-asserted-by":"publisher","first-page":"138","DOI":"10.1198\/016214505000000907","article-title":"Convexity, classification, and risk bounds","volume":"101","author":"Bartlett","year":"2006","journal-title":"Journal of the American Statistical Association,"},{"key":"2026040313322600300_ref169","volume-title":"Proc. ACL,","author":"Peters","year":"2019"},{"key":"2026040313322600300_ref170","first-page":"995","article-title":"A mean field theory learning algorithm for neural networks","volume":"1","author":"Peterson","year":"1987","journal-title":"Complex systems,"},{"issue":"5-6","key":"2026040313322600300_ref171","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1561\/2200000073","article-title":"Computational optimal transport","volume":"11","author":"Peyr\u00e9","year":"2019","journal-title":"Foundations and Trends\u00ae in Machine Learning,"},{"key":"2026040313322600300_ref172","volume":"373","author":"Pflug","year":"2012","journal-title":"Optimization of stochastic models: the interface between simulation and optimization,"},{"key":"2026040313322600300_ref173","volume-title":"International Conference on Learning Representations,","author":"Pogancic","year":"2020"},{"key":"2026040313322600300_ref174","first-page":"1346","volume-title":"Proceedings of the 20th International Conference on Computational Linguistics,","author":"Punyakanok","year":"2004"},{"key":"2026040313322600300_ref175","volume-title":"Pytorch,","author":"PyTorch Foundation","year":"2017"},{"issue":"2","key":"2026040313322600300_ref176","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1109\/5.18626","article-title":"A tutorial on hidden markov models and selected applications in speech recognition","volume":"77","author":"Rabiner","year":"1989","journal-title":"P. 
IEEE,"},{"key":"2026040313322600300_ref177","first-page":"814","article-title":"Black Box Variational Inference","volume":"33","author":"Ranganath","year":"2014","journal-title":"Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics,"},{"issue":"3","key":"2026040313322600300_ref178","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1007\/s10994-011-5256-5","article-title":"Classifier chains for multi-label classification","volume":"85","author":"Read","year":"2011","journal-title":"Machine learning,"},{"issue":"83","key":"2026040313322600300_ref179","first-page":"2387","article-title":"Composite binary losses","volume":"11","author":"Reid","year":"2010","journal-title":"Journal of Machine Learning Research"},{"key":"2026040313322600300_ref180","first-page":"7008","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Rennie","year":"2017"},{"key":"2026040313322600300_ref181","first-page":"1278","article-title":"Stochastic backpropagation and approximate inference in deep generative models","volume":"32","author":"Rezende","year":"2014","journal-title":"Proceedings of the 31st International Conference on Machine Learning,"},{"issue":"3","key":"2026040313322600300_ref182","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1214\/aoms\/1177729586","article-title":"A Stochastic Approximation Method","volume":"22","author":"Robbins","year":"1951","journal-title":"The Annals of Mathematical Statistics,"},{"key":"2026040313322600300_ref183","volume-title":"5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26","author":"Rolfe","year":"2017"},{"key":"2026040313322600300_ref184","doi-asserted-by":"crossref","first-page":"736","DOI":"10.1145\/1102351.1102444","volume-title":"Proceedings of the 22nd international conference on Machine 
learning,","author":"Roth","year":"2005"},{"issue":"1","key":"2026040313322600300_ref185","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/BF02060943","article-title":"Sensitivity analysis of discrete event systems by the \u201cpush out\u201d method","volume":"39","author":"Rubinstein","year":"1992","journal-title":"Annals of Operations Research,"},{"key":"2026040313322600300_ref186","volume-title":"Unpublished manuscript, Technion, Haifa, Israel","author":"Rubinstein","year":"1976"},{"key":"2026040313322600300_ref187","volume-title":"Association for Computational Linguistics","author":"Rush","year":"2010"},{"key":"2026040313322600300_ref188","volume-title":"Torch-struct: Deep structured prediction library,","author":"Rush","year":"2020"},{"key":"2026040313322600300_ref189","doi-asserted-by":"crossref","first-page":"125","DOI":"10.3115\/1654494.1654507","volume-title":"Proceedings of the Ninth International Workshop on Parsing Technology,","author":"Sagae","year":"2005"},{"issue":"1","key":"2026040313322600300_ref190","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1109\/TASSP.1978.1163055","article-title":"Dynamic programming algorithm optimization for spoken word recognition","volume":"26","author":"Sakoe","year":"1978","journal-title":"IEEE Transactions on Acoustics, Speech, and Signal Processing,"},{"key":"2026040313322600300_ref191","first-page":"448","article-title":"Deep boltzmann machines","volume":"5","author":"Salakhutdinov","year":"2009","journal-title":"Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics"},{"key":"2026040313322600300_ref192","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1613\/jair.251","article-title":"Mean field theory for sigmoid belief networks","volume":"4","author":"Saul","year":"1996","journal-title":"Journal of artificial intelligence 
research"},{"issue":"2","key":"2026040313322600300_ref193","doi-asserted-by":"crossref","first-page":"876","DOI":"10.1214\/aoms\/1177703591","volume":"35","author":"Sinkhorn","year":"1964","journal-title":"The annals of mathematical statistics,"},{"key":"2026040313322600300_ref194","doi-asserted-by":"publisher","first-page":"1691","DOI":"10.18653\/v1\/2022.emnlp-main.110","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing,","author":"Stanojevic","year":"2022"},{"key":"2026040313322600300_ref195","volume-title":"arXiv preprint arXiv:2308.03291,","author":"Stanojevic","year":"2023"},{"key":"2026040313322600300_ref196","volume-title":"International Conference on Learning Representations,","author":"van Steenkiste","year":"2018"},{"key":"2026040313322600300_ref197","volume-title":"Proc. of AISTATS,","author":"Stoyanov","year":"2011"},{"key":"2026040313322600300_ref198","article-title":"Sequence to sequence learning with neural networks","volume":"27","author":"Sutskever","year":"2014","journal-title":"Advances in Neural Information Processing Systems,"},{"key":"2026040313322600300_ref199","author":"Taskar","year":"2004"},{"key":"2026040313322600300_ref200","volume-title":"Proceedings of ICML,","author":"Tay","year":"2020"},{"key":"2026040313322600300_ref201","first-page":"1971","article-title":"Doubly stochastic variational bayes for non-conjugate inference","volume":"32","author":"Titsias","journal-title":"Proceedings of the 31st International Conference on Machine Learning,"},{"key":"2026040313322600300_ref202","article-title":"Local expectation gradients for black box variational inference","volume":"28","author":"Titsias","year":"2015","journal-title":"Advances in Neural Information Processing Systems,"},{"issue":"3","key":"2026040313322600300_ref203","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4018\/jdwm.2007070101","article-title":"Multi-label classification: An 
overview","volume":"3","author":"Tsoumakas","year":"2007","journal-title":"International Journal of Data Warehousing and Mining (IJDWM),"},{"key":"2026040313322600300_ref204","volume-title":"Proceedings of NeurIPS","author":"Tucker","year":"2017"},{"key":"2026040313322600300_ref205","volume":"21","author":"Tutte","year":"2001","journal-title":"Graph theory,"},{"key":"2026040313322600300_ref206","first-page":"5035","article-title":"DVAE++: Discrete variational autoencoders with overlapping transformations","volume":"80","author":"Vahdat","year":"2018","journal-title":"Proceedings of the 35th International Conference on Machine Learning,"},{"issue":"2","key":"2026040313322600300_ref207","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1016\/0304-3975(79)90044-6","article-title":"The complexity of computing the permanent","volume":"8","author":"Valiant","year":"1979","journal-title":"Theoretical Computer Science,"},{"key":"2026040313322600300_ref208","article-title":"Principles of risk minimization for learning theory","volume":"4","author":"Vapnik","year":"1991","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2026040313322600300_ref209","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Advances in Neural Information Processing Systems,"},{"issue":"3","key":"2026040313322600300_ref210","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1162\/089120101317066113","article-title":"A statistical model for word discovery in transcribed speech","volume":"27","author":"Venkataraman","year":"2001","journal-title":"Computational Linguistics,"},{"issue":"1","key":"2026040313322600300_ref211","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1007\/BF01074755","article-title":"Speech discrimination by dynamic programming","volume":"4","author":"Vintsyuk","year":"1968","journal-title":"Cybernetics,"},{"key":"2026040313322600300_ref212","volume-title":"Proc. 
of AISTATS,","author":"Vinyes","year":"2017"},{"issue":"2","key":"2026040313322600300_ref213","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1109\/TIT.1967.1054010","article-title":"Error bounds for convolutional codes and an asymptotically optimum decoding algorithm","volume":"13","author":"Viterbi","year":"1967","journal-title":"IEEE Transactions on Information Theory,"},{"issue":"1","key":"2026040313322600300_ref214","doi-asserted-by":"publisher","first-page":"168","DOI":"10.1145\/321796.321811","article-title":"The string-to-string correction problem","volume":"21","author":"Wagner","year":"1974"},{"issue":"1\u20132","key":"2026040313322600300_ref215","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000001","article-title":"Graphical models, exponential families, and variational inference","volume":"1","author":"Wainwright","year":"2008","journal-title":"Foundations and Trends\u00ae in Machine Learning,"},{"issue":"3-4","key":"2026040313322600300_ref216","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1023\/A:1022672621406","article-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning","volume":"8","author":"Williams","year":"1992","journal-title":"Machine Learning,"},{"key":"2026040313322600300_ref217","volume-title":"Journal of Machine Learning Research,","author":"Williamson","year":"2016"},{"issue":"1","key":"2026040313322600300_ref218","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1007\/BF01580381","article-title":"Finding the nearest point in a polytope","volume":"11","author":"Wolfe","year":"1976","journal-title":"Mathematical Programming,"},{"key":"2026040313322600300_ref219","volume-title":"Proceedings of ICML, PMLR","author":"Wong","year":"2021"},{"key":"2026040313322600300_ref220","first-page":"2048","article-title":"Show, attend and tell: Neural image caption generation with visual 
attention","volume":"37","author":"Xu","year":"2015","journal-title":"Proceedings of the 32nd International Conference on Machine Learning,"},{"key":"2026040313322600300_ref221","doi-asserted-by":"publisher","first-page":"623","DOI":"10.1162\/tacl_a_00480","article-title":"Document summarization with latent queries","volume":"10","author":"Xu","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics,"},{"key":"2026040313322600300_ref222","doi-asserted-by":"publisher","first-page":"5048","DOI":"10.18653\/v1\/2022.emnlp-main.337","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing,","author":"Yao","year":"2022"},{"issue":"2","key":"2026040313322600300_ref223","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/0022-2496(77)90026-8","article-title":"The relationship between luce\u2019s choice axiom, thurstone\u2019s theory of comparative judgment, and the double exponential distribution","volume":"15","author":"Yellott Jr.","year":"1977","journal-title":"Journal of Mathematical Psychology,"},{"issue":"2","key":"2026040313322600300_ref224","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1016\/S0019-9958(67)80007-X","article-title":"Recognition and parsing of context-free languages in time n\u00b3","volume":"10","author":"Younger","year":"1967","journal-title":"Information and Control"},{"key":"2026040313322600300_ref225","doi-asserted-by":"crossref","DOI":"10.1142\/5021","volume-title":"Convex Analysis in General Vector Spaces.","author":"Z\u0103linescu","year":"2002"},{"key":"2026040313322600300_ref226","first-page":"1","volume-title":"Proceedings of the 1st Annual Meeting of the ELRA\/ISCA Special Interest Group on Under-Resourced Languages,","author":"Zanon Boito","year":"2022"},{"key":"2026040313322600300_ref227","volume-title":"ICLR 2022 Workshop on the Elements of Reasoning: Objects, Structure and 
Causality,","author":"Zantedeschi","year":"2022"},{"key":"2026040313322600300_ref228","first-page":"42 046","article-title":"Revisiting structured variational autoencoders","volume":"202","author":"Zhao","year":"2023","journal-title":"Proceedings of the 40th International Conference on Machine Learning,"},{"key":"2026040313322600300_ref229","volume-title":"The Twelfth International Conference on Learning Representations,","author":"Zhou","year":"2024"},{"key":"2026040313322600300_ref230","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.824","volume-title":"Proceedings of EMNLP, Online and Punta Cana","author":"Zmigrod","year":"2021"}],"container-title":["Foundations and Trends\u00ae in Signal Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftsig\/article-pdf\/19\/2\/99\/11161113\/2000000134en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftsig\/article-pdf\/19\/2\/99\/11161113\/2000000134en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T17:32:57Z","timestamp":1775237577000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftsig\/article\/19\/2\/99\/1332865\/Discrete-Latent-Structure-in-Neural-Networks"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":230,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"URL":"https:\/\/doi.org\/10.1561\/2000000134","relation":{},"ISSN":["1932-8346","1932-8354"],"issn-type":[{"value":"1932-8346","type":"print"},{"value":"1932-8354","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,30]]}}}