{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,3]],"date-time":"2025-07-03T05:46:52Z","timestamp":1751521612450},"reference-count":32,"publisher":"MIT Press","issue":"8","content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,7,14]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Often in language and other areas of cognition, whether two components of an object are identical or not determines if it is well formed. We call such constraints identity effects. When developing a system to learn well-formedness from examples, it is easy enough to build in an identity effect. But can identity effects be learned from the data without explicit guidance? We provide a framework in which we can rigorously prove that algorithms satisfying simple criteria cannot make the correct inference. We then show that a broad class of learning algorithms, including deep feedforward neural networks trained via gradient-based algorithms (such as stochastic gradient descent or the Adam method), satisfies our criteria, dependent on the encoding of inputs. In some broader circumstances, we are able to provide adversarial examples that the network necessarily classifies incorrectly. Finally, we demonstrate our theory with computational experiments in which we explore the effect of different input encodings on the ability of algorithms to generalize to novel inputs. 
This allows us to show similar effects to those predicted by theory for more realistic methods that violate some of the conditions of our theoretical results.<\/jats:p>","DOI":"10.1162\/neco_a_01510","type":"journal-article","created":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T23:46:28Z","timestamp":1657237588000},"page":"1756-1789","update-policy":"http:\/\/dx.doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":4,"title":["Invariance, Encodings, and Generalization: Learning Identity Effects With Neural Networks"],"prefix":"10.1162","volume":"34","author":[{"given":"S.","family":"Brugiapaglia","sequence":"first","affiliation":[{"name":"Department of Mathematics and Statistics, Concordia University, Montreal, Quebec, H3G 1M8, Canada simone.brugiapaglia@concordia.ca"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"M.","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, Concordia University, Montreal, Quebec, H3G 1M8, Canada matthew.liu@mail.concordia.ca"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"P.","family":"Tupper","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada pft3@sfu.ca"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2022,7,14]]},"reference":[{"key":"2022071522333247700_B1","first-page":"77","volume-title":"Papers in optimality theory","author":"Benua","year":"1995"},{"key":"2022071522333247700_B2","volume-title":"Probability and measure","author":"Billingsley","year":"2008"},{"key":"2022071522333247700_B3","article-title":"Debate: Yoshua Bengio and Gary Marcus: The best way forward for AI","author":"Boucher","year":"2020"},{"key":"2022071522333247700_B4","article-title":"Keras","author":"Chollet","year":"2015"},{"key":"2022071522333247700_B5","article-title":"Simple 
MNIST convnet (Keras)","author":"Chollet","year":"2020"},{"key":"2022071522333247700_B6","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1145\/1014052.1014066","article-title":"Adversarial classification","volume-title":"Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Dalvi","year":"2004"},{"key":"2022071522333247700_B7","author":"Devlin","year":"2018","journal-title":"BERT: Pre-training of deep bidirectional transformers for language understanding"},{"issue":"2","key":"2022071522333247700_B8","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1017\/S0952675713000134","article-title":"Learning the identity effect as an artificial language: Bias and generalisation","volume":"30","author":"Gallagher","year":"2013","journal-title":"Phonology"},{"issue":"6368","key":"2022071522333247700_B9","doi-asserted-by":"publisher","DOI":"10.1126\/science.aag2612","article-title":"A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs","volume":"358","author":"George","year":"2017","journal-title":"Science"},{"issue":"2","key":"2022071522333247700_B10","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1023\/B:NALA.0000015789.98638.f9","article-title":"Contrastive focus reduplication in English (the salad-salad paper)","volume":"22","author":"Ghomeshi","year":"2004","journal-title":"Natural Language and Linguistic Theory"},{"key":"2022071522333247700_B11","article-title":"Understanding the difficulty of training deep feedforward neural networks","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","author":"Glorot","year":"2010"},{"key":"2022071522333247700_B12","volume-title":"Deep learning","author":"Goodfellow","year":"2016"},{"key":"2022071522333247700_B13","author":"Goodfellow","year":"2014","journal-title":"Explaining and harnessing adversarial 
examples"},{"issue":"10","key":"2022071522333247700_B14","doi-asserted-by":"publisher","first-page":"2222","DOI":"10.1109\/TNNLS.2016.2582924","article-title":"LSTM: A search space odyssey","volume":"28","author":"Greff","year":"2016","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"8","key":"2022071522333247700_B15","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2022071522333247700_B16","author":"Kingma","year":"2014","journal-title":"Adam: A method for stochastic optimization."},{"key":"2022071522333247700_B17","volume-title":"Probabilistic graphical models: Principles and techniques","author":"Koller","year":"2009"},{"key":"2022071522333247700_B18","article-title":"MNIST handwritten digit database","author":"LeCun","year":"2010"},{"issue":"5413","key":"2022071522333247700_B19","first-page":"436","article-title":"Do infants learn grammar with algebra or statistics? 
Response","volume":"284","author":"Marcus","year":"1999","journal-title":"Science"},{"key":"2022071522333247700_B20","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/1187.001.0001","volume-title":"The algebraic mind: Integrating connectionism and cognitive science","author":"Marcus","year":"2001"},{"key":"2022071522333247700_B21","volume-title":"Rebooting AI: Building artificial intelligence we can trust","author":"Marcus","year":"2019"},{"issue":"5","key":"2022071522333247700_B22","doi-asserted-by":"publisher","first-page":"166","DOI":"10.1016\/S1364-6613(99)01320-0","article-title":"Does generalization in infant learning implicate abstract algebra-like rules?","volume":"3","author":"McClelland","year":"1999","journal-title":"Trends in Cognitive Sciences"},{"issue":"5","key":"2022071522333247700_B23","first-page":"592","article-title":"How to generate random matrices from the classical compact groups","volume":"54","author":"Mezzadri","year":"2007","journal-title":"Notices of the American Mathematical Society"},{"key":"2022071522333247700_B24","first-page":"1","article-title":"Trigger poverty and reduplicative identity in Lakota","author":"Paschen","year":"2021","journal-title":"Natural Language and Linguistic Theory"},{"key":"2022071522333247700_B25","doi-asserted-by":"crossref","first-page":"93","DOI":"10.18653\/v1\/W18-5810","article-title":"Seq2seq models with dropout can learn generalizable reduplication","volume-title":"Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology","author":"Prickett","year":"2018"},{"key":"2022071522333247700_B26","article-title":"Learning reduplication with a neural network without explicit variables","author":"Prickett","year":"2019"},{"key":"2022071522333247700_B27","author":"Radford","year":"2018","journal-title":"Improving language understanding by generative 
pre-training"},{"issue":"6088","key":"2022071522333247700_B28","doi-asserted-by":"publisher","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"2022071522333247700_B29","author":"Thesing","year":"2019","journal-title":"What do AI algorithms actually learn? On false structures in deep learning."},{"key":"2022071522333247700_B30","article-title":"Which learning algorithms can generalize identity-based rules to novel inputs?","volume-title":"Proceedings of the 28th Annual Meeting of the Cognitive Science Society","author":"Tupper","year":"2016"},{"key":"2022071522333247700_B31","first-page":"5998","volume-title":"Advances in neural information processing systems","author":"Vaswani"},{"key":"2022071522333247700_B32","author":"Zeiler","year":"2012","journal-title":"ADADELTA: An adaptive learning rate method"}],"container-title":["Neural Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/neco\/article-pdf\/34\/8\/1756\/2034946\/neco_a_01510.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/neco\/article-pdf\/34\/8\/1756\/2034946\/neco_a_01510.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,11]],"date-time":"2023-02-11T03:44:56Z","timestamp":1676087096000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/34\/8\/1756\/111784\/Invariance-Encodings-and-Generalization-Learning"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,14]]},"references-count":32,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2022,7,14]]},"published-print":{"date-parts":[[2022,7,14]]}},"URL":"https:\/\/doi.org\/10.1162\/neco_a_01510","relation":{},"ISSN":["0899-7667","1530-888X"],
"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,8]]},"published":{"date-parts":[[2022,7,14]]}}}