{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T02:15:39Z","timestamp":1769566539299,"version":"3.49.0"},"reference-count":36,"publisher":"MIT Press - Journals","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Neural Computation"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p> Backpropagation (BP) is the cornerstone of today's deep learning algorithms, but it is inefficient partially because of backward locking, which means updating the weights of one layer locks the weight updates in the other layers. Consequently, it is challenging to apply parallel computing or a pipeline structure to update the weights in different layers simultaneously. In this letter, we introduce a novel learning structure, associated learning (AL), that modularizes the network into smaller components, each of which has a local objective. Because the objectives are mutually independent, AL can learn the parameters in different layers independently and simultaneously, so it is feasible to apply a pipeline structure to improve the training throughput. Specifically, this pipeline structure improves the complexity of the training time from [Formula: see text], which is the time complexity when using BP and stochastic gradient descent (SGD) for training, to [Formula: see text], where [Formula: see text] is the number of training instances and [Formula: see text] is the number of hidden layers. Surprisingly, even though most of the parameters in AL do not directly interact with the target variable, training deep models by this method yields accuracies comparable to those from models trained using typical BP methods, in which all parameters are used to predict the target variable. Consequently, because of the scalability and the predictive power demonstrated in the experiments, AL deserves further study to determine the better hyperparameter settings, such as activation function selection, learning rate scheduling, and weight initialization, to accumulate experience, as we have done over the years with the typical BP method. In addition, perhaps our design can also inspire new network designs for deep learning. Our implementation is available at https:\/\/github.com\/SamYWK\/Associated_Learning . <\/jats:p>","DOI":"10.1162\/neco_a_01335","type":"journal-article","created":{"date-parts":[[2020,10,20]],"date-time":"2020-10-20T21:25:44Z","timestamp":1603229144000},"page":"174-193","source":"Crossref","is-referenced-by-count":3,"title":["Associated Learning: Decomposing End-to-End Backpropagation Based on Autoencoders and Target Propagation"],"prefix":"10.1162","volume":"33","author":[{"given":"Yu-Wei","family":"Kao","sequence":"first","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University, Taoyuan, 32001, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hung-Hsuan","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University, Taoyuan, 32001, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","reference":[{"key":"B1","author":"Allen-Zhu Z.","year":"2018","journal-title":"A convergence theory for deep learning via over-parameterization."},{"key":"B2","author":"Arora S.","year":"2018","journal-title":"On the optimization of deep networks: Implicit acceleration by overparameterization"},{"key":"B3","first-page":"37","volume-title":"Proceedings of ICML Workshop on Unsupervised and Transfer Learning","author":"Baldi P.","year":"2012"},{"key":"B4","first-page":"485","volume-title":"Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence","author":"Balduzzi D.","year":"2015"},{"key":"B5","first-page":"9390","volume-title":"Advances in neural information processing systems","author":"Bartunov S.","year":"2018"},{"key":"B6","author":"Belilovsky E.","year":"2018","journal-title":"Greedy layerwise learning can scale to imagenet."},{"key":"B7","author":"Belilovsky E.","year":"2019","journal-title":"Decoupled greedy learning of CNNs"},{"key":"B8","author":"Bengio Y.","year":"2014","journal-title":"How auto-encoders could provide credit assignment in deep networks via target propagation."},{"key":"B9","author":"Bengio Y.","year":"2015","journal-title":"Towards biologically plausible deep learning"},{"key":"B10","first-page":"281","volume":"13","author":"Bergstra J.","year":"2012","journal-title":"Journal of Machine Learning Research"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-30508-6_1"},{"key":"B12","author":"Chen H.-H.","year":"2017","journal-title":"Weighted-SVD: Matrix factorization with weights on the latent factors"},{"key":"B13","doi-asserted-by":"publisher","DOI":"10.5220\/0009885600890097"},{"key":"B14","doi-asserted-by":"publisher","DOI":"10.1038\/337129a0"},{"key":"B15","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"B16","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"B17","first-page":"103","volume-title":"Advances in neural information processing systems","volume":"32","author":"Huang Y.","year":"2019"},{"key":"B18","first-page":"6659","volume-title":"Advances in neural information processing systems","author":"Huo Z.","year":"2018"},{"key":"B19","author":"Huo Z.","year":"2018","journal-title":"Decoupled parallel backpropagation with convergence guarantee"},{"key":"B20","author":"Jaderberg M.","year":"2016","journal-title":"Decoupled neural interfaces using synthetic gradients."},{"key":"B21","author":"Krizhevsky A.","year":"2009","journal-title":"Learning multiple layers of features from tiny images"},{"key":"B22","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"B23","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-23528-8_31"},{"key":"B24","doi-asserted-by":"publisher","DOI":"10.1038\/ncomms13276"},{"key":"B25","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2019.2909737"},{"key":"B26","first-page":"2579","volume":"9","author":"Maaten L. v. d.","year":"2008","journal-title":"Journal of Machine Learning Research"},{"key":"B27","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.1973.5408500"},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.3389\/fnins.2018.00608"},{"key":"B29","first-page":"1037","volume-title":"Proceedings of the 30th Conference on Neural Information Processing Systems","author":"N\u00f8kland A.","year":"2016"},{"key":"B30","author":"N\u00f8kland A.","year":"2019","journal-title":"Training neural networks with local error signals."},{"key":"B31","doi-asserted-by":"publisher","DOI":"10.1038\/323533a0"},{"key":"B32","first-page":"3856","volume-title":"Proceedings of the 31st Conference on Neural Information Processing Systems","author":"Sabour S.","year":"2017"},{"key":"B33","author":"Shallue C. J.","year":"2018","journal-title":"Measuring the effects of data parallelism on neural network training"},{"key":"B34","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Simonyan K.","year":"2015"},{"key":"B35","first-page":"2722","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Taylor G.","year":"2016"},{"key":"B36","first-page":"2595","volume-title":"Advances in neural information processing systems","volume":"23","author":"Zinkevich M.","year":"2010"}],"container-title":["Neural Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/neco_a_01335","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:43:55Z","timestamp":1615585435000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/33\/1\/174-193\/95655"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1162\/neco_a_01335"],"URL":"https:\/\/doi.org\/10.1162\/neco_a_01335","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1]]}}}