{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,12]],"date-time":"2025-09-12T19:40:14Z","timestamp":1757706014891,"version":"3.41.0"},"reference-count":82,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2024,4,27]],"date-time":"2024-04-27T00:00:00Z","timestamp":1714176000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2022YFB3103700, 2022YFB3103704"],"award-info":[{"award-number":["2022YFB3103700, 2022YFB3103704"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62276248, U21B2046"],"award-info":[{"award-number":["62276248, U21B2046"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004739","name":"Youth Innovation Promotion Association CAS","doi-asserted-by":"crossref","award":["2023111"],"award-info":[{"award-number":["2023111"]}],"id":[{"id":"10.13039\/501100004739","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2024,9,30]]},"abstract":"<jats:p>Current natural language understanding (NLU) models have been continuously scaling up, both in terms of model size and input context, introducing more hidden and input neurons. While this generally improves performance on average, the extra neurons do not yield a consistent improvement for all instances. This is because some hidden neurons are redundant, and the noise mixed in input neurons tends to distract the model. Previous work mainly focuses on extrinsically reducing low-utility neurons by additional post- or pre-processing, such as network pruning and context selection, to avoid this problem. Beyond that, can we make the model reduce redundant parameters and suppress input noise by intrinsically enhancing the utility of each neuron? If a model can efficiently utilize neurons, no matter which neurons are ablated (disabled), the ablated submodel should perform no better than the original full model. Based on such a comparison principle between models, we propose a cross-model comparative loss for a broad range of tasks. Comparative loss is essentially a ranking loss on top of the task-specific losses of the full and ablated models, with the expectation that the task-specific loss of the full model is minimal. We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from three distinct NLU tasks based on five widely used pre-trained language models and find it particularly superior for models with few parameters or long input.<\/jats:p>","DOI":"10.1145\/3652599","type":"journal-article","created":{"date-parts":[[2024,3,15]],"date-time":"2024-03-15T12:03:11Z","timestamp":1710504191000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding"],"prefix":"10.1145","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3766-0275","authenticated-orcid":false,"given":"Yunchang","family":"Zhu","sequence":"first","affiliation":[{"name":"CAS Key Laboratory of AI Security, Institute of Computing Technology, CAS; University of the Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1161-8546","authenticated-orcid":false,"given":"Liang","family":"Pang","sequence":"additional","affiliation":[{"name":"CAS Key Laboratory of AI Security, Institute of Computing Technology, CAS, Beijing China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2904-2296","authenticated-orcid":false,"given":"Kangxi","family":"Wu","sequence":"additional","affiliation":[{"name":"CAS Key Laboratory of AI Security, Institute of Computing Technology, CAS; University of the Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7811-3262","authenticated-orcid":false,"given":"Yanyan","family":"Lan","sequence":"additional","affiliation":[{"name":"Institute for AI Industry Research, Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1204-4820","authenticated-orcid":false,"given":"Huawei","family":"Shen","sequence":"additional","affiliation":[{"name":"CAS Key Laboratory of AI Security, Institute of Computing Technology, CAS; University of the Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5201-8195","authenticated-orcid":false,"given":"Xueqi","family":"Cheng","sequence":"additional","affiliation":[{"name":"CAS Key Laboratory of AI Security, Institute of Computing Technology, CAS; University of the Chinese Academy of Sciences, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2024,4,27]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/322017.322021"},{"key":"e_1_3_2_3_2","first-page":"20852","volume-title":"Advances in Neural Information Processing Systems","author":"Bartoldson Brian","year":"2020","unstructured":"Brian Bartoldson, Ari Morcos, Adrian Barbu, and Gordon Erlebacher. 2020. The generalization-stability tradeoff in neural network pruning. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Red Hook, NY, USA, 20852\u201320864. DOI:https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/ef2ee09ea9551de88bc11fd7eeea93b0-Abstract.html"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","unstructured":"Iz Beltagy Matthew E. Peters and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv:2004.05150 (2020). DOI:10.48550\/arXiv.2004.05150","DOI":"10.48550\/arXiv.2004.05150"},{"key":"e_1_3_2_5_2","volume-title":"Proceedings of the 2nd Text Analysis Conference (TAC \u201909)","author":"Bentivogli Luisa","year":"2009","unstructured":"Luisa Bentivogli, Peter Clark, Ido Dagan, and Danilo Giampiccolo. 2009. The fifth PASCAL recognizing textual entailment challenge. In Proceedings of the 2nd Text Analysis Conference (TAC \u201909)."},{"key":"e_1_3_2_6_2","first-page":"129","article-title":"What is the state of neural network pruning?","volume":"2","author":"Blalock Davis","year":"2020","unstructured":"Davis Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, and John Guttag. 2020. What is the state of neural network pruning? Proceedings of Machine Learning and Systems 2 (March2020), 129\u2013146. DOI:https:\/\/proceedings.mlsys.org\/paper\/2020\/hash\/d2ddea18f00665ce8623e36bd4e3c7c5-Abstract.html","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_7_2","first-page":"2206","volume-title":"Proceedings of the 39th International Conference on Machine Learning","author":"Borgeaud Sebastian","year":"2022","unstructured":"Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego De Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack Rae, Erich Elsen, and Laurent Sifre. 2022. Improving language models by retrieving from trillions of tokens. In Proceedings of the 39th International Conference on Machine Learning. 2206\u20132240. DOI:https:\/\/proceedings.mlr.press\/v162\/borgeaud22a.html"},{"key":"e_1_3_2_8_2","first-page":"1877","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei, 2020. Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vol. 33. Curran Associates, Red Hook, NY, USA, 1877\u20131901. DOI:https:\/\/papers.nips.cc\/paper\/2020\/hash\/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html"},{"key":"e_1_3_2_9_2","first-page":"822","volume-title":"Proceedings of the 36th International Conference on Machine Learning","author":"Brutzkus Alon","year":"2019","unstructured":"Alon Brutzkus and Amir Globerson. 2019. Why do larger models generalize better? A theoretical perspective via the XOR problem. In Proceedings of the 36th International Conference on Machine Learning. 822\u2013830. DOI:https:\/\/proceedings.mlr.press\/v97\/brutzkus19b.html"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S17-2001"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58334-7"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.5555\/AAI28114605"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.5555\/3524938.3525087"},{"key":"e_1_3_2_14_2","unstructured":"Zihan Chen Hongbo Zhang Xiaoji Zhang and Leqi Zhao. 2018. Quora Question Pairs. Retrieved March 26 2024 from DOI:https:\/\/www.kaggle.com\/c\/quora-question-pairs"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1078"},{"key":"e_1_3_2_16_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Clark Kevin","year":"2020","unstructured":"Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. In Proceedings of the International Conference on Learning Representations. DOI:https:\/\/openreview.net\/forum?id=r1xMH1BtvB"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/2499178.2499179"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v9i4.952"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646059"},{"key":"e_1_3_2_20_2","article-title":"Overview of the TREC 2020 deep learning track","author":"Craswell Nick","year":"2021","unstructured":"Nick Craswell, Bhaskar Mitra, Emine Yilmaz, and Daniel Campos. 2021. Overview of the TREC 2020 deep learning track. arXiv:2102.07662 [cs] (2021). DOI:http:\/\/arxiv.org\/abs\/2102.07662","journal-title":"arXiv:2102.07662 [cs]"},{"key":"e_1_3_2_21_2","article-title":"Overview of the TREC 2019 deep learning track","author":"Craswell Nick","year":"2020","unstructured":"Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Ellen M. Voorhees. 2020. Overview of the TREC 2019 deep learning track. arXiv:2003.07820 [cs] (2020). DOI:http:\/\/arxiv.org\/abs\/2003.07820","journal-title":"arXiv:2003.07820 [cs]"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_2_23_2","volume-title":"Proceedings of the 3rd International Workshop on Paraphrasing (IWP \u201905)","author":"Dolan William B.","year":"2005","unstructured":"William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the 3rd International Workshop on Paraphrasing (IWP \u201905). DOI:https:\/\/aclanthology.org\/I05-5002"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.561"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.775"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.552"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.100"},{"key":"e_1_3_2_28_2","volume-title":"Advances in Neural Information Processing Systems","author":"Han Song","year":"2015","unstructured":"Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, Vol. 28. Curran Associates, New York, NY, USA, 1\u20139.DOI:https:\/\/papers.nips.cc\/paper\/2015\/hash\/ae0eb3eed39d2bcef4622b2499a05fe6-Abstract.html"},{"key":"e_1_3_2_29_2","article-title":"Gaussian error linear units (GELUs)","author":"Hendrycks Dan","year":"2016","unstructured":"Dan Hendrycks and Kevin Gimpel. 2016. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016).","journal-title":"arXiv preprint arXiv:1606.08415"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/1113.003.0010"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","unstructured":"Geoffrey E. Hinton Nitish Srivastava Alex Krizhevsky Ilya Sutskever and Ruslan R. Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580 (2012). DOI:10.48550\/arXiv.1207.0580","DOI":"10.48550\/arXiv.1207.0580"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324901002807"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","unstructured":"Gautier Izacard Patrick Lewis Maria Lomeli Lucas Hosseini Fabio Petroni Timo Schick Jane Dwivedi-Yu Armand Joulin Sebastian Riedel and Edouard Grave. 2022. Few-shot learning with retrieval augmented language models. arXiv:2208.03299 (2022). DOI:10.48550\/arXiv.2208.03299","DOI":"10.48550\/arXiv.2208.03299"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","unstructured":"Jared Kaplan Sam McCandlish Tom Henighan Tom B. Brown Benjamin Chess Rewon Child Scott Gray Alec Radford Jeffrey Wu and Dario Amodei. 2020. Scaling laws for neural language models. arXiv:2001.08361 (2020). DOI:10.48550\/arXiv.2001.08361","DOI":"10.48550\/arXiv.2001.08361"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"e_1_3_2_36_2","volume-title":"Advances in Neural Information Processing Systems","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma, Tim Salimans, and Max Welling. 2015. Variational dropout and the local reparameterization trick. In Advances in Neural Information Processing Systems, Vol. 28. Curran Associates, Red Hook, NY, USA, 1\u20139.DOI:https:\/\/papers.nips.cc\/paper\/2015\/hash\/bc7316929fe1545bf0b98d114ee3ecb8-Abstract.html"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","unstructured":"Alex Labach Hojjat Salehinejad and Shahrokh Valaee. 2019. Survey of dropout methods for deep neural networks. arXiv:1904.13310 (2019). DOI:10.48550\/arXiv.1904.13310","DOI":"10.48550\/arXiv.1904.13310"},{"key":"e_1_3_2_38_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Lan Zhenzhong","year":"2020","unstructured":"Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for self-supervised learning of language representations. In Proceedings of the International Conference on Learning Representations. DOI:https:\/\/openreview.net\/forum?id=H1eA7AEtvS"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3031549"},{"key":"e_1_3_2_40_2","volume-title":"Advances in Neural Information Processing Systems","author":"LeCun Yann","year":"1989","unstructured":"Yann LeCun, John Denker, and Sara Solla. 1989. Optimal brain damage. In Advances in Neural Information Processing Systems, Vol. 2. Curran Associates, Red Hook, NY, USA, 598\u2013605. DOI:https:\/\/papers.nips.cc\/paper\/1989\/hash\/6c9882bbac1c7093bd25041881277658-Abstract.html"},{"key":"e_1_3_2_41_2","first-page":"10890","volume-title":"Advances in Neural Information Processing Systems","author":"Liang Xiaobo","year":"2021","unstructured":"Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, and Tie-Yan Liu. 2021. R-Drop: Regularized dropout for neural networks. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Red Hook, NY, 10890\u201310905. DOI:https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/5a66b9200f29ac3fa0ae244cc2a51b39-Abstract.html"},{"key":"e_1_3_2_42_2","volume-title":"Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC \u201902)","author":"Lin Jimmy","year":"2002","unstructured":"Jimmy Lin. 2002. The Web as a resource for question answering: Perspectives and challenges. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC \u201902). DOI:http:\/\/www.lrec-conf.org\/proceedings\/lrec2002\/pdf\/85.pdf"},{"key":"e_1_3_2_43_2","article-title":"A few brief notes on DeepImpact, COIL, and a conceptual framework for information retrieval techniques","author":"Lin Jimmy","year":"2021","unstructured":"Jimmy Lin and Xueguang Ma. 2021. A few brief notes on DeepImpact, COIL, and a conceptual framework for information retrieval techniques. arXiv:2106.14807 [cs] (2021). DOI:http:\/\/arxiv.org\/abs\/2106.14807","journal-title":"arXiv:2106.14807 [cs]"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.3390\/app9183698"},{"key":"e_1_3_2_45_2","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692 [cs] (2019). DOI:http:\/\/arxiv.org\/abs\/1907.11692","journal-title":"arXiv:1907.11692 [cs]"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.298"},{"key":"e_1_3_2_47_2","article-title":"Dropout with expectation-linear regularization","author":"Ma Xuezhe","year":"2016","unstructured":"Xuezhe Ma, Yingkai Gao, Zhiting Hu, Yaoliang Yu, Yuntian Deng, and Eduard Hovy. 2016. Dropout with expectation-linear regularization. arXiv preprint arXiv:1609.08017 (2016).","journal-title":"arXiv preprint arXiv:1609.08017"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3463262"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","unstructured":"Richard Meyes Melanie Lu Constantin Waubert de Puiseau and Tobias Meisen. 2019. Ablation studies in artificial neural networks. arXiv:1901.08644 (2019). DOI:10.48550\/arXiv.1901.08644","DOI":"10.48550\/arXiv.1901.08644"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1160"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290995"},{"key":"e_1_3_2_52_2","unstructured":"Tri Nguyen Mir Rosenberg Xia Song Jianfeng Gao Saurabh Tiwary Rangan Majumder and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. In Proceedings of the Workshop on Cognitive Computation: Integrating Neural and Symbolic Approaches 2016 Co-Located with the 30th Annual Conference on Neural Information Processing Systems (CoCo@NIPS \u201916)."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348384"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482450"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016875"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10341"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401104"},{"issue":"140","key":"e_1_3_2_58_2","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21, 140 (2020), 1\u201367. DOI:http:\/\/jmlr.org\/papers\/v21\/20-074.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-2124"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1264"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.758"},{"key":"e_1_3_2_62_2","first-page":"1631","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing","author":"Socher Richard","year":"2013","unstructured":"Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1631\u20131642."},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6441"},{"key":"e_1_3_2_64_2","article-title":"Principles of risk minimization for learning theory","author":"Vapnik Vladimir","year":"1991","unstructured":"Vladimir Vapnik. 1991. Principles of risk minimization for learning theory. In Advances in Neural Information Processing Systems, Vol. 4. Curran Associates, Red Hook, NY, USA, 831\u2013838.","journal-title":"Advances in Neural Information Processing Systems,"},{"key":"e_1_3_2_65_2","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Red Hook, NY, USA, 6000\u20136010.DOI:https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"e_1_3_2_66_2","first-page":"1058","volume-title":"Proceedings of the 30th International Conference on Machine Learning","author":"Wan Li","year":"2013","unstructured":"Li Wan, Matthew Zeiler, Sixin Zhang, Yann Le Cun, and Rob Fergus. 2013. Regularization of neural networks using DropConnect. In Proceedings of the 30th International Conference on Machine Learning. 1058\u20131066. DOI:https:\/\/proceedings.mlr.press\/v28\/wan13.html"},{"key":"e_1_3_2_67_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Wang Alex","year":"2019","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the International Conference on Learning Representations. DOI:https:\/\/openreview.net\/forum?id=rJ4km2R5t7"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.226"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.5555\/3524938.3525859"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00290"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1101"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_2_73_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Xiong Lee","year":"2020","unstructured":"Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul N. Bennett, Junaid Ahmed, and Arnold Overwijk. 2020. Approximate nearest neighbor negative contrastive learning for dense text retrieval. In Proceedings of the International Conference on Learning Representations. DOI:https:\/\/openreview.net\/forum?id=zeFrfgyZln"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511808.3557388"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1259"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482124"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.424"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.334"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401332"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.293"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3532017"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390524"},{"key":"e_1_3_2_83_2","article-title":"Fraternal dropout","author":"Zolna Konrad","year":"2017","unstructured":"Konrad Zolna, Devansh Arpit, Dendi Suhubdy, and Yoshua Bengio. 2017. Fraternal dropout. arXiv preprint arXiv:1711.00066 (2017).","journal-title":"arXiv preprint arXiv:1711.00066"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3652599","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3652599","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:30Z","timestamp":1750291410000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3652599"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,27]]},"references-count":82,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,9,30]]}},"alternative-id":["10.1145\/3652599"],"URL":"https:\/\/doi.org\/10.1145\/3652599","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"type":"print","value":"1046-8188"},{"type":"electronic","value":"1558-2868"}],"subject":[],"published":{"date-parts":[[2024,4,27]]},"assertion":[{"value":"2023-01-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-03","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}