{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,21]],"date-time":"2024-06-21T04:31:14Z","timestamp":1718944274481},"reference-count":78,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2022,12,19]],"date-time":"2022-12-19T00:00:00Z","timestamp":1671408000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2023,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>There are many types of approaches for Paraphrase Identification (PI), an NLP task of determining whether a sentence pair has equivalent semantics. Traditional approaches mainly consist of unsupervised learning and feature engineering, which are computationally inexpensive. However, their task performance is moderate nowadays. To seek a method that can preserve the low computational costs of traditional approaches but yield better task performance, we take an investigation into neural network-based transfer learning approaches. We discover that by improving the usage of parameters efficiently for feature-based transfer, our research goal can be accomplished. Regarding the improvement, we propose a pre-trained task-specific architecture. The fixed parameters of the pre-trained architecture can be shared by multiple classifiers with small additional parameters. As a result, the computational cost left involving parameter update is only generated from classifier-tuning: the features output from the architecture combined with lexical overlap features are fed into a single classifier for tuning. Furthermore, the pre-trained task-specific architecture can be applied to natural language inference and semantic textual similarity tasks as well. Such technical novelty leads to slight consumption of computational and memory resources for each task and is also conducive to power-efficient continual learning. The experimental results show that our proposed method is competitive with adapter-BERT (a parameter-efficient fine-tuning approach) over some tasks while consuming only 16% trainable parameters and saving 69-96% time for parameter update.<\/jats:p>","DOI":"10.1017\/s135132492200050x","type":"journal-article","created":{"date-parts":[[2022,12,19]],"date-time":"2022-12-19T06:25:02Z","timestamp":1671431102000},"page":"1066-1096","update-policy":"http:\/\/dx.doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":1,"title":["Parameter-efficient feature-based transfer for paraphrase identification"],"prefix":"10.1017","volume":"29","author":[{"given":"Xiaodong","family":"Liu","sequence":"first","affiliation":[]},{"given":"Rafal","family":"Rzepka","sequence":"additional","affiliation":[]},{"given":"Kenji","family":"Araki","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2022,12,19]]},"reference":[{"key":"S135132492200050X_ref45","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/S14-2001"},{"key":"S135132492200050X_ref43","volume-title":"Advances in Neural Information Processing Systems","volume":"30.","author":"Lopez-Paz","year":"2017"},{"key":"S135132492200050X_ref28","first-page":"2790","volume-title":"Proceedings of the 36th International Conference on Machine Learning","volume":"97.","author":"Houlsby","year":"2019"},{"key":"S135132492200050X_ref36","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12022"},{"key":"S135132492200050X_ref11","unstructured":"de Masson d\u2019Autume, C. , Ruder, S. , Kong, L. and Yogatama, D. (2019). Episodic memory in lifelong language learning. In Wallach, H. , Larochelle, H. , Beygelzimer, A. , d\u2019Alch\u00e9-Buc, F. , Fox, E. and Garnett, R. (eds), Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc."},{"key":"S135132492200050X_ref47","doi-asserted-by":"publisher","DOI":"10.1016\/S0079-7421(08)60536-8"},{"key":"S135132492200050X_ref10","doi-asserted-by":"publisher","DOI":"10.3115\/1687878.1687944"},{"key":"S135132492200050X_ref35","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-srw.24"},{"key":"S135132492200050X_ref50","first-page":"3111","volume-title":"Advances in Neural Information Processing Systems","author":"Mikolov","year":"2013"},{"key":"S135132492200050X_ref21","doi-asserted-by":"publisher","DOI":"10.1016\/S1364-6613(99)01294-2"},{"key":"S135132492200050X_ref60","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.704"},{"key":"S135132492200050X_ref44","unstructured":"Madnani, N. , Tetreault, J. and Chodorow, M. (2012). Re-examining machine translation metrics for paraphrase identification. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montr\u00e9al, Canada. Association for Computational Linguistics, pp. 182\u2013190."},{"key":"S135132492200050X_ref33","doi-asserted-by":"publisher","DOI":"10.1111\/j.1469-8137.1912.tb05611.x"},{"key":"S135132492200050X_ref63","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1355"},{"key":"S135132492200050X_ref72","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1091"},{"key":"S135132492200050X_ref29","unstructured":"Houlsby, N. , Giurgiu, A. , Jastrzebski, S. , Morrone, B. , de Laroussilhe, Q. , Gesmundo, A. , Attariyan, M. and Gelly, S. (2019b). Parameter-efficient transfer learning for nlp."},{"key":"S135132492200050X_ref23","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1004"},{"key":"S135132492200050X_ref54","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"S135132492200050X_ref2","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553380"},{"key":"S135132492200050X_ref76","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-6106"},{"key":"S135132492200050X_ref25","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1181"},{"key":"S135132492200050X_ref48","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.199"},{"key":"S135132492200050X_ref40","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.figlang-1.3"},{"key":"S135132492200050X_ref62","first-page":"801","volume-title":"Advances in Neural Information Processing Systems","author":"Socher","year":"2011"},{"key":"S135132492200050X_ref42","unstructured":"Liu, X. , Zheng, Y. , Du, Z. , Ding, M. , Qian, Y. , Yang, Z. and Tang, J. (2021). Gpt understands, too."},{"key":"S135132492200050X_ref34","unstructured":"Ji, Y. and Eisenstein, J. (2013). Discriminative improvements to distributional sentence similarity. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 891\u2013896."},{"key":"S135132492200050X_ref61","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1113"},{"key":"S135132492200050X_ref78","unstructured":"Zhang, Y. , Baldridge, J. and He, L. (2019). PAWS: paraphrase adversaries from word scrambling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota. Association for Computational Linguistics, pp. 1298\u20131308."},{"key":"S135132492200050X_ref41","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-0907"},{"key":"S135132492200050X_ref32","unstructured":"Iyer, S. , Dandekar, N. and Csernai, K. (2017). First Quora Dataset Release: Question Pairs. Available at: https:\/\/quoradata.quora.com\/First-Quora-Dataset-Release-Question-Pairs."},{"key":"S135132492200050X_ref51","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00374"},{"key":"S135132492200050X_ref49","unstructured":"Mikolov, T. , Grave, E. , Bojanowski, P. , Puhrsch, C. and Joulin, A. (2018). Advances in pre-training distributed word representations. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA)."},{"key":"S135132492200050X_ref8","unstructured":"Conneau, A. and Kiela, D. (2018). SentEval: an evaluation toolkit for universal sentence representations. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA)."},{"key":"S135132492200050X_ref68","unstructured":"Wan, S. , Dras, M. , Dale, R. and Paris, C. (2006). Using dependency-based features to take the\u2019para-farce\u2019out of paraphrase. In Proceedings of the Australasian Language Technology Workshop 2006, pp. 131\u2013138."},{"key":"S135132492200050X_ref58","unstructured":"Rusu, A.A. , Rabinowitz, N.C. , Desjardins, G. , Soyer, H. , Kirkpatrick, J. , Kavukcuoglu, K. , Pascanu, R. and Hadsell, R. (2016). Progressive neural networks."},{"key":"S135132492200050X_ref12","article-title":"Optimal subarchitecture extraction for BERT","author":"de Wynter","year":"2020","journal-title":"CoRR"},{"key":"S135132492200050X_ref70","unstructured":"Weston, J. , Bordes, A. , Chopra, S. , Rush, A. M. , van Merri\u00ebnboer, B. , Joulin, A. and Mikolov, T. (2015). Towards ai-complete question answering: a set of prerequisite toy tasks."},{"key":"S135132492200050X_ref24","unstructured":"Guo, W. and Diab, M. (2012). Modeling sentences in the latent space. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-volume 1. Association for Computational Linguistics, pp. 864\u2013872."},{"key":"S135132492200050X_ref73","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1154"},{"key":"S135132492200050X_ref13","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"S135132492200050X_ref26","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.611"},{"key":"S135132492200050X_ref71","first-page":"5753","volume-title":"Advances in Neural Information Processing Systems 32","author":"Yang","year":"2019"},{"key":"S135132492200050X_ref65","first-page":"181","volume-title":"Lifelong Learning Algorithms","author":"Thrun","year":"1998"},{"key":"S135132492200050X_ref30","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1031"},{"key":"S135132492200050X_ref64","unstructured":"Sun, F.-K. , Ho, C.-H. and Lee, H.-Y. (2020). LAMOL: LAnguage MOdeling for lifelong language learning. In International Conference on Learning Representations."},{"key":"S135132492200050X_ref74","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00097"},{"key":"S135132492200050X_ref6","unstructured":"Chandrasekaran, D. and Mago, V. (2020). Evolution of semantic similarity - A survey. CoRR, abs\/2004.13820."},{"key":"S135132492200050X_ref22","unstructured":"Gammerman, A. , Vovk, V. and Vapnik, V. (1998). Learning by transduction. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc, pp. 148\u2013155."},{"key":"S135132492200050X_ref18","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1184"},{"key":"S135132492200050X_ref75","volume-title":"Advances in Neural Information Processing Systems","volume":"27","author":"Yosinski","year":"2014"},{"key":"S135132492200050X_ref56","unstructured":"Radford, A. , Wu, J. , Child, R. , Luan, D. , Amodei, D. and Sutskever, I. (2019). Language models are unsupervised multitask learners."},{"key":"S135132492200050X_ref16","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220406"},{"key":"S135132492200050X_ref53","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"S135132492200050X_ref57","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"S135132492200050X_ref20","unstructured":"Fernando, S. and Stevenson, M. (2008). A semantic similarity approach to paraphrase detection. In Proceedings of the 11th Annual Research Colloquium of the UK Special Interest Group for Computational Linguistics, pp. 45\u201352."},{"key":"S135132492200050X_ref37","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1611835114"},{"key":"S135132492200050X_ref17","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1166"},{"key":"S135132492200050X_ref38","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1126"},{"key":"S135132492200050X_ref39","unstructured":"Lan, Z. , Chen, M. , Goodman, S. , Gimpel, K. , Sharma, P. and Soricut, R. (2019). ALBERT: a lite BERT for self-supervised learning of language representations. CoRR, abs\/1909.11942."},{"key":"S135132492200050X_ref46","unstructured":"McCann, B. , Keskar, N.S. , Xiong, C. and Socher, R. (2018). The natural language decathlon: multitask learning as question answering. CoRR, abs\/1806.08730."},{"key":"S135132492200050X_ref55","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W15-3049"},{"key":"S135132492200050X_ref7","unstructured":"Chen, T. , Goodfellow, I. and Shlens, J. (2016). Net2net: accelerating learning via knowledge transfer."},{"key":"S135132492200050X_ref52","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.552"},{"key":"S135132492200050X_ref59","unstructured":"Sanh, V. , Debut, L. , Chaumond, J. and Wolf, T. (2019). Distilbert, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs\/1910.01108."},{"key":"S135132492200050X_ref27","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.201"},{"key":"S135132492200050X_ref69","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5446"},{"key":"S135132492200050X_ref31","volume-title":"Advances in Neural Information Processing Systems","volume":"32.","author":"Hung","year":"2019"},{"key":"S135132492200050X_ref66","unstructured":"van de Ven, G. M. and Tolias, A. S. (2019). Three scenarios for continual learning."},{"key":"S135132492200050X_ref9","doi-asserted-by":"publisher","DOI":"10.26615\/978-954-452-072-4_035"},{"key":"S135132492200050X_ref4","first-page":"95","volume-title":"Multitask Learning","author":"Caruana","year":"1998"},{"key":"S135132492200050X_ref5","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S17-2001"},{"key":"S135132492200050X_ref67","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"S135132492200050X_ref3","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001493000339"},{"key":"S135132492200050X_ref15","unstructured":"Devlin, J. , Chang, M.-W. , Lee, K. and Toutanova, K. (2019). BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota. Association for Computational Linguistics, pp. 4171\u20134186."},{"key":"S135132492200050X_ref14","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"S135132492200050X_ref1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1371"},{"key":"S135132492200050X_ref19","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/7287.001.0001"},{"key":"S135132492200050X_ref77","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00391"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S135132492200050X","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,19]],"date-time":"2023-07-19T08:59:28Z","timestamp":1689757168000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S135132492200050X\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,19]]},"references-count":78,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,7]]}},"alternative-id":["S135132492200050X"],"URL":"https:\/\/doi.org\/10.1017\/s135132492200050x","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,19]]},"assertion":[{"value":"\u00a9 The Author(s), 2022. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}}]}}