{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,6,16]],"date-time":"2022-06-16T06:16:04Z","timestamp":1655360164138},"reference-count":39,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2022,5,23]],"date-time":"2022-05-23T00:00:00Z","timestamp":1653264000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper describes <jats:italic>gft<\/jats:italic> (general fine-tuning), a little language for deep nets, introduced at an ACL-2022 tutorial. <jats:italic>gft<\/jats:italic> makes deep nets accessible to a broad audience including non-programmers. It is standard practice in many fields to use statistics packages such as R. One should not need to know how to program in order to fit a regression or classification model and to use the model to make predictions for novel inputs. With <jats:italic>gft<\/jats:italic>, fine-tuning and inference are similar to fit and predict in regression and classification. <jats:italic>gft<\/jats:italic> demystifies deep nets; no one would suggest that regression-like methods are \u201cintelligent.\u201d<\/jats:p>","DOI":"10.1017\/s1351324922000237","type":"journal-article","created":{"date-parts":[[2022,5,23]],"date-time":"2022-05-23T07:18:12Z","timestamp":1653290292000},"page":"519-535","update-policy":"http:\/\/dx.doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":0,"title":["Emerging trends: General fine-tuning (<i>gft<\/i>)"],"prefix":"10.1017","volume":"28","author":[{"given":"Kenneth","family":"Ward Church","sequence":"first","affiliation":[]},{"given":"Xingyu","family":"Cai","sequence":"additional","affiliation":[]},{"given":"Yibiao","family":"Ying","sequence":"additional","affiliation":[]},{"given":"Zeyu","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Guangxu","family":"Xun","sequence":"additional","affiliation":[]},{"given":"Yuchen","family":"Bian","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2022,5,23]]},"reference":[{"key":"S1351324922000237_ref3","doi-asserted-by":"publisher","DOI":"10.1145\/6424.315691"},{"key":"S1351324922000237_ref20","unstructured":"Hinton, G. , Vinyals, O. and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531."},{"key":"S1351324922000237_ref15","unstructured":"Devlin, J. , Chang, M.-W. , Lee, K. and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota: Association for Computational Linguistics, pp. 4171\u20134186."},{"key":"S1351324922000237_ref23","unstructured":"Liu, Y. , Ott, M. , Goyal, N. , Du, J. , Joshi, M. , Chen, D. , Levy, O. , Lewis, M. , Zettlemoyer, L. and Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach."},{"key":"S1351324922000237_ref30","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1264"},{"key":"S1351324922000237_ref36","unstructured":"Wang, A. , Pruksachatkun, Y. , Nangia, N. , Singh, A. , Michael, J. , Hill, F. , Levy, O. and Bowman, S. (2019). Superglue: A stickier benchmark for general-purpose language understanding systems. Advances in Neural Information Processing Systems 32."},{"key":"S1351324922000237_ref16","unstructured":"Du, J. , Na, X. , Liu, X. and Bu, H. (2018). Aishell-2: Transforming mandarin asr research into industrial scale. arXiv preprint arXiv:1808.10583."},{"key":"S1351324922000237_ref37","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5446"},{"key":"S1351324922000237_ref35","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324901002789"},{"key":"S1351324922000237_ref17","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00413"},{"key":"S1351324922000237_ref31","unstructured":"Su, W. , Chen, X. , Feng, S. , Liu, J. , Liu, W. , Sun, Y. , Tian, H. , Wu, H. and Wang, H. (2021). Ernie-tiny: A progressive distillation framework for pretrained transformer compression. arXiv preprint arXiv:2106.02241."},{"key":"S1351324922000237_ref19","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"S1351324922000237_ref1","volume-title":"The AWK Programming Language","author":"Aho","year":"1987"},{"key":"S1351324922000237_ref4","unstructured":"Bommasani, R. , Hudson, D.A. , Adeli, E. , Altman, R. , Arora, S. , von Arx, S. , Bernstein, M.S. , Bohg, J. , Bosselut, A. , Brunskill, E. , Brynjolfsson, E. , Buch, S. , Card, D. , Castellon, R. , Chatterji, N. , Chen, A. , Creel, K. , Davis, J.Q. , Demszky, D. , Donahue, C. , Doumbouya, M. , Durmus, E. , Ermon, S. , Etchemendy, J. , Ethayarajh, K. , Fei-Fei, L. , Finn, C. , Gale, T. , Gillespie, L. , Goel, K. , Goodman, N. , Grossman, S. , Guha, N. , Hashimoto, T. , Henderson, P. , Hewitt, J. , Ho, D.E. , Hong, J. , Hsu, K. , Huang, J. , Icard, T. , Jain, S. , Jurafsky, D. , Kalluri, P. , Karamcheti, S. , Keeling, G. , Khani, F. , Khattab, O. , Kohd, P.W. , Krass, M. , Krishna, R. , Kuditipudi, R. , Kumar, A. , Ladhak, F. , Lee, M. , Lee, T. , Leskovec, J. , Levent, I. , Li, X.L. , Li, X. , Ma, T. , Malik, A. , Manning, C.D. , Mirchandani, S. , Mitchell, E. , Munyikwa, Z. , Nair, S. , Narayan, A. , Narayanan, D. , Newman, B. , Nie, A. , Niebles, J.C. , Nilforoshan, H. , Nyarko, J. , Ogut, G. , Orr, L. , Papadimitriou, I. , Park, J.S. , Piech, C. , Portelance, E. , Potts, C. , Raghunathan, A. , Reich, R. , Ren, H. , Rong, F. , Roohani, Y. , Ruiz, C. , Ryan, J. , R\u00e9, C. , Sadigh, D. , Sagawa, S. , Santhanam, K. , Shih, A. , Srinivasan, K. , Tamkin, A. , Taori, R. , Thomas, A.W. , Tram\u00e8r, F. , Wang, R.E. , Wang, W. , Wu, B. , Wu, J. , Wu, Y. , Xie, S.M. , Yasunaga, M. , You, J. , Zaharia, M. , Zhang, M. , Zhang, T. , Zhang, X. , Zhang, Y. , Zheng, L. , Zhou, K. and Liang, P. (2021). On the opportunities and risks of foundation models."},{"key":"S1351324922000237_ref5","unstructured":"Brown, T.B. , Mann, B. , Ryder, N. , Subbiah, M. , Kaplan, J. , Dhariwal, P. , Neelakantan, A. , Shyam, P. , Sastry, G. , Askell, A. , Agarwal, S. , Herbert-Voss, A. , Krueger, G. , Henighan, T. , Child, R. , Ramesh, A. , Ziegler, D.M. , Wu, J. , Winter, C. , Hesse, C. , Chen, M. , Sigler, E. , Litwin, M. , Gray, S. , Chess, B. , Clark, J. , Berner, C. , McCandlish, S. , Radford, A. , Sutskever, I. and Amodei, D. (2020). Language models are few-shot learners. NeurIPS."},{"key":"S1351324922000237_ref25","unstructured":"Moore, G.A. and McKenna, R. (1999). Crossing the Chasm. Capstone Oxford."},{"key":"S1351324922000237_ref14","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"S1351324922000237_ref10","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324922000043"},{"key":"S1351324922000237_ref29","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-2124"},{"key":"S1351324922000237_ref11","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324921000231"},{"key":"S1351324922000237_ref7","doi-asserted-by":"crossref","unstructured":"Chelba, C. , Mikolov, T. , Schuster, M. , Ge, Q. , Brants, T. , Koehn, P. and Robinson, T. (2013). One billion word benchmark for measuring progress in statistical language modeling. arXiv preprint arXiv:1312.3005.","DOI":"10.21437\/Interspeech.2014-564"},{"key":"S1351324922000237_ref33","doi-asserted-by":"crossref","unstructured":"Sun, Y. , Wang, S. , Li, Y. , Feng, S. , Tian, H. , Wu, H. and Wang, H. (2020). Ernie 2.0: A continual pre-training framework for language understanding. AAAI.","DOI":"10.1609\/aaai.v34i05.6428"},{"key":"S1351324922000237_ref38","unstructured":"Wu, B. , Xu, C. , Dai, X. , Wan, A. , Zhang, P. , Yan, Z. , Tomizuka, M. , Gonzalez, J. , Keutzer, K. and Vajda, P. (2020). Visual transformers: Token-based image representation and processing for computer vision."},{"key":"S1351324922000237_ref28","unstructured":"Radford, A. , Wu, J. , Child, R. , Luan, D. , Amodei, D. and Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog."},{"key":"S1351324922000237_ref9","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324921000322"},{"key":"S1351324922000237_ref21","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1031"},{"key":"S1351324922000237_ref2","unstructured":"Baevski, A. , Zhou, H. , Mohamed, A. and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. arXiv preprint arXiv:2006.11477."},{"key":"S1351324922000237_ref6","unstructured":"Buck, C. , Heafield, K. and van Ooyen, B. (2014). N-gram counts and language models from the common crawl. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland: European Languages Resources Association (ELRA), pp. 3579\u20133584."},{"key":"S1351324922000237_ref8","unstructured":"Chowdhery, A. , Narang, S. , Devlin, J. , Bosma, M. , Mishra, G. , Roberts, A. , Barham, P. , Chung, H.W. , Sutton, C. , Gehrmann, S. , et al. (2022). PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311."},{"key":"S1351324922000237_ref12","doi-asserted-by":"crossref","unstructured":"Conneau, A. , Khandelwal, K. , Goyal, N. , Chaudhary, V. , Wenzek, G. , Guzm\u00e1n, F. , Grave, E. , Ott, M. , Zettlemoyer, L. and Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. CoRR, abs\/1911.02116.","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"S1351324922000237_ref22","unstructured":"Ito, K. and Johnson, L. (2017). The LJ speech dataset. https:\/\/keithito.com\/LJ-Speech-Dataset\/"},{"key":"S1351324922000237_ref26","volume-title":"The Measurement of Meaning","volume":"47","author":"Osgood","year":"1957"},{"key":"S1351324922000237_ref13","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324920000601"},{"key":"S1351324922000237_ref18","doi-asserted-by":"publisher","DOI":"10.1016\/S0304-3800(02)00204-1"},{"key":"S1351324922000237_ref27","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"S1351324922000237_ref24","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1017"},{"key":"S1351324922000237_ref32","unstructured":"Sun, Y. , Wang, S. , Feng, S. , Ding, S. , Pang, C. , Shang, J. , Liu, J. , Chen, X. , Zhao, Y. , Lu, Y. , et al. (2021). Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation. arXiv preprint arXiv:2107.02137."},{"key":"S1351324922000237_ref34","doi-asserted-by":"publisher","DOI":"10.1177\/107769905303000401"},{"key":"S1351324922000237_ref39","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.11"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324922000237","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,16]],"date-time":"2022-06-16T05:40:32Z","timestamp":1655358032000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324922000237\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,23]]},"references-count":39,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["S1351324922000237"],"URL":"https:\/\/doi.org\/10.1017\/s1351324922000237","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,23]]},"assertion":[{"value":"\u00a9 The Author(s), 2022. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}