{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T15:22:22Z","timestamp":1777130542600,"version":"3.51.4"},"reference-count":63,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2022,11,8]],"date-time":"2022-11-08T00:00:00Z","timestamp":1667865600000},"content-version":"vor","delay-in-days":311,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,11,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Multi-task learning, in which several tasks are jointly learned by a single model, allows NLP models to share information from multiple annotations and may facilitate better predictions when the tasks are inter-related. This technique, however, requires annotating the same text with multiple annotation schemes, which may be costly and laborious. Active learning (AL) has been demonstrated to optimize annotation processes by iteratively selecting unlabeled examples whose annotation is most valuable for the NLP model. Yet, multi-task active learning (MT-AL) has not been applied to state-of-the-art pre-trained Transformer-based NLP models. This paper aims to close this gap. We explore various multi-task selection criteria in three realistic multi-task scenarios, reflecting different relations between the participating tasks, and demonstrate the effectiveness of multi-task compared to single-task selection. 
Our results suggest that MT-AL can be effectively used in order to minimize annotation efforts for multi-task NLP models.<\/jats:p>","DOI":"10.1162\/tacl_a_00515","type":"journal-article","created":{"date-parts":[[2022,11,8]],"date-time":"2022-11-08T21:00:05Z","timestamp":1667941205000},"page":"1209-1228","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":11,"title":["Multi-task Active Learning for Pre-trained Transformer-based Models"],"prefix":"10.1162","volume":"10","author":[{"given":"Guy","family":"Rotman","sequence":"first","affiliation":[{"name":"Faculty of Industrial Engineering and Management, Technion, IIT, Israel. grotman@campus.technion.ac.il"}]},{"given":"Roi","family":"Reichart","sequence":"additional","affiliation":[{"name":"Faculty of Industrial Engineering and Management, Technion, IIT, Israel. roiri@technion.ac.il"}]}],"member":"281","published-online":{"date-parts":[[2022,11,7]]},"reference":[{"key":"2022110820595992500_bib1","doi-asserted-by":"crossref","first-page":"1495","DOI":"10.18653\/v1\/2020.coling-main.130","article-title":"Pre-trained language model based active learning for sentence matching","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Bai","year":"2020"},{"key":"2022110820595992500_bib2","doi-asserted-by":"publisher","first-page":"646","DOI":"10.18653\/v1\/D18-1066","article-title":"Joint learning for emotion classification and emotion cause detection","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Chen","year":"2018"},{"key":"2022110820595992500_bib3","doi-asserted-by":"publisher","first-page":"758","DOI":"10.1145\/1571941.1572114","article-title":"Reciprocal rank fusion outperforms condorcet and individual rank learning methods","volume-title":"Proceedings of the 32nd international ACM SIGIR conference on Research and development 
in information retrieval","author":"Cormack","year":"2009"},{"key":"2022110820595992500_bib4","article-title":"Snips voice platform: An embedded spoken language understanding system for private-by-design voice interfaces","author":"Coucke","year":"2018","journal-title":"CoRR"},{"key":"2022110820595992500_bib5","doi-asserted-by":"publisher","first-page":"295","DOI":"10.18653\/v1\/2020.emnlp-main.21","article-title":"Calibration of pre-trained transformers","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Desai","year":"2020"},{"key":"2022110820595992500_bib6","first-page":"4171","article-title":"Bert: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2022110820595992500_bib7","article-title":"Deep biaffine attention for neural dependency parsing","volume-title":"5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24\u201326, 2017, Conference Track Proceedings","author":"Dozat","year":"2017"},{"key":"2022110820595992500_bib8","doi-asserted-by":"publisher","first-page":"1383","DOI":"10.18653\/v1\/P18-1128","article-title":"The hitchhiker\u2019s guide to testing statistical significance in natural language processing","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Dror","year":"2018"},{"key":"2022110820595992500_bib9","doi-asserted-by":"publisher","first-page":"43","DOI":"10.18653\/v1\/P18-2008","article-title":"Active learning for deep semantic parsing","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short 
Papers)","author":"Duong","year":"2018"},{"key":"2022110820595992500_bib10","doi-asserted-by":"publisher","first-page":"7949","DOI":"10.18653\/v1\/2020.emnlp-main.638","article-title":"Active learning for BERT: An empirical study","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Ein-Dor","year":"2020"},{"key":"2022110820595992500_bib11","doi-asserted-by":"publisher","first-page":"326","DOI":"10.3115\/1620754.1620802","article-title":"Joint parsing and named entity recognition","volume-title":"Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics","author":"Finkel","year":"2009"},{"key":"2022110820595992500_bib12","first-page":"1050","article-title":"Dropout as a Bayesian approximation: Representing model uncertainty in deep learning","volume-title":"International Conference on Machine Learning","author":"Gal","year":"2016"},{"key":"2022110820595992500_bib13","doi-asserted-by":"publisher","first-page":"124","DOI":"10.18653\/v1\/W17-3518","article-title":"The WebNLG challenge: Generating text from RDF data","volume-title":"Proceedings of the 10th International Conference on Natural Language Generation","author":"Gardent","year":"2017"},{"key":"2022110820595992500_bib14","doi-asserted-by":"publisher","first-page":"1158","DOI":"10.18653\/v1\/2020.coling-main.100","article-title":"Fine-tuning BERT for low-resource natural language understanding via active learning","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Grie\u00dfhaber","year":"2020"},{"key":"2022110820595992500_bib15","first-page":"1321","article-title":"On calibration of modern neural networks","volume-title":"International Conference on Machine 
Learning","author":"Guo","year":"2017"},{"key":"2022110820595992500_bib16","doi-asserted-by":"publisher","first-page":"415","DOI":"10.3115\/1620754.1620815","article-title":"Active learning for statistical phrase-based machine translation","volume-title":"Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics","author":"Haffari","year":"2009"},{"issue":"8","key":"2022110820595992500_bib17","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2022110820595992500_bib18","doi-asserted-by":"publisher","first-page":"57","DOI":"10.3115\/1614049.1614064","article-title":"Ontonotes: The 90% solution","volume-title":"Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers","author":"Hovy","year":"2006"},{"key":"2022110820595992500_bib19","doi-asserted-by":"publisher","first-page":"43","DOI":"10.18653\/v1\/W18-3406","article-title":"Multi-task active learning for neural semantic role labeling on low resource conversational corpus","volume-title":"Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP","author":"Ikhwantri","year":"2018"},{"key":"2022110820595992500_bib20","first-page":"69","article-title":"Mmr-based active machine learning for bio named entity recognition","volume-title":"Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers","author":"Kim","year":"2006"},{"key":"2022110820595992500_bib21","article-title":"Adam: A method for stochastic optimization","volume-title":"ICLR (Poster)","author":"Kingma","year":"2015"},{"key":"2022110820595992500_bib22","doi-asserted-by":"publisher","first-page":"1326","DOI":"10.18653\/v1\/2020.emnlp-main.102","article-title":"Calibrated language model 
fine-tuning for in- and out-of-distribution data","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Kong","year":"2020"},{"key":"2022110820595992500_bib23","doi-asserted-by":"publisher","first-page":"97","DOI":"10.18653\/v1\/N18-2016","article-title":"An annotated corpus for machine reading of instructions in wet lab protocols","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)","author":"Kulkarni","year":"2018"},{"key":"2022110820595992500_bib24","doi-asserted-by":"publisher","first-page":"8320","DOI":"10.18653\/v1\/2020.acl-main.738","article-title":"Active learning for coreference resolution using discrete annotation","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Li","year":"2020"},{"key":"2022110820595992500_bib25","first-page":"4568","article-title":"Weakly supervised named entity tagging with learnable logical rules","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Li","year":"2021"},{"key":"2022110820595992500_bib26","doi-asserted-by":"crossref","first-page":"344","DOI":"10.18653\/v1\/P16-1033","article-title":"Active learning for dependency parsing with partial annotation","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Li","year":"2016"},{"key":"2022110820595992500_bib27","doi-asserted-by":"publisher","first-page":"799","DOI":"10.18653\/v1\/P18-1074","article-title":"A multi-lingual multi-task architecture for low-resource sequence labeling","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational 
Linguistics (Volume 1: Long Papers)","author":"Lin","year":"2018"},{"key":"2022110820595992500_bib28","doi-asserted-by":"crossref","first-page":"4487","DOI":"10.18653\/v1\/P19-1441","article-title":"Multi-task deep neural networks for natural language understanding","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Liu","year":"2019"},{"key":"2022110820595992500_bib29","article-title":"Roberta: A robustly optimized bert pretraining approach","author":"Liu","year":"2019","journal-title":"arXiv preprint arXiv:1907.11692"},{"key":"2022110820595992500_bib30","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1007\/978-3-540-88908-3_9","article-title":"Visualizing the Pareto frontier","volume-title":"Multiobjective Optimization","author":"Lotov","year":"2008"},{"key":"2022110820595992500_bib31","doi-asserted-by":"publisher","first-page":"3219","DOI":"10.18653\/v1\/D18-1360","article-title":"Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Yi","year":"2018"},{"key":"2022110820595992500_bib32","first-page":"6297","article-title":"Learned in translation: Contextualized word vectors","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"McCann","year":"2017"},{"key":"2022110820595992500_bib33","doi-asserted-by":"publisher","first-page":"8528","DOI":"10.1609\/aaai.v34i05.6374","article-title":"Effective modeling of encoder-decoder architecture for joint entity and relation extraction","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Nayak","year":"2020"},{"key":"2022110820595992500_bib34","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/2021.naacl-demos.1","article-title":"PhoNLP: A joint multi-task learning 
model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations","author":"The Nguyen","year":"2021"},{"key":"2022110820595992500_bib35","first-page":"4034","article-title":"Universal Dependencies v2: An evergrowing multilingual treebank collection","volume-title":"Proceedings of the 12th Language Resources and Evaluation Conference","author":"Nivre","year":"2020"},{"key":"2022110820595992500_bib36","first-page":"13991","article-title":"Can you trust your model\u2019s uncertainty? Evaluating predictive uncertainty under dataset shift","volume":"32","author":"Ovadia","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2022110820595992500_bib37","doi-asserted-by":"publisher","first-page":"151","DOI":"10.18653\/v1\/K18-1015","article-title":"Active learning for interactive neural machine translation of data streams","volume-title":"Proceedings of the 22nd Conference on Computational Natural Language Learning","author":"Peris","year":"2018"},{"key":"2022110820595992500_bib38","doi-asserted-by":"publisher","first-page":"2227","DOI":"10.18653\/v1\/N18-1202","article-title":"Deep contextualized word representations","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Peters","year":"2018"},{"key":"2022110820595992500_bib39","doi-asserted-by":"publisher","DOI":"10.3115\/116580.116612","article-title":"Evaluation of spoken language systems: The atis domain","volume-title":"Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990","author":"Price","year":"1990"},{"key":"2022110820595992500_bib40","first-page":"1","article-title":"Exploring the 
limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"2022110820595992500_bib41","first-page":"408","article-title":"An ensemble method for selection of high quality parses","volume-title":"Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics","author":"Reichart","year":"2007"},{"key":"2022110820595992500_bib42","doi-asserted-by":"publisher","first-page":"3","DOI":"10.3115\/1596374.1596379","article-title":"Sample selection for statistical parsers: Cognitively driven algorithms and evaluation measures","volume-title":"Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)","author":"Reichart","year":"2009"},{"key":"2022110820595992500_bib43","first-page":"861","article-title":"Multi-task active learning for linguistic annotations","volume-title":"Proceedings of ACL-08: HLT","author":"Reichart","year":"2008"},{"key":"2022110820595992500_bib44","doi-asserted-by":"publisher","first-page":"695","DOI":"10.1162\/tacl_a_00294","article-title":"Deep contextualized self-training for low resource dependency parsing","volume":"7","author":"Rotman","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2022110820595992500_bib45","article-title":"An overview of multi-task learning in deep neural networks","author":"Ruder","year":"2017","journal-title":"arXiv preprint arXiv:1706.05098"},{"key":"2022110820595992500_bib46","first-page":"126","article-title":"Aggression and misogyny detection using BERT: A multi-task approach","volume-title":"Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying","author":"Samghabadi","year":"2020"},{"key":"2022110820595992500_bib47","doi-asserted-by":"publisher","first-page":"6949","DOI":"10.1609\/aaai.v33i01.33016949","article-title":"A hierarchical multi-task approach for 
learning embeddings from semantic tasks","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Sanh","year":"2019"},{"key":"2022110820595992500_bib48","doi-asserted-by":"publisher","first-page":"185","DOI":"10.18653\/v1\/P18-1018","article-title":"Comprehensive supersense disambiguation of English prepositions and possessives","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Schneider","year":"2018"},{"key":"2022110820595992500_bib49","doi-asserted-by":"publisher","first-page":"1070","DOI":"10.3115\/1613715.1613855","article-title":"An analysis of active learning strategies for sequence labeling tasks","volume-title":"Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing","author":"Settles","year":"2008"},{"key":"2022110820595992500_bib50","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1145\/130385.130417","article-title":"Query by committee","volume-title":"Proceedings of the Fifth Annual Workshop on Computational Learning Theory","author":"Sebastian Seung","year":"1992"},{"key":"2022110820595992500_bib51","doi-asserted-by":"publisher","first-page":"252","DOI":"10.18653\/v1\/W17-2630","article-title":"Deep active learning for named entity recognition","volume-title":"Proceedings of the 2nd Workshop on Representation Learning for NLP","author":"Shen","year":"2017"},{"key":"2022110820595992500_bib52","doi-asserted-by":"publisher","first-page":"231","DOI":"10.18653\/v1\/P16-2038","article-title":"Deep multi-task learning with low level tasks supervised at lower layers","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"S\u00f8gaard","year":"2016"},{"key":"2022110820595992500_bib53","doi-asserted-by":"publisher","first-page":"2818","DOI":"10.1109\/CVPR.2016.308","article-title":"Rethinking the inception 
architecture for computer vision","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Szegedy","year":"2016"},{"key":"2022110820595992500_bib54","doi-asserted-by":"publisher","DOI":"10.2172\/1525811","article-title":"On mixup training: Improved calibration and predictive uncertainty for deep neural networks","volume-title":"Advances in Neural Information Processing Systems","author":"Thulasidasan","year":"2019"},{"key":"2022110820595992500_bib55","first-page":"1247","article-title":"A comparison of models for cost-sensitive active learning","volume-title":"Coling 2010: Posters","author":"Tomanek","year":"2010"},{"key":"2022110820595992500_bib56","first-page":"5998","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2022110820595992500_bib57","doi-asserted-by":"publisher","first-page":"353","DOI":"10.18653\/v1\/W18-5446","article-title":"GLUE: A multi-task benchmark and analysis platform for natural language understanding","volume-title":"Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Wang","year":"2018"},{"key":"2022110820595992500_bib58","doi-asserted-by":"publisher","first-page":"12","DOI":"10.18653\/v1\/2020.louhi-1.2","article-title":"Simple hierarchical multi-task neural end-to-end entity linking for biomedical text","volume-title":"Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis","author":"Wiatrak","year":"2020"},{"key":"2022110820595992500_bib59","doi-asserted-by":"publisher","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Transformers: State-of-the-art natural language processing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System 
Demonstrations","author":"Wolf","year":"2020"},{"key":"2022110820595992500_bib60","doi-asserted-by":"publisher","first-page":"209","DOI":"10.18653\/v1\/W18-5022","article-title":"Cost-sensitive active learning for dialogue state tracking","volume-title":"Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue","author":"Xie","year":"2018"},{"key":"2022110820595992500_bib61","doi-asserted-by":"publisher","first-page":"3239","DOI":"10.18653\/v1\/2020.acl-main.296","article-title":"SpanMlt: A span-based multi-task learning framework for pair-wise aspect and opinion terms extraction","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"He","year":"2020"},{"key":"2022110820595992500_bib62","doi-asserted-by":"publisher","first-page":"4900","DOI":"10.18653\/v1\/2020.coling-main.430","article-title":"A multitask active learning framework for natural language understanding","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Zhu","year":"2020"},{"key":"2022110820595992500_bib63","doi-asserted-by":"publisher","first-page":"5675","DOI":"10.1109\/ICASSP.2017.7953243","article-title":"Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding","volume-title":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Zhu","year":"2017"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00515\/2057243\/tacl_a_00515.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00515\/2057243\/tacl_a_00515.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,8]],"date-time":"2022-11-08T21:00:45Z","timestamp":1667941245000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00515\/113664\/Multi-task-Active-Learning-for-Pre-trained"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"references-count":63,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00515","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022]]},"published":{"date-parts":[[2022]]}}}