{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T09:03:56Z","timestamp":1765357436255},"reference-count":45,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2023,9,6]],"date-time":"2023-09-06T00:00:00Z","timestamp":1693958400000},"content-version":"vor","delay-in-days":248,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Label scarcity is a bottleneck for improving task performance in specialized domains. We propose a novel compositional transfer learning framework (DoT5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from masked language modelling of unlabelled in-domain free text) and task knowledge (from task training on more readily available general-domain data) in a multi-task manner. To improve the transferability of task training, we design a strategy named NLGU: We simultaneously train natural language generation (NLG) for in-domain label-to-data generation, which enables data augmentation for self-finetuning and natural language understanding (NLU) for label prediction. We evaluate DoT5 on the biomedical domain and the resource-lean subdomain of radiology, focusing on natural language inference, text summarization, and embedding learning. DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning. In particular, DoT5 outperforms the current state-of-the-art in zero-shot transfer by over 7 absolute points in accuracy on RadNLI. 
We validate DoT5 with ablations and a case study demonstrating its ability to solve challenging NLI examples requiring in-domain expertise.<\/jats:p>","DOI":"10.1162\/tacl_a_00585","type":"journal-article","created":{"date-parts":[[2023,9,6]],"date-time":"2023-09-06T14:08:01Z","timestamp":1694009281000},"page":"1097-1113","update-policy":"http:\/\/dx.doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":2,"title":["Compositional Zero-Shot Domain Transfer with Text-to-Text Models"],"prefix":"10.1162","volume":"11","author":[{"given":"Fangyu","family":"Liu","sequence":"first","affiliation":[{"name":"University of Cambridge, UK. fl399@cam.ac.uk"}]},{"given":"Qianchu","family":"Liu","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, UK. t-floraliu@microsoft.com"}]},{"given":"Shruthi","family":"Bannur","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, UK"}]},{"given":"Fernando","family":"P\u00e9rez-Garc\u00eda","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, UK"}]},{"given":"Naoto","family":"Usuyama","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, USA"}]},{"given":"Sheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, USA"}]},{"given":"Tristan","family":"Naumann","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, USA"}]},{"given":"Aditya","family":"Nori","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, UK"}]},{"given":"Hoifung","family":"Poon","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, USA"}]},{"given":"Javier","family":"Alvarez-Valle","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, UK"}]},{"given":"Ozan","family":"Oktay","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, UK"}]},{"given":"Stephanie 
L.","family":"Hyland","sequence":"additional","affiliation":[{"name":"Microsoft Health Futures, UK. stephanie.hyland@microsoft.com"}]}],"member":"281","published-online":{"date-parts":[[2023,9,1]]},"reference":[{"key":"2023090614075071500_bib1","doi-asserted-by":"crossref","first-page":"1998","DOI":"10.18653\/v1\/2022.emnlp-main.130","article-title":"Large language models are few-shot clinical information extractors","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Agrawal","year":"2022"},{"key":"2023090614075071500_bib2","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1109\/ICPR56361.2022.9956656","article-title":"Entity-driven fact-aware abstractive summarization of biomedical literature","volume-title":"2022 26th International Conference on Pattern Recognition (ICPR)","author":"Alambo","year":"2022"},{"key":"2023090614075071500_bib3","article-title":"ExT5: Towards extreme multi-task scaling for transfer learning","volume-title":"International Conference on Learning Representations","author":"Aribandi","year":"2022"},{"key":"2023090614075071500_bib4","doi-asserted-by":"publisher","first-page":"3615","DOI":"10.18653\/v1\/D19-1371","article-title":"SciBERT: A pretrained language model for scientific text","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Iz","year":"2019"},{"key":"2023090614075071500_bib5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-3-031-20059-5_1","article-title":"Making the most of text semantics to improve biomedical vision\u2013language processing","volume-title":"Computer Vision \u2013 ECCV 2022","author":"Boecking","year":"2022"},{"key":"2023090614075071500_bib6","doi-asserted-by":"crossref","first-page":"632","DOI":"10.18653\/v1\/D15-1075","article-title":"A large annotated corpus for learning 
natural language inference","volume-title":"Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing","author":"Bowman","year":"2015"},{"key":"2023090614075071500_bib7","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023090614075071500_bib8","doi-asserted-by":"publisher","first-page":"1657","DOI":"10.18653\/v1\/P17-1152","article-title":"Enhanced LSTM for natural language inference","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Chen","year":"2017"},{"key":"2023090614075071500_bib9","doi-asserted-by":"publisher","first-page":"615","DOI":"10.18653\/v1\/N18-2097","article-title":"A discourse-aware attention model for abstractive summarization of long documents","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)","author":"Cohan","year":"2018"},{"key":"2023090614075071500_bib10","doi-asserted-by":"crossref","first-page":"8440","DOI":"10.18653\/v1\/2020.acl-main.747","article-title":"Unsupervised cross-lingual representation learning at scale","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Conneau","year":"2020"},{"key":"2023090614075071500_bib11","doi-asserted-by":"publisher","first-page":"2475","DOI":"10.18653\/v1\/D18-1269","article-title":"XNLI: Evaluating cross-lingual sentence representations","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Conneau","year":"2018"},{"issue":"2","key":"2023090614075071500_bib12","doi-asserted-by":"publisher","first-page":"304","DOI":"10.1093\/jamia\/ocv080","article-title":"Preparing a collection of 
radiology examinations for distribution and retrieval","volume":"23","author":"Demner-Fushman","year":"2016","journal-title":"Journal of the American Medical Informatics Association"},{"key":"2023090614075071500_bib13","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2023090614075071500_bib14","first-page":"1180","article-title":"Unsupervised domain adaptation by backpropagation","volume-title":"International conference on machine learning","author":"Ganin","year":"2015"},{"issue":"1","key":"2023090614075071500_bib15","first-page":"34","article-title":"English Gigaword","volume":"4","author":"Graff","year":"2003","journal-title":"Linguistic Data Consortium, Philadelphia"},{"issue":"1","key":"2023090614075071500_bib16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3458754","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Yu","year":"2021","journal-title":"ACM Transactions on Computing for Healthcare (HEALTH)"},{"key":"2023090614075071500_bib17","doi-asserted-by":"publisher","first-page":"8342","DOI":"10.18653\/v1\/2020.acl-main.740","article-title":"Don\u2019t stop pretraining: Adapt language models to domains and tasks","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Gururangan","year":"2020"},{"key":"2023090614075071500_bib18","article-title":"spaCy: Industrial-strength natural language processing in python","author":"Honnibal","year":"2020"},{"key":"2023090614075071500_bib19","article-title":"Correcting sample selection bias by unlabeled 
data","volume":"19","author":"Huang","year":"2006","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"2023090614075071500_bib20","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1038\/s41597-019-0322-0","article-title":"MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports","volume":"6","author":"Johnson","year":"2019","journal-title":"Scientific Data"},{"issue":"4","key":"2023090614075071500_bib21","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: A pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"2023090614075071500_bib22","doi-asserted-by":"publisher","first-page":"1075","DOI":"10.18653\/v1\/2021.eacl-main.92","article-title":"Zero-shot neural passage retrieval via domain-targeted synthetic question generation","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Ji","year":"2021"},{"key":"2023090614075071500_bib23","doi-asserted-by":"publisher","first-page":"9879","DOI":"10.1109\/CVPR42600.2020.00990","article-title":"End-to-end learning of visual representations from uncurated instructional videos","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Miech","year":"2020"},{"key":"2023090614075071500_bib24","doi-asserted-by":"publisher","first-page":"2791","DOI":"10.18653\/v1\/2022.naacl-main.201","article-title":"MetaICL: Learning to learn in context","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language 
Technologies","author":"Min","year":"2022"},{"key":"2023090614075071500_bib25","doi-asserted-by":"publisher","first-page":"5288","DOI":"10.18653\/v1\/2021.naacl-main.416","article-title":"Improving factual completeness and consistency of image-to-text radiology report generation","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Miura","year":"2021"},{"key":"2023090614075071500_bib26","doi-asserted-by":"publisher","first-page":"4885","DOI":"10.18653\/v1\/2020.acl-main.441","article-title":"Adversarial NLI: A new benchmark for natural language understanding","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Nie","year":"2020"},{"key":"2023090614075071500_bib27","article-title":"Representation learning with contrastive predictive coding","author":"van den Oord","year":"2018","journal-title":"arXiv preprint arXiv:1807.03748"},{"key":"2023090614075071500_bib28","doi-asserted-by":"publisher","first-page":"751","DOI":"10.1109\/TNN.2010.2091281","article-title":"Cross-domain sentiment classification via spectral feature alignment","volume-title":"Proceedings of the 19th International Conference on World Wide Web","author":"Pan","year":"2010"},{"issue":"2","key":"2023090614075071500_bib29","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1109\/TNN.2010.2091281","article-title":"Domain adaptation via transfer component analysis","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Transactions on Neural Networks"},{"key":"2023090614075071500_bib30","doi-asserted-by":"crossref","first-page":"110","DOI":"10.18653\/v1\/2022.deeplo-1.12","article-title":"Task transfer and domain adaptation for zero-shot question answering","volume-title":"Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language 
Processing","author":"Pan","year":"2022"},{"key":"2023090614075071500_bib31","doi-asserted-by":"publisher","first-page":"58","DOI":"10.18653\/v1\/W19-5006","article-title":"Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets","volume-title":"Proceedings of the 18th BioNLP Workshop and Shared Task","author":"Peng","year":"2019"},{"key":"2023090614075071500_bib32","article-title":"SciFive: A text-to-text transformer model for biomedical literature","author":"Phan","year":"2021","journal-title":"arXiv preprint arXiv: 2106.03598"},{"issue":"140","key":"2023090614075071500_bib33","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"key":"2023090614075071500_bib34","article-title":"Counterfactual data augmentation improves factuality of abstractive summarization","author":"Rajagopal","year":"2022","journal-title":"arXiv preprint arXiv: 2205.12416"},{"key":"2023090614075071500_bib35","doi-asserted-by":"crossref","first-page":"6838","DOI":"10.18653\/v1\/2020.coling-main.603","article-title":"Neural unsupervised domain adaptation in NLP\u2014A survey","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Ramponi","year":"2020"},{"key":"2023090614075071500_bib36","doi-asserted-by":"crossref","first-page":"1586","DOI":"10.18653\/v1\/D18-1187","article-title":"Lessons from natural language inference in the clinical domain","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Romanov","year":"2018"},{"key":"2023090614075071500_bib37","article-title":"Multitask prompted training enables zero-shot task generalization","volume-title":"International Conference on Learning 
Representations","author":"Sanh","year":"2022"},{"key":"2023090614075071500_bib38","first-page":"4596","article-title":"Adafactor: Adaptive learning rates with sublinear memory cost","volume-title":"International Conference on Machine Learning","author":"Shazeer","year":"2018"},{"key":"2023090614075071500_bib39","doi-asserted-by":"crossref","first-page":"1500","DOI":"10.18653\/v1\/2020.emnlp-main.117","article-title":"Combining automatic labelers and expert annotations for accurate radiology report labeling using BERT","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Smit","year":"2020"},{"key":"2023090614075071500_bib40","doi-asserted-by":"publisher","first-page":"26","DOI":"10.18653\/v1\/W18-6304","article-title":"An analysis of attention mechanisms: The case of word sense disambiguation in neural machine translation","volume-title":"Proceedings of the Third Conference on Machine Translation: Research Papers","author":"Tang","year":"2018"},{"key":"2023090614075071500_bib41","doi-asserted-by":"publisher","first-page":"2345","DOI":"10.18653\/v1\/2022.naacl-main.168","article-title":"GPL: Generative pseudo labeling for unsupervised domain adaptation of dense retrieval","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Wang","year":"2022"},{"key":"2023090614075071500_bib42","article-title":"Finetuned language models are zero-shot learners","volume-title":"International Conference on Learning Representations","author":"Wei","year":"2022"},{"key":"2023090614075071500_bib43","doi-asserted-by":"publisher","first-page":"1112","DOI":"10.18653\/v1\/N18-1101","article-title":"A broad-coverage challenge corpus for sentence understanding through inference","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: 
Human Language Technologies, Volume 1 (Long Papers)","author":"Williams","year":"2018"},{"issue":"1","key":"2023090614075071500_bib44","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1007\/s10579-018-9431-1","article-title":"MedSTS: A resource for clinical semantic textual similarity","volume":"54","author":"Yanshan","year":"2020","journal-title":"Language Resources and Evaluation"},{"key":"2023090614075071500_bib45","doi-asserted-by":"publisher","first-page":"4848","DOI":"10.18653\/v1\/2022.naacl-main.357","article-title":"Domain-oriented prefix-tuning: Towards efficient and generalizable finetuning for zero-shot dialogue summarization","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Zhao","year":"2022"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00585\/2157384\/tacl_a_00585.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00585\/2157384\/tacl_a_00585.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,6]],"date-time":"2023-09-06T14:08:22Z","timestamp":1694009302000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00585\/117443\/Compositional-Zero-Shot-Domain-Transfer-with-Text"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"references-count":45,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00585","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023]]},"published":{"date-parts":[[2023]]}}}