{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,9,9]],"date-time":"2023-09-09T05:49:20Z","timestamp":1694238560518},"reference-count":29,"publisher":"MIT Press - Journals","issue":"4","license":[{"start":{"date-parts":[[2021,8,5]],"date-time":"2021-08-05T00:00:00Z","timestamp":1628121600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,12,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>We propose a new, more actionable view of neural network interpretability and data analysis by leveraging the remarkable matching effectiveness of representations derived from deep networks, guided by an approach for class-conditional feature detection. The decomposition of the filter-n-gram interactions of a convolutional neural network (CNN) and a linear layer over a pre-trained deep network yields a strong binary sequence labeler, with flexibility in producing predictions at\u2014and defining loss functions for\u2014varying label granularities, from the fully supervised sequence labeling setting to the challenging zero-shot sequence labeling setting, in which we seek token-level predictions but only have document-level labels for training. From this sequence-labeling layer we derive dense representations of the input that can then be matched to instances from training, or a support set with known labels. Such introspection with inference-time decision rules provides a means, in some settings, of making local updates to the model by altering the labels or instances in the support set without re-training the full model. 
Finally, we construct a particular K-nearest neighbors (K-NN) model from matched exemplar representations that approximates the original model\u2019s predictions and is at least as effective a predictor with respect to the ground-truth labels. This additionally yields interpretable heuristics at the token level for determining when predictions are less likely to be reliable, and for screening input dissimilar to the support set. In effect, we show that we can transform the deep network into a simple weighting over exemplars and associated labels, yielding an introspectable\u2014and modestly updatable\u2014version of the original model.<\/jats:p>","DOI":"10.1162\/coli_a_00416","type":"journal-article","created":{"date-parts":[[2021,8,5]],"date-time":"2021-08-05T14:36:19Z","timestamp":1628174179000},"page":"729-773","update-policy":"http:\/\/dx.doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":1,"title":["Detecting Local Insights from Global Labels: Supervised and Zero-Shot Sequence Labeling via a Convolutional Decomposition"],"prefix":"10.1162","volume":"47","author":[{"given":"Allen","family":"Schmaltz","sequence":"first","affiliation":[{"name":"Department of Epidemiology, Harvard University. 
aschmaltz@hsph.harvard.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2021,12,23]]},"reference":[{"key":"2022010319055320400_bib1","first-page":"2760","article-title":"An empirical study on the properties of random bases for kernel methods","volume-title":"Advances in Neural Information Processing Systems","author":"Alber","year":"2017"},{"key":"2022010319055320400_bib2","doi-asserted-by":"publisher","first-page":"103","DOI":"10.18653\/v1\/W19-4410","article-title":"Context is key: Grammatical error detection with contextual word representations","volume-title":"Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications","author":"Bell","year":"2019"},{"key":"2022010319055320400_bib3","doi-asserted-by":"publisher","first-page":"2635","DOI":"10.21437\/Interspeech.2014-564","article-title":"One billion word benchmark for measuring progress in statistical language modeling","volume-title":"INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, Singapore, September 14-18, 2014","author":"Chelba","year":"2014"},{"key":"2022010319055320400_bib4","first-page":"342","article-title":"Kernel methods for deep learning","volume-title":"Advances in Neural Information Processing Systems","author":"Cho","year":"2009"},{"key":"2022010319055320400_bib5","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1007\/978-1-4613-1641-1_8","article-title":"A comparison of rule and exemplar-based learning systems","volume-title":"Machine Learning, Meta-Reasoning and Logics","author":"Clark","year":"1990"},{"key":"2022010319055320400_bib6","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short 
Papers)","author":"Devlin","year":"2019"},{"key":"2022010319055320400_bib7","doi-asserted-by":"publisher","first-page":"1307","DOI":"10.18653\/v1\/2020.findings-emnlp.117","article-title":"Evaluating models\u2019 local decision boundaries via contrast sets","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Gardner","year":"2020"},{"key":"2022010319055320400_bib8","article-title":"In search of lost domain generalization","volume-title":"International Conference on Learning Representations","author":"Gulrajani","year":"2021"},{"issue":"438","key":"2022010319055320400_bib9","doi-asserted-by":"crossref","first-page":"748","DOI":"10.1080\/01621459.1997.10474027","article-title":"Prediction intervals for artificial neural networks","volume":"92","author":"Hwang","year":"1997","journal-title":"Journal of the American Statistical Association"},{"key":"2022010319055320400_bib10","first-page":"3543","article-title":"Attention is not explanation","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Jain","year":"2019"},{"key":"2022010319055320400_bib11","article-title":"Learning the difference that makes a difference with counterfactually-augmented data","volume-title":"International Conference on Learning Representations","author":"Kaushik","year":"2020"},{"key":"2022010319055320400_bib12","doi-asserted-by":"publisher","first-page":"1746","DOI":"10.3115\/v1\/D14-1181","article-title":"Convolutional neural networks for sentence classification","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Kim","year":"2014"},{"key":"2022010319055320400_bib13","first-page":"3111","article-title":"Distributed representations of words and phrases and their 
compositionality","volume-title":"Advances in Neural Information Processing Systems 26","author":"Mikolov","year":"2013"},{"key":"2022010319055320400_bib14","doi-asserted-by":"publisher","first-page":"1532","DOI":"10.3115\/v1\/D14-1162","article-title":"GloVe: Global vectors for word representation","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Pennington","year":"2014"},{"key":"2022010319055320400_bib15","doi-asserted-by":"publisher","first-page":"2121","DOI":"10.18653\/v1\/P17-1194","article-title":"Semi-supervised multitask learning for sequence labeling","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Rei","year":"2017"},{"key":"2022010319055320400_bib16","doi-asserted-by":"publisher","first-page":"293","DOI":"10.18653\/v1\/N18-1027","article-title":"Zero-shot sequence labeling: Transferring knowledge from sentences to tokens","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Rei","year":"2018"},{"issue":"01","key":"2022010319055320400_bib17","doi-asserted-by":"publisher","first-page":"6916","DOI":"10.1609\/aaai.v33i01.33016916","article-title":"Jointly learning to label sentences and tokens","volume":"33","author":"Rei","year":"2019","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"2022010319055320400_bib18","doi-asserted-by":"publisher","first-page":"1181","DOI":"10.18653\/v1\/P16-1112","article-title":"Compositional sequence labeling models for error detection in learner writing","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long 
Papers)","author":"Rei","year":"2016"},{"key":"2022010319055320400_bib19","doi-asserted-by":"publisher","first-page":"502","DOI":"10.18653\/v1\/S17-2088","article-title":"SemEval-2017 Task 4: Sentiment analysis in Twitter","volume-title":"Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)","author":"Rosenthal","year":"2017"},{"key":"2022010319055320400_bib20","doi-asserted-by":"publisher","first-page":"2807","DOI":"10.18653\/v1\/D17-1298","article-title":"Adapting sequence models for sentence correction","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"Schmaltz","year":"2017"},{"key":"2022010319055320400_bib21","first-page":"4077","article-title":"Prototypical networks for few-shot learning","volume-title":"Advances in Neural Information Processing Systems 30","author":"Snell","year":"2017"},{"key":"2022010319055320400_bib22","first-page":"18583","article-title":"Measuring robustness to natural distribution shifts in image classification","volume-title":"Advances in Neural Information Processing Systems","author":"Taori","year":"2020"},{"key":"2022010319055320400_bib23","first-page":"6000","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2022010319055320400_bib24","first-page":"3630","article-title":"Matching networks for one shot learning","volume-title":"Advances in Neural Information Processing Systems 29","author":"Vinyals","year":"2016"},{"issue":"9","key":"2022010319055320400_bib25","first-page":"207","article-title":"Distance metric learning for large margin nearest neighbor classification","volume":"10","author":"Weinberger","year":"2009","journal-title":"Journal of Machine Learning Research"},{"key":"2022010319055320400_bib26","doi-asserted-by":"publisher","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Transformers: 
State-of-the-art natural language processing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Wolf","year":"2020"},{"key":"2022010319055320400_bib27","article-title":"Google\u2019s neural machine translation system: Bridging the gap between human and machine translation","author":"Wu","year":"2016","journal-title":"CoRR"},{"key":"2022010319055320400_bib28","first-page":"180","article-title":"A new data set and method for automatically grading ESOL texts","volume-title":"Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies","author":"Yannakoudakis","year":"2011"},{"key":"2022010319055320400_bib29","article-title":"ADADELTA: An adaptive learning rate method","author":"Zeiler","year":"2012","journal-title":"CoRR"}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/47\/4\/729\/1979432\/coli_a_00416.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/47\/4\/729\/1979432\/coli_a_00416.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,1,3]],"date-time":"2022-01-03T19:06:16Z","timestamp":1641236776000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/47\/4\/729\/106772\/Detecting-Local-Insights-from-Global-Labels"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12]]},"references-count":29,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,12,23]]},"published-print":{"date-parts":[[2021,12,23]]}},"URL":"https:\/\/doi.org\/10.1162\/coli_a_00416","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}]
,"subject":[],"published-other":{"date-parts":[[2021,12]]},"published":{"date-parts":[[2021,12]]}}}