{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T22:28:04Z","timestamp":1781216884581,"version":"3.54.1"},"reference-count":96,"publisher":"Cambridge University Press (CUP)","issue":"4","license":[{"start":{"date-parts":[[2023,5,25]],"date-time":"2023-05-25T00:00:00Z","timestamp":1684972800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived from the Universal Dependencies treebanks. We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain strong performance across these tasks. We then apply two methods to locate, for each probing task, where the disambiguating information resides in the input. The first is a new perturbation method that \u201cmasks\u201d various parts of context; the second is the classical method of Shapley values. 
The most intriguing finding that emerges is a strong tendency for the preceding context to hold more information relevant to the prediction than the following context.<\/jats:p>","DOI":"10.1017\/s1351324923000190","type":"journal-article","created":{"date-parts":[[2023,5,25]],"date-time":"2023-05-25T08:18:55Z","timestamp":1685002735000},"page":"753-792","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":9,"title":["Morphosyntactic probing of multilingual BERT models"],"prefix":"10.1017","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4918-4333","authenticated-orcid":false,"given":"Judit","family":"Acs","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Endre","family":"Hamerlik","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Roy","family":"Schwartz","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Noah A.","family":"Smith","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andras","family":"Kornai","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"56","published-online":{"date-parts":[[2023,5,25]]},"reference":[{"key":"S1351324923000190_ref82","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00376"},{"key":"S1351324923000190_ref5","doi-asserted-by":"publisher","DOI":"10.1093\/acprof:oso\/9780199279906.001.0001"},{"key":"S1351324923000190_ref91","unstructured":"Vaswani, A. , Shazeer, N. , Parmar, N. , Uszkoreit, J. , Jones, L. , Gomez, A. N. , Kaiser, L. and Polosukhin, I. (2017). Attention is all you need In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, pp. 
5998\u20136008, December 4\u20139, 2017."},{"key":"S1351324923000190_ref67","unstructured":"Lundberg, S. M. and Lee, S. (2017). A unified approach to interpreting model predictions In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, pp. 4765\u20134774, December 4\u20139, 2017"},{"key":"S1351324923000190_ref12","unstructured":"Belinkov, Y. , M\u00e0rquez, L. , Sajjad, H. , Durrani, N. , Dalvi, F. and Glass, J. (2017b). Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Taipei, Taiwan: Asian Federation of Natural Language Processing, pp. 1\u201310."},{"key":"S1351324923000190_ref24","doi-asserted-by":"crossref","unstructured":"Conneau, A. , Kiela, D. , Schwenk, H. , Barrault, L. and Bordes, A. (2017). Supervised learning of universal sentence representations from natural language inference data, arXiv preprint arXiv: 1705.02364.","DOI":"10.18653\/v1\/D17-1070"},{"key":"S1351324923000190_ref18","unstructured":"Brown, L. (2001). A grammar of Nias Selatan."},{"key":"S1351324923000190_ref27","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1269"},{"key":"S1351324923000190_ref9","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00422"},{"key":"S1351324923000190_ref16","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"S1351324923000190_ref43","doi-asserted-by":"crossref","unstructured":"Hewitt, J. and Liang, P. (2019). Designing and interpreting probes with control tasks In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China: Association for Computational Linguistics, pp. 
2733\u20132743.","DOI":"10.18653\/v1\/D19-1275"},{"key":"S1351324923000190_ref55","doi-asserted-by":"publisher","DOI":"10.1137\/07070111X"},{"key":"S1351324923000190_ref86","doi-asserted-by":"crossref","unstructured":"Shi, X. , Padhi, I. and Knight, K. (2016). Does string-based neural MT learn source syntax? In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX: Association for Computational Linguistics, pp. 1526\u20131534.","DOI":"10.18653\/v1\/D16-1159"},{"key":"S1351324923000190_ref29","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K17-2001"},{"key":"S1351324923000190_ref79","doi-asserted-by":"crossref","unstructured":"Ravichander, A. , Belinkov, Y. and Hovy, E. (2021). Probing the probing paradigm: Does probing accuracy entail task relevance? In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, pp. 3363\u20133377.","DOI":"10.18653\/v1\/2021.eacl-main.295"},{"key":"S1351324923000190_ref19","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324921000371"},{"key":"S1351324923000190_ref47","volume-title":"NY Academy of Sciences NLP, Dialog, and Speech Workshop","author":"Htut","year":"2019"},{"key":"S1351324923000190_ref50","doi-asserted-by":"crossref","unstructured":"Kann, K. and Sch\u00fctze, H. (2016). MED: The LMU system for the SIGMORPHON 2016 shared task on morphological reinflection In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, Berlin, Germany: Association for Computational Linguistics, pp. 62\u201370.","DOI":"10.18653\/v1\/W16-2010"},{"key":"S1351324923000190_ref54","doi-asserted-by":"crossref","unstructured":"Kitaev, N. , Cao, S. and Klein, D. (2019). 
Multilingual constituency parsing with self-attention and pre-training In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy: Association for Computational Linguistics, pp. 3499\u20133505.","DOI":"10.18653\/v1\/P19-1340"},{"key":"S1351324923000190_ref26","unstructured":"Conneau, A. and Lample, G. (2019). Cross-lingual language model pretraining In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, pp. 7057\u20137067, December 8\u201314, 2019"},{"key":"S1351324923000190_ref38","unstructured":"Goldberg, Y. (2019). Assessing BERT\u2019s syntactic abilities, arXiv: 1901.05287 [cs]."},{"key":"S1351324923000190_ref14","volume-title":"Advances in Neural Information Processing Systems","author":"Bengio","year":"2000"},{"key":"S1351324923000190_ref33","unstructured":"Edmiston, D. (2020). A systematic analysis of morphological content in BERT models for multiple languages, arXiv: 2004.03032 [cs]."},{"key":"S1351324923000190_ref44","unstructured":"Hewitt, J. and Manning, C. D. (2019). A structural probe for finding syntax in word representations, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN: Association for Computational Linguistics, pp. 4129\u20134138."},{"key":"S1351324923000190_ref71","unstructured":"Nemeskey, D. M. (2020). Natural Language Processing Methods for Language Modeling, PhD Thesis. E\u00f6tv\u00f6s Lor\u00e1nd University"},{"key":"S1351324923000190_ref78","unstructured":"Radford, A. , Narasimhan, K. , Salimans, T. and Sutskever, I. (2018). Improving language understanding by generative pre-training, Preprint. Work in progress."},{"key":"S1351324923000190_ref70","unstructured":"Kote, N. , Biba, M. , Kanerva, J. , R\u00f6nnqvist, S. and Ginter, F. (2019). 
Morphological tagging and lemmatization of Albanian: A manually annotated corpus and neural models."},{"key":"S1351324923000190_ref81","volume-title":"Advances in Neural Information Processing Systems","volume":"32","author":"Reif","year":"2019"},{"key":"S1351324923000190_ref94","doi-asserted-by":"crossref","unstructured":"Warstadt, A. , Cao, Y. , Grosu, I. , Peng, W. , Blix, H. , Nie, Y. , Alsop, A. , Bordia, S. , Liu, H. , Parrish, A. , Wang, S.-F. , Phang, J. , Mohananey, A. , Htut, P. M. , Jeretic, P. and Bowman, S. R. (2019). Investigating BERT\u2019s knowledge of language: Five analysis methods with NPIs In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China: Association for Computational Linguistics, pp. 2877\u20132887.","DOI":"10.18653\/v1\/D19-1286"},{"key":"S1351324923000190_ref95","unstructured":"Wu, Y. , Schuster, M. , Chen, Z. , Le, Q. V. , Norouzi, M. , Macherey, W. , Krikun, M. , Cao, Y. , Gao, Q. , Macherey, K. , Klingner, J. , Shah, A. , Johnson, M. , Liu, X. , Kaiser, L. , Gouws, S. , Kato, Y. , Kudo, T. , Kazawa, H. , Stevens, K. , Kurian, G. , Patil, N. , Wang, W. , Young, C. , Smith, J. , Riesa, J. , Rudnick, A. , Vinyals, O. , Corrado, G. , Hughes, M. and Dean, J. (2016). Google\u2019s neural machine translation system: Bridging the gap between human and machine translation."},{"key":"S1351324923000190_ref84","unstructured":"Shapley, L. S. (1951). Notes on the n-person game -- ii: The value of an n-person game, Technical report, RAND Corporation."},{"key":"S1351324923000190_ref56","doi-asserted-by":"crossref","unstructured":"Kondratyuk, D. and Straka, M. (2019). 
75 languages, 1 model: Parsing universal dependencies universally In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China: Association for Computational Linguistics, pp. 2779\u20132795.","DOI":"10.18653\/v1\/D19-1279"},{"key":"S1351324923000190_ref13","doi-asserted-by":"crossref","unstructured":"Ben Zaken, E. , Goldberg, Y. and Ravfogel, S. (2022). BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Dublin, Ireland: Association for Computational Linguistics, pp. 1\u20139.","DOI":"10.18653\/v1\/2022.acl-short.1"},{"key":"S1351324923000190_ref32","unstructured":"Kingma, D. P. and Ba, J. (2015). ADAM: A method for stochastic optimization In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015."},{"key":"S1351324923000190_ref40","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1978.10837"},{"key":"S1351324923000190_ref77","doi-asserted-by":"publisher","DOI":"10.1007\/s11431-020-1647-3"},{"key":"S1351324923000190_ref85","doi-asserted-by":"crossref","unstructured":"Sharan, V. , Kakade, S. , Liang, P. and Valiant, G. (2018). Prediction with a short memory In Proceedings of STOC 2018, New York, NY: Association for Computing Machinery, pp. 1074\u20131087.","DOI":"10.1145\/3188745.3188954"},{"key":"S1351324923000190_ref21","doi-asserted-by":"publisher","DOI":"10.1007\/BF02259530"},{"key":"S1351324923000190_ref89","first-page":"4593","author":"Tenney","year":"2019","journal-title":"BERT rediscovers the classical NLP pipeline"},{"key":"S1351324923000190_ref58","unstructured":"Kurimo, M. , Virpioja, S. , Turunen, V. and Lagus, K. (2010). 
Morpho challenge competition 2005\u20132010: Evaluations and results In Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, Association for Computational Linguistics, pp. 87\u201395."},{"key":"S1351324923000190_ref4","unstructured":"Adi, Y. , Kermany, E. , Belinkov, Y. , Lavi, O. and Goldberg, Y. (2017). Fine-grained analysis of sentence embeddings using auxiliary prediction tasks, 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, Conference Track Proceedings. OpenReview.net, April 24\u201326, 2017"},{"key":"S1351324923000190_ref96","doi-asserted-by":"crossref","unstructured":"Zhang, K. W. and Bowman, S. R. (2018). Language modeling teaches you more syntax than translation does: Lessons learned through auxiliary task analysis, arXiv preprint arXiv: 1809.10040.","DOI":"10.18653\/v1\/W18-5448"},{"key":"S1351324923000190_ref34","doi-asserted-by":"crossref","unstructured":"Ethayarajh, K. and Jurafsky, D. (2021). Attention flows are Shapley value explanations In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Association for Computational Linguistics, pp. 49\u201354.","DOI":"10.18653\/v1\/2021.acl-short.8"},{"key":"S1351324923000190_ref39","first-page":"12","author":"Gupta","year":"2015","journal-title":"Distributional vectors encode referential attributes"},{"key":"S1351324923000190_ref61","first-page":"8236","volume":"34","author":"Li","year":"2020","journal-title":"Why attention? Analyze BiLSTM deficiency and its remedies in the case of NER"},{"key":"S1351324923000190_ref45","unstructured":"Houlsby, N. , Giurgiu, A. , Jastrzebski, S. , Morrone, B. , De Laroussilhe, Q. , Gesmundo, A. , Attariyan, M. and Gelly, S. (2019). 
Parameter-efficient transfer learning for NLP In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, PMLR, pp. 2790\u20132799."},{"key":"S1351324923000190_ref80","unstructured":"Ravishankar, V. , G\u00f6k\u0131rmak, M. , \u00d8vrelid, L. and Velldal, E. (2019). Multilingual probing of deep pre-trained contextual encoders In Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing, Turku, Finland: Link\u00f6ping University Electronic Press, pp. 37\u201347."},{"key":"S1351324923000190_ref1","unstructured":"\u00c1cs, J. (2019). Exploring BERT\u2019s vocabulary. Available at http:\/\/juditacs.github.io\/2019\/02\/19\/bert-tokenization-stats.html (accessed 9 March 2023)."},{"key":"S1351324923000190_ref72","doi-asserted-by":"publisher","DOI":"10.1353\/lan.1986.0014"},{"key":"S1351324923000190_ref87","doi-asserted-by":"crossref","unstructured":"Sinha, K. , Parthasarathi, P. , Pineau, J. and Williams, A. (2021). Unnatural language inference In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, pp. 7329\u20137346.","DOI":"10.18653\/v1\/2021.acl-long.569"},{"key":"S1351324923000190_ref88","unstructured":"Srivastava, R. K. , Greff, K. and Schmidhuber, J. (2015). Highway networks, arXiv preprint arXiv: 1505.00387."},{"key":"S1351324923000190_ref6","first-page":"33","article-title":"Life on the edge: There\u2019s morphology there after all","volume":"5","author":"Anderson","year":"2006","journal-title":"Lingue e linguaggio"},{"key":"S1351324923000190_ref30","unstructured":"Deal, A. R. (2015). Interaction and satisfaction in \u03c6-agreement In Proceedings of the 45th New England Linguistic Society Annual Meeting, pp. 179\u2013192, Cambridge, MA. 
Massachusetts Institute of Technology."},{"key":"S1351324923000190_ref35","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00298"},{"key":"S1351324923000190_ref25","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1198"},{"key":"S1351324923000190_ref17","doi-asserted-by":"publisher","DOI":"10.21236\/ADA458695"},{"key":"S1351324923000190_ref36","doi-asserted-by":"crossref","unstructured":"Ettinger, A. , Elgohary, A. and Resnik, P. (2016). Probing for semantic evidence of composition by means of simple classification tasks In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, Berlin, Germany: Association for Computational Linguistics, pp. 134\u2013139.","DOI":"10.18653\/v1\/W16-2524"},{"key":"S1351324923000190_ref48","doi-asserted-by":"publisher","DOI":"10.2478\/v10108-011-0007-0"},{"key":"S1351324923000190_ref63","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1112"},{"key":"S1351324923000190_ref66","doi-asserted-by":"publisher","DOI":"10.1002\/asi.5090110403"},{"key":"S1351324923000190_ref20","doi-asserted-by":"crossref","unstructured":"Chi, E. A. , Hewitt, J. and Manning, C. D. (2020). Finding universal grammatical relations in multilingual BERT In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp. 5564\u20135577.","DOI":"10.18653\/v1\/2020.acl-main.493"},{"key":"S1351324923000190_ref15","doi-asserted-by":"crossref","unstructured":"Bisazza, A. and Tump, C. (2018). The lazy encoder: A fine-grained analysis of the role of morphology in neural machine translation In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium: Association for Computational Linguistics, pp. 2871\u20132876.","DOI":"10.18653\/v1\/D18-1313"},{"key":"S1351324923000190_ref73","unstructured":"Nivre, J. , de Marneffe, M.-C. , Ginter, F. , Haji\u010d, J. , Manning, C. D. , Pyysalo, S. , Schuster, S. , Tyers, F. 
and Zeman, D. (2020). Universal Dependencies v2: An evergrowing multilingual treebank collection In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France: European Language Resources Association, pp. 4034\u20134043."},{"key":"S1351324923000190_ref74","first-page":"1532","author":"Pennington","year":"2014","journal-title":"GloVe: Global vectors for word representation"},{"key":"S1351324923000190_ref92","volume-title":"CoCo@ NIPS.","author":"Veldhoen","year":"2016"},{"key":"S1351324923000190_ref37","first-page":"23","article-title":"A new algorithm for data compression","volume":"12","author":"Gage","year":"1994","journal-title":"The C Users Journal"},{"key":"S1351324923000190_ref11","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00254"},{"key":"S1351324923000190_ref93","doi-asserted-by":"crossref","unstructured":"Voita, E. and Titov, I. (2020). Information-theoretic probing with minimum description length In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, pp. 183\u2013196.","DOI":"10.18653\/v1\/2020.emnlp-main.14"},{"key":"S1351324923000190_ref65","author":"Loshchilov","year":"2019","journal-title":"Decoupled weight decay regularization"},{"key":"S1351324923000190_ref3","first-page":"1","article-title":"The Austronesian languages of Asia and Madagascar: A historical perspective","volume":"1","author":"Adelaar","year":"2005","journal-title":"The Austronesian Languages of Asia and Madagascar"},{"key":"S1351324923000190_ref64","unstructured":"Liu, Y. , Ott, M. , Goyal, N. , Du, J. , Joshi, M. , Chen, D. , Levy, O. , Lewis, M. , Zettlemoyer, L. and Stoyanov, V. (2019b). RoBERTa: A robustly optimized BERT pretraining approach."},{"key":"S1351324923000190_ref69","unstructured":"Mikolov, T. , Chen, K. , Corrado, G. and Dean, J. (2013). 
Efficient estimation of word representations in vector space, arXiv preprint arXiv: 1301.3781."},{"key":"S1351324923000190_ref76","doi-asserted-by":"crossref","unstructured":"Qian, P. , Qiu, X. and Huang, X. (2016). Investigating language universal and specific properties in word embeddings In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany: Association for Computational Linguistics, pp. 1478\u20131488.","DOI":"10.18653\/v1\/P16-1140"},{"key":"S1351324923000190_ref28","first-page":"191","volume-title":"The Handbook of Morphology","author":"Corbett","year":"1998"},{"key":"S1351324923000190_ref57","doi-asserted-by":"crossref","unstructured":"Kudo, T. and Richardson, J. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium: Association for Computational Linguistics, pp. 66\u201371.","DOI":"10.18653\/v1\/D18-2012"},{"key":"S1351324923000190_ref83","doi-asserted-by":"crossref","unstructured":"Shapiro, N. T. , Paullada, A. and Steinert-Threlkeld, S. (2021). A multilabel approach to morphosyntactic probing, arXiv preprint arXiv: 2104.08464.","DOI":"10.18653\/v1\/2021.findings-emnlp.382"},{"key":"S1351324923000190_ref22","doi-asserted-by":"crossref","unstructured":"Conneau, A. , Khandelwal, K. , Goyal, N. , Chaudhary, V. , Wenzek, G. , Guzm\u00e1n, F. , Grave, E. , Ott, M. , Zettlemoyer, L. and Stoyanov, V. (2020). Unsupervised cross-lingual representation learning at scale In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp. 8440\u20138451.","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"S1351324923000190_ref7","unstructured":"Arnard\u00f3ttir, \u00de. , Hafsteinsson, H. , Sigur\u00f0sson, E. F. 
, Bjarnad\u00f3ttir, K. , Ingason, A. K. , J\u00f3nsd\u00f3ttir, H. and Steingr\u00edmsson, S. (2020). A Universal Dependencies conversion pipeline for a Penn-format constituency treebank In Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020), Barcelona, Spain: Association for Computational Linguistics, pp. 16\u201325."},{"key":"S1351324923000190_ref52","doi-asserted-by":"crossref","unstructured":"K\u00f6hn, A. (2015). What\u2019s in an embedding? Analyzing word embeddings through multilingual evaluation In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal: Association for Computational Linguistics, pp. 2067\u20132073.","DOI":"10.18653\/v1\/D15-1246"},{"key":"S1351324923000190_ref62","doi-asserted-by":"crossref","unstructured":"Li, X. L. and Liang, P. (2021). Prefix-tuning: Optimizing continuous prompts for generation In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, pp. 4582\u20134597.","DOI":"10.18653\/v1\/2021.acl-long.353"},{"key":"S1351324923000190_ref53","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-00155-0_2"},{"key":"S1351324923000190_ref2","doi-asserted-by":"crossref","unstructured":"\u00c1cs, J. , K\u00e1d\u00e1r, \u00c1. and Kornai, A. (2021). Subword pooling makes a difference In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, pp. 
2284\u20132295.","DOI":"10.18653\/v1\/2021.eacl-main.194"},{"key":"S1351324923000190_ref41","volume-title":"Meaningful Differences in the Everyday Experience of Young American Children","author":"Hart","year":"1995"},{"key":"S1351324923000190_ref49","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00300"},{"key":"S1351324923000190_ref59","unstructured":"Lapointe, S. (1992). Life on the edge: Argument in favor of an autolexical account of edge inflections In Proceedings of the Chicago Linguistic Society, 28, Chicago, IL: Chicago Linguistic Society. pp. 318\u2013332."},{"key":"S1351324923000190_ref42","doi-asserted-by":"crossref","unstructured":"Hewitt, J. , Ethayarajh, K. , Liang, P. and Manning, C. (2021). Conditional probing: Measuring usable information beyond a baseline In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic: Association for Computational Linguistics, pp. 1626\u20131639.","DOI":"10.18653\/v1\/2021.emnlp-main.122"},{"key":"S1351324923000190_ref10","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1080"},{"key":"S1351324923000190_ref23","unstructured":"Conneau, A. and Kiela, D. (2018). SentEval: An evaluation toolkit for universal sentence representations In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan: European Language Resources Association (ELRA)."},{"key":"S1351324923000190_ref90","unstructured":"Tenney, I. , Xia, P. , Chen, B. , Wang, A. , Poliak, A. , McCoy, R. T. , Kim, N. , Van Durme, B. , Bowman, S. R. , Das, D. and Pavlick, E. (2019b). What do you learn from context? Probing for sentence structure in contextualized word representations In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6\u20139, 2019. 
OpenReview.net."},{"key":"S1351324923000190_ref46","doi-asserted-by":"publisher","DOI":"10.1075\/sihols.23"},{"key":"S1351324923000190_ref60","unstructured":"Lapointe, S. G. (1990). Edge features in GPSG In Papers from the 26th Regional Meeting of the Chicago Linguistic Society, Chicago Linguistic Society, pp. 221\u2013235."},{"key":"S1351324923000190_ref75","doi-asserted-by":"crossref","unstructured":"Peters, M. E. , Neumann, M. , Iyyer, M. , Gardner, M. , Clark, C. , Lee, K. and Zettlemoyer, L. (2018). Deep contextualized word representations In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, New Orleans, LA: Association for Computational Linguistics, pp. 2227\u20132237.","DOI":"10.18653\/v1\/N18-1202"},{"key":"S1351324923000190_ref31","unstructured":"Devlin, J. , Chang, M.-W. , Lee, K. and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN: Association for Computational Linguistics, pp. 4171\u20134186."},{"key":"S1351324923000190_ref51","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2598339"},{"key":"S1351324923000190_ref8","unstructured":"Arps, D. , Samih, Y. , Kallmeyer, L. and Sajjad, H. (2022). Probing for constituency structure in neural language models, arXiv: 2204.06201 [cs]."},{"key":"S1351324923000190_ref68","doi-asserted-by":"crossref","unstructured":"Mikhailov, V. , Serikov, O. and Artemova, E. (2021). Morph call: Probing morphosyntactic content of multilingual transformers In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, Association for Computational Linguistics, pp. 
97\u2013121.","DOI":"10.18653\/v1\/2021.sigtyp-1.10"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324923000190","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T08:32:37Z","timestamp":1726821157000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324923000190\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,25]]},"references-count":96,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["S1351324923000190"],"URL":"https:\/\/doi.org\/10.1017\/s1351324923000190","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,25]]},"assertion":[{"value":"\u00a9 The Author(s), 2023. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}