{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T13:41:52Z","timestamp":1765546912424,"version":"build-2065373602"},"reference-count":35,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,6,9]],"date-time":"2022-06-09T00:00:00Z","timestamp":1654732800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Science and Innovation Strategy Salzburg (WISS 2025) project \u201cIDA-Lab Salzburg\u201d","award":["20102-F1901166-KZP"],"award-info":[{"award-number":["20102-F1901166-KZP"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Representations from common pre-trained language models have been shown to suffer from the degeneration problem, i.e., they occupy a narrow cone in latent space. This problem can be addressed by enforcing isotropy in latent space. In analogy with variational autoencoders, we suggest applying a token-level variational loss to a Transformer architecture and optimizing the standard deviation of the prior distribution in the loss function as the model parameter to increase isotropy. The resulting latent space is complete and interpretable: any given point is a valid embedding and can be decoded into text again. This allows for text manipulations such as paraphrase generation directly in latent space. 
Surprisingly, features extracted at the sentence level also show competitive results on benchmark classification tasks.<\/jats:p>","DOI":"10.3390\/make4020025","type":"journal-article","created":{"date-parts":[[2022,6,10]],"date-time":"2022-06-10T10:25:12Z","timestamp":1654856712000},"page":"542-555","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Benefits from Variational Regularization in Language Models"],"prefix":"10.3390","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1721-0453","authenticated-orcid":false,"given":"Cornelia","family":"Ferner","sequence":"first","affiliation":[{"name":"Information Technology and Systems Management, Salzburg University of Applied Sciences, Urstein Sued 1, 5412 Puch\/Hallein, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3297-7997","authenticated-orcid":false,"given":"Stefan","family":"Wegenkittl","sequence":"additional","affiliation":[{"name":"Information Technology and Systems Management, Salzburg University of Applied Sciences, Urstein Sued 1, 5412 Puch\/Hallein, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,9]]},"reference":[{"key":"ref_1","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Ethayarajh, K. (2019, January 3\u20137). How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings. 
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1006"},{"key":"ref_3","unstructured":"Gao, J., He, D., Tan, X., Qin, T., Wang, L., and Liu, T. (2019, January 6\u20139). Representation Degeneration Problem in Training Natural Language Generation Models. Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA."},{"key":"ref_4","unstructured":"Wang, L., Huang, J., Huang, K., Hu, Z., Wang, G., and Gu, Q. (2020, January 26\u201330). Improving Neural Language Generation with Spectrum Control. Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Li, B., Zhou, H., He, J., Wang, M., Yang, Y., and Li, L. (2020, January 16\u201320). On the Sentence Embeddings from Pre-trained Language Models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online.","DOI":"10.18653\/v1\/2020.emnlp-main.733"},{"key":"ref_6","unstructured":"Kingma, D.P., and Welling, M. (2014, January 14\u201316). Auto-Encoding Variational Bayes. Proceedings of the International Conference on Learning Representations, Banff, AB, Canada."},{"key":"ref_7","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press."},{"key":"ref_8","unstructured":"Gururangan, S., Dang, T., Card, D., and Smith, N.A. (August, January 28). Variational Pretraining for Semi-supervised Text Classification. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_9","unstructured":"Mahabadi, R.K., Belinkov, Y., and Henderson, J. (2021, January 3\u20137). Variational Information Bottleneck for Effective Low-Resource Fine-Tuning. 
Proceedings of the International Conference on Learning Representations, Virtual."},{"key":"ref_10","unstructured":"Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (2018). Learning Semantic Similarity in a Continuous Space. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhao, T., Lee, K., and Eskenazi, M. (2018, January 15\u201320). Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1101"},{"key":"ref_12","unstructured":"Miao, Y., Yu, L., and Blunsom, P. (2016, January 20\u201322). Neural Variational Inference for Text Processing. Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916), New York, NY, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, T., and Wan, X. (2019, January 10\u201316). T-CVAE: Transformer-Based Conditioned Variational Autoencoder for Story Completion. Proceedings of the IJCAI, Macao, China.","DOI":"10.24963\/ijcai.2019\/727"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Shu, R., Lee, J., Nakayama, H., and Cho, K. (2020, January 7\u201312). Latent-variable Non-autoregressive Neural Machine Translation with Deterministic Inference using a Delta Posterior. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i05.6413"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bowman, S.R., Vilnis, L., Vinyals, O., Dai, A., Jozefowicz, R., and Bengio, S. (2016, January 11\u201312). Generating Sentences from a Continuous Space. 
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany.","DOI":"10.18653\/v1\/K16-1002"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Fang, L., Li, C., Gao, J., Dong, W., and Chen, C. (2019, January 3\u20137). Implicit Deep Latent Variable Models for Text Generation. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1407"},{"key":"ref_17","unstructured":"Yang, Z., Hu, Z., Salakhutdinov, R., and Berg-Kirkpatrick, T. (2017, January 6\u201311). Improved Variational Autoencoders for Text Modeling Using Dilated Convolutions. Proceedings of the 34th International Conference on Machine Learning (ICML\u201917), Sydney, NSW, Australia."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Liu, D., and Liu, G. (2019, January 14\u201319). A Transformer-Based Variational Autoencoder for Sentence Generation. Proceedings of the International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary.","DOI":"10.1109\/IJCNN.2019.8852155"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Li, C., Gao, X., Li, Y., Peng, B., Li, X., Zhang, Y., and Gao, J. (2020, January 16\u201320). Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Virtual.","DOI":"10.18653\/v1\/2020.emnlp-main.378"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Li, R., Li, X., Chen, G., and Lin, C. (2020, January 8\u201313). Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation. 
Proceedings of the 28th International Conference on Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.coling-main.216"},{"key":"ref_21","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.,  Kaiser, \u0141, and Polosukhin, I. (2017). Attention is All you Need. Advances in Neural Information Processing Systems 30, Curran Associates, Inc."},{"key":"ref_22","unstructured":"Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2017, January 24\u201326). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Proceedings of the International Conference on Learning Representations, Toulon, France."},{"key":"ref_23","unstructured":"Odaibo, S. (2019). Tutorial: Deriving the Standard Variational Autoencoder (VAE) Loss Function. arXiv, Available online: arxiv:1907.08956."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Barrault, L., Bojar, O., Costa-Juss\u00e0, M., Federmann, C., Fishel, M., Graham, Y., Haddow, B., Huck, M., Koehn, P., and Malmasi, S. (2019, January 1\u20132). Findings of the 2019 Conference on Machine Translation (WMT19). Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), Florence, Italy.","DOI":"10.18653\/v1\/W19-5301"},{"key":"ref_25","unstructured":"Agirre, E., Cer, D., Diab, M., and Gonzalez-Agirre, A. (2012, January 7\u20138). SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics\u2014Volume 2. Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), Montr\u00e9al, QC, Canada."},{"key":"ref_26","unstructured":"Lucas, J., Tucker, G., Grosse, R., and Norouzi, M. (2019, January 6\u20139). Understanding Posterior Collapse in Generative Latent Variable Models. 
Proceedings of the International Conference on Learning Representations, DeepGenStruct Workshop, New Orleans, LA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sennrich, R., Haddow, B., and Birch, A. (2016, January 7\u201312). Improving Neural Machine Translation Models with Monolingual Data. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany.","DOI":"10.18653\/v1\/P16-1009"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wieting, J., and Gimpel, K. (2018, January 15\u201320). ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1042"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Gupta, A., Agarwal, A., Singh, P., and Rai, P. (2018, January 2\u20137). A deep generative framework for paraphrase generation. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11956"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Donahue, C., Lee, M., and Liang, P. (2020, January 5\u201310). Enabling Language Models to Fill in the Blanks. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.225"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wu, X., Zhang, T., Zang, L., Han, J., and Hu, S. (2019, January 10\u201316). Mask and Infill: Applying Masked Language Model for Sentiment Transfer. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, International Joint Conferences on Artificial Intelligence Organization, Vienna, Austria, Macao, China.","DOI":"10.24963\/ijcai.2019\/732"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. 
(2019, January 3\u20137). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1410"},{"key":"ref_33","unstructured":"Turc, I., Chang, M.W., Lee, K., and Toutanova, K. (2019). Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. arXiv, Available online: arxiv:1908.08962v2."},{"key":"ref_34","unstructured":"Conneau, A., and Kiela, D. (2018, January 7\u201312). SentEval: An Evaluation Toolkit for Universal Sentence Representations. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018); European Language Resources Association (ELRA), Miyazaki, Japan."},{"key":"ref_35","unstructured":"Kingma, D.P., and Ba, J. (2015, January 7\u20139). Adam: A Method for Stochastic Optimization. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/4\/2\/25\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:26:58Z","timestamp":1760138818000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/4\/2\/25"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,9]]},"references-count":35,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["make4020025"],"URL":"https:\/\/doi.org\/10.3390\/make4020025","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2022,6,9]]}}}