{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T07:14:33Z","timestamp":1774077273108,"version":"3.50.1"},"reference-count":91,"publisher":"Association for Natural Language Processing","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Journal of Natural Language Processing"],"published-print":{"date-parts":[[2026]]},"DOI":"10.5715\/jnlp.33.76","type":"journal-article","created":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T22:13:09Z","timestamp":1773526389000},"page":"76-108","source":"Crossref","is-referenced-by-count":0,"title":["Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings","\u30d7\u30ed\u30f3\u30d7\u30c8\u306b\u57fa\u3065\u304f\u30c6\u30ad\u30b9\u30c8\u57cb\u3081\u8fbc\u307f\u306e\u30bf\u30b9\u30af\u306b\u3088\u308b\u5197\u9577\u6027\uff0c\u7b49\u65b9\u6027\uff0c\u56fa\u6709\u6b21\u5143\u306e\u9055\u3044"],"prefix":"10.5715","volume":"33","author":[{"given":"Hayato","family":"Tsukagoshi","sequence":"first","affiliation":[{"name":"Nagoya University"}]},{"given":"Ryohei","family":"Sasano","sequence":"additional","affiliation":[{"name":"Nagoya University"}]}],"member":"3685","reference":[{"key":"1","doi-asserted-by":"crossref","unstructured":"Abdi, H. and Williams, L. J. (2010). \u201cPrincipal component analysis.\u201d <i>WIREs Computational Statistics<\/i>, 2 (4), pp. 433\u2013459.","DOI":"10.1002\/wics.101"},{"key":"2","doi-asserted-by":"crossref","unstructured":"Agirre, E., Banea, C., Cardie, C., Cer, D., Diab, M., Gonzalez-Agirre, A., Guo, W., Lopez-Gazpio, I., Maritxalar, M., Mihalcea, R., Rigau, G., Uria, L., and Wiebe, J. (2015). \u201cSemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability.\u201d In <i>Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval)<\/i>, pp. 
252\u2013263.","DOI":"10.18653\/v1\/S15-2045"},{"key":"3","doi-asserted-by":"crossref","unstructured":"Agirre, E., Banea, C., Cardie, C., Cer, D., Diab, M., Gonzalez-Agirre, A., Guo, W., Mihalcea, R., Rigau, G., and Wiebe, J. (2014). \u201cSemEval-2014 Task 10: Multilingual Semantic Textual Similarity.\u201d In <i>Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval)<\/i>, pp. 81\u201391.","DOI":"10.3115\/v1\/S14-2010"},{"key":"4","doi-asserted-by":"crossref","unstructured":"Agirre, E., Banea, C., Cer, D., Diab, M., Gonzalez-Agirre, A., Mihalcea, R., Rigau, G., and Wiebe, J. (2016). \u201cSemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation.\u201d In <i>Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval)<\/i>, pp. 497\u2013511.","DOI":"10.18653\/v1\/S16-1081"},{"key":"5","unstructured":"Agirre, E., Cer, D., Diab, M., and Gonzalez-Agirre, A. (2012). \u201cSemEval-2012 Task 6: A Pilot on Semantic Textual Similarity.\u201d In <i>*SEM 2012: The 1st Joint Conference on Lexical and Computational Semantics \u2013 Semantic Evaluation (SemEval)<\/i>, pp. 385\u2013393."},{"key":"6","unstructured":"Agirre, E., Cer, D., Diab, M., Gonzalez-Agirre, A., and Guo, W. (2013). \u201c*SEM 2013 shared task: Semantic Textual Similarity.\u201d In <i>2nd Joint Conference on Lexical and Computational Semantics (*SEM)<\/i>, pp. 32\u201343."},{"key":"7","doi-asserted-by":"crossref","unstructured":"Ait-Saada, M. and Nadif, M. (2023). \u201cIs Anisotropy Truly Harmful? A Case Study on Text Clustering.\u201d In <i>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)<\/i>, pp. 1194\u20131203.","DOI":"10.18653\/v1\/2023.acl-short.103"},{"key":"8","unstructured":"Arora, S., Liang, Y., and Ma, T. (2017). 
\u201cA Simple but Tough-to-Beat Baseline for Sentence Embeddings.\u201d In <i>International Conference on Learning Representations (ICLR)<\/i>."},{"key":"9","doi-asserted-by":"crossref","unstructured":"Asai, A., Schick, T., Lewis, P., Chen, X., Izacard, G., Riedel, S., Hajishirzi, H., and Yih, W.-t (2023). \u201cTask-aware Retrieval with Instructions.\u201d In <i>Findings of the Association for Computational Linguistics: ACL 2023<\/i>, pp. 3650\u20133675.","DOI":"10.18653\/v1\/2023.findings-acl.225"},{"key":"10","unstructured":"BehnamGhader, P., Adlakha, V., Mosbach, M., Bahdanau, D., Chapados, N., and Reddy, S. (2024). \u201cLLM2Vec: Large Language Models Are Secretly Powerful Text Encoders.\u201d In <i>1st Conference on Language Modeling (COLM)<\/i>."},{"key":"11","doi-asserted-by":"crossref","unstructured":"Bowman, S. R., Angeli, G., Potts, C., and Manning, C. D. (2015). \u201cA Large Annotated Corpus for Learning Natural Language Inference.\u201d In <i>Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)<\/i>, pp. 632\u2013642.","DOI":"10.18653\/v1\/D15-1075"},{"key":"12","doi-asserted-by":"crossref","unstructured":"Bruske, J. and Sommer, G. (1998). \u201cIntrinsic Dimensionality Estimation with Optimally Topology Preserving Maps.\u201d <i>IEEE Transactions on Pattern Analysis and Machine Intelligence<\/i>, 20 (5), pp. 572\u2013575.","DOI":"10.1109\/34.682189"},{"key":"13","doi-asserted-by":"crossref","unstructured":"Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., and Specia, L. (2017). \u201cSemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation.\u201d In <i>Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval)<\/i>, pp. 
1\u201314.","DOI":"10.18653\/v1\/S17-2001"},{"key":"14","doi-asserted-by":"crossref","unstructured":"Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzm\u00e1n, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. (2020). \u201cUnsupervised Cross-lingual Representation Learning at Scale.\u201d In <i>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)<\/i>, pp. 8440\u20138451.","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"15","doi-asserted-by":"crossref","unstructured":"Conneau, A., Kiela, D., Schwenk, H., Barrault, L., and Bordes, A. (2017). \u201cSupervised Learning of Universal Sentence Representations from Natural Language Inference Data.\u201d In <i>Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)<\/i>, pp. 670\u2013680.","DOI":"10.18653\/v1\/D17-1070"},{"key":"16","doi-asserted-by":"crossref","unstructured":"de Souza P. Moreira, G., Osmulski, R., Xu, M., Ak, R., Schifferer, B., and Oldridge, E. (2024). \u201cNV-Retriever: Improving Text Embedding Models with Effective Hard-Negative Mining.\u201d <i>arXiv preprint arXiv:2407.15831<\/i>.","DOI":"10.1145\/3746252.3761254"},{"key":"17","doi-asserted-by":"crossref","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). \u201cBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.\u201d In <i>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)<\/i>, pp. 4171\u20134186.","DOI":"10.18653\/v1\/N19-1423"},{"key":"18","unstructured":"Dinu, G., Barrett, C., Xiang, Y., Calvo, M. R., Currey, A., and Niu, X. (2025). \u201cEffective Post-Training Embedding Compression Via Temperature Control in Contrastive Training.\u201d In <i>International Conference on Learning Representations (ICLR)<\/i>."},{"key":"19","doi-asserted-by":"crossref","unstructured":"Ethayarajh, K. 
(2018). \u201cUnsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline.\u201d In <i>Proceedings of the 3rd Workshop on Representation Learning for NLP (RepL4NLP)<\/i>, pp. 91\u2013100.","DOI":"10.18653\/v1\/W18-3012"},{"key":"20","doi-asserted-by":"crossref","unstructured":"Ethayarajh, K. (2019). \u201cHow Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings.\u201d In Inui, K., Jiang, J., Ng, V., and Wan, X. (Eds.), <i>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)<\/i>, pp. 55\u201365.","DOI":"10.18653\/v1\/D19-1006"},{"key":"21","doi-asserted-by":"crossref","unstructured":"Facco, E., d\u2019Errico, M., Rodriguez, A., and Laio, A. (2017). \u201cEstimating the Intrinsic Dimension of Datasets by a Minimal Neighborhood Information.\u201d <i>Scientific Reports<\/i>, 7.","DOI":"10.1038\/s41598-017-11873-y"},{"key":"22","doi-asserted-by":"crossref","unstructured":"Fukunaga, K. and Olsen, D. R. (1971). \u201cAn Algorithm for Finding Intrinsic Dimensionality of Data.\u201d <i>IEEE Transactions on Computers<\/i>, C-20 (2), pp. 176\u2013183.","DOI":"10.1109\/T-C.1971.223208"},{"key":"23","doi-asserted-by":"crossref","unstructured":"Gao, T., Yao, X., and Chen, D. (2021). \u201cSimCSE: Simple Contrastive Learning of Sentence Embeddings.\u201d In <i>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)<\/i>, pp. 6894\u20136910.","DOI":"10.18653\/v1\/2021.emnlp-main.552"},{"key":"24","unstructured":"Geigle, G., Reimers, N., R\u016bckl\u00e9, A., and Gurevych, I. (2021). \u201cTWEAC: Transformer with Extendable QA Agent Classifiers.\u201d."},{"key":"25","doi-asserted-by":"crossref","unstructured":"Hasibi, F., Nikolaev, F., Xiong, C., Balog, K., Bratsberg, S. E., Kotov, A., and Callan, J. (2017). 
\u201cDBpedia-Entity V2: A Test Collection for Entity Search.\u201d In <i>Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval<\/i>, SIGIR \u201917, pp. 1265\u20131268. ACM.","DOI":"10.1145\/3077136.3080751"},{"key":"26","doi-asserted-by":"crossref","unstructured":"Hill, F., Cho, K., and Korhonen, A. (2016). \u201cLearning Distributed Representations of Sentences from Unlabelled Data.\u201d In <i>Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)<\/i>, pp. 1367\u20131377.","DOI":"10.18653\/v1\/N16-1162"},{"key":"27","unstructured":"Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2022). \u201cLoRA: Low-Rank Adaptation of Large Language Models.\u201d In <i>International Conference on Learning Representations (ICLR)<\/i>."},{"key":"28","doi-asserted-by":"crossref","unstructured":"Huang, J., Tang, D., Zhong, W., Lu, S., Shou, L., Gong, M., Jiang, D., and Duan, N. (2021). \u201cWhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.\u201d In <i>Findings of the Association for Computational Linguistics: EMNLP 2021<\/i>, pp. 238\u2013244.","DOI":"10.18653\/v1\/2021.findings-emnlp.23"},{"key":"29","unstructured":"Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., de las Casas, D., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R., Lachaux, M.-A., Stock, P., Scao, T. L., Lavril, T., Wang, T., Lacroix, T., and Sayed, W. E. (2023). \u201cMistral 7B.\u201d <i>arXiv preprint arXiv:2310.06825<\/i>."},{"key":"30","doi-asserted-by":"crossref","unstructured":"Jiang, T., Huang, S., Luan, Z., Wang, D., and Zhuang, F. (2024). \u201cScaling Sentence Embeddings with Large Language Models.\u201d In <i>Findings of the Association for Computational Linguistics: EMNLP 2024<\/i>, pp. 
3182\u20133196.","DOI":"10.18653\/v1\/2024.findings-emnlp.181"},{"key":"31","doi-asserted-by":"crossref","unstructured":"Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., and Yih, W.-t. (2020). \u201cDense Passage Retrieval for Open-Domain Question Answering.\u201d In <i>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)<\/i>, pp. 3784\u20133803.","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"32","doi-asserted-by":"crossref","unstructured":"Keung, P., Lu, Y., Szarvas, G., and Smith, N. A. (2020). \u201cThe Multilingual Amazon Reviews Corpus.\u201d In <i>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)<\/i>, pp. 4563\u20134568.","DOI":"10.18653\/v1\/2020.emnlp-main.369"},{"key":"33","doi-asserted-by":"crossref","unstructured":"Kim, J., Lee, D., and Hwang, S.-w. (2024). \u201cHIL: Hybrid Isotropy Learning for Zero-shot Performance in Dense retrieval.\u201d In Duh, K., Gomez, H., and Bethard, S. (Eds.), <i>Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)<\/i>, pp. 7892\u20137903.","DOI":"10.18653\/v1\/2024.naacl-long.437"},{"key":"34","unstructured":"Kiros, R., Zhu, Y., Salakhutdinov, R. R., Zemel, R., Urtasun, R., Torralba, A., and Fidler, S. (2015). \u201cSkip-Thought Vectors.\u201d In <i>Advances in Neural Information Processing Systems (NIPS)<\/i>, pp. 3294\u20133302."},{"key":"35","unstructured":"Kusupati, A., Bhatt, G., Rege, A., Wallingford, M., Sinha, A., Ramanujan, V., Howard-Snyder, W., Chen, K., Kakade, S. M., Jain, P., and Farhadi, A. (2022). 
\u201cMatryoshka Representation Learning.\u201d In <i>Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS)<\/i>."},{"key":"36","doi-asserted-by":"crossref","unstructured":"Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M.-W., Dai, A. M., Uszkoreit, J., Le, Q., and Petrov, S. (2019). \u201cNatural Questions: A Benchmark for Question Answering Research.\u201d <i>Transactions of the Association for Computational Linguistics (TACL)<\/i>, 7, pp. 452\u2013466.","DOI":"10.1162\/tacl_a_00276"},{"key":"37","unstructured":"Lee, C., Roy, R., Xu, M., Raiman, J., Shoeybi, M., Catanzaro, B., and Ping, W. (2024a). \u201cNV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models.\u201d <i>arXiv:2405.17428<\/i>."},{"key":"38","unstructured":"Lee, J., Dai, Z., Ren, X., Chen, B., Cer, D., Cole, J. R., Hui, K., Boratko, M., Kapadia, R., Ding, W., Luan, Y., Duddu, S. M. K., Abrego, G. H., Shi, W., Gupta, N., Kusupati, A., Jain, P., Jonnalagadda, S. R., Chang, M.-W., and Naim, I. (2024b). \u201cGecko: Versatile Text Embeddings Distilled from Large Language Models.\u201d <i>arXiv preprint arXiv:2403.20327<\/i>."},{"key":"39","doi-asserted-by":"crossref","unstructured":"Lei, Y., Wu, D., Zhou, T., Shen, T., Cao, Y., Tao, C., and Yates, A. (2024). \u201cMeta-Task Prompting Elicits Embeddings from Large Language Models.\u201d In <i>Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL)<\/i>, pp. 10141\u201310157.","DOI":"10.18653\/v1\/2024.acl-long.546"},{"key":"40","unstructured":"Levina, E. and Bickel, P. (2004). \u201cMaximum Likelihood Estimation of Intrinsic Dimension.\u201d In Saul, L., Weiss, Y., and Bottou, L. (Eds.), <i>Advances in Neural Information Processing Systems (NIPS)<\/i>, Vol. 17, pp. 
777\u2013784."},{"key":"41","doi-asserted-by":"crossref","unstructured":"Li, B., Zhou, H., He, J., Wang, M., Yang, Y., and Li, L. (2020). \u201cOn the Sentence Embeddings from Pre-trained Language Models.\u201d In <i>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)<\/i>, pp. 9119\u20139130.","DOI":"10.18653\/v1\/2020.emnlp-main.733"},{"key":"42","unstructured":"Li, X., Li, Z., Li, J., Xie, H., and Li, Q. (2025). \u201cESE: Espresso Sentence Embeddings.\u201d In <i>The 13th International Conference on Learning Representations (ICLR)<\/i>."},{"key":"43","unstructured":"Li, Z., Zhang, X., Zhang, Y., Long, D., Xie, P., and Zhang, M. (2023). \u201cTowards General Text Embeddings with Multi-stage Contrastive Learning.\u201d <i>arXiv preprint arXiv:2308.03281<\/i>."},{"key":"44","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). \u201cRoBERTa: A Robustly Optimized BERT Pretraining Approach.\u201d <i>arXiv preprint arXiv:1907.11692<\/i>."},{"key":"45","unstructured":"Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., and Potts, C. (2011). \u201cLearning Word Vectors for Sentiment Analysis.\u201d In <i>Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT)<\/i>, pp. 142\u2013150."},{"key":"46","unstructured":"Marelli, M., Menini, S., Baroni, M., Bentivogli, L., Bernardi, R., and Zamparelli, R. (2014). \u201cA SICK Cure for The Evaluation of Compositional Distributional Semantic Models.\u201d In <i>Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC)<\/i>, pp. 216\u2013223."},{"key":"47","doi-asserted-by":"crossref","unstructured":"McAuley, J. and Leskovec, J. (2013). 
\u201cHidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text.\u201d In <i>Proceedings of the 7th ACM Conference on Recommender Systems<\/i>, pp. 165\u2013172.","DOI":"10.1145\/2507157.2507163"},{"key":"48","unstructured":"McInnes, L., Healy, J., and Melville, J. (2020). \u201cUMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.\u201d <i>arXiv preprint arXiv:1802.03426<\/i>."},{"key":"49","doi-asserted-by":"crossref","unstructured":"Mickus, T., Gr\u00f6nroos, S.-A., and Attieh, J. (2024). \u201cIsotropy, Clusters, and Classifiers.\u201d In <i>Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL)<\/i>, pp. 75\u201384.","DOI":"10.18653\/v1\/2024.acl-short.7"},{"key":"50","unstructured":"Mu, J. and Viswanath, P. (2018). \u201cAll-but-the-Top: Simple and Effective Postprocessing for Word Representations.\u201d In <i>International Conference on Learning Representations (ICLR)<\/i>."},{"key":"51","unstructured":"Muennighoff, N. (2022). \u201cSGPT: GPT Sentence Embeddings for Semantic Search.\u201d <i>arXiv preprint arXiv:2202.08904<\/i>."},{"key":"52","unstructured":"Muennighoff, N., Su, H., Wang, L., Yang, N., Wei, F., Yu, T., Singh, A., and Kiela, D. (2024). \u201cGenerative Representational Instruction Tuning.\u201d <i>arXiv preprint arXiv:2402.09906<\/i>."},{"key":"53","doi-asserted-by":"crossref","unstructured":"Muennighoff, N., Tazi, N., Magne, L., and Reimers, N. (2023). \u201cMTEB: Massive Text Embedding Benchmark.\u201d In <i>Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL)<\/i>, pp. 2014\u20132037.","DOI":"10.18653\/v1\/2023.eacl-main.148"},{"key":"54","unstructured":"Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R., and Deng, L. (2016). 
\u201cMS MARCO: A Human Generated MAchine Reading COmprehension Dataset.\u201d <i>arXiv preprint arXiv:1611.09268<\/i>."},{"key":"55","doi-asserted-by":"crossref","unstructured":"Ni, J., Hernandez Abrego, G., Constant, N., Ma, J., Hall, K., Cer, D., and Yang, Y. (2022a). \u201cSentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models.\u201d In <i>Findings of the Association for Computational Linguistics: ACL 2022<\/i>, pp. 1864\u20131874.","DOI":"10.18653\/v1\/2022.findings-acl.146"},{"key":"56","doi-asserted-by":"crossref","unstructured":"Ni, J., Qu, C., Lu, J., Dai, Z., Hernandez Abrego, G., Ma, J., Zhao, V., Luan, Y., Hall, K., Chang, M.-W., and Yang, Y. (2022b). \u201cLarge Dual Encoders Are Generalizable Retrievers.\u201d In <i>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)<\/i>, pp. 9844\u20139855.","DOI":"10.18653\/v1\/2022.emnlp-main.669"},{"key":"57","unstructured":"Nussbaum, Z., Morris, J. X., Duderstadt, B., and Mulyar, A. (2024). \u201cNomic Embed: Training a Reproducible Long Context Text Embedder.\u201d <i>arXiv preprint arXiv:2402.01613<\/i>."},{"key":"58","doi-asserted-by":"crossref","unstructured":"O\u2019Neill, J., Rozenshtein, P., Kiryo, R., Kubota, M., and Bollegala, D. (2021). \u201cI Wish I Would Have Loved This One, But I Didn\u2019t \u2013 A Multilingual Dataset for Counterfactual Detection in Product Review.\u201d In <i>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)<\/i>, pp. 7092\u20137108.","DOI":"10.18653\/v1\/2021.emnlp-main.568"},{"key":"59","unstructured":"OpenAI (2024). \u201cGPT-4 Technical Report.\u201d <i>arXiv preprint arXiv:2303.08774<\/i>."},{"key":"60","doi-asserted-by":"crossref","unstructured":"Pagliardini, M., Gupta, P., and Jaggi, M. (2018). 
\u201cUnsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features.\u201d In <i>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)<\/i>, pp. 528\u2013540.","DOI":"10.18653\/v1\/N18-1049"},{"key":"61","doi-asserted-by":"crossref","unstructured":"Reimers, N. and Gurevych, I. (2019). \u201cSentence-BERT: Sentence Embeddings using Siamese BERT-Networks.\u201d In <i>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)<\/i>, pp. 3982\u20133992.","DOI":"10.18653\/v1\/D19-1410"},{"key":"62","unstructured":"Rosenberg, A. and Hirschberg, J. (2007). \u201cV-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure.\u201d In <i>Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)<\/i>, pp. 410\u2013420."},{"key":"63","doi-asserted-by":"crossref","unstructured":"Rousseeuw, P. J. (1987). \u201cSilhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis.\u201d <i>Journal of Computational and Applied Mathematics<\/i>, 20, pp. 53\u201365.","DOI":"10.1016\/0377-0427(87)90125-7"},{"key":"64","unstructured":"Rudman, W. and Eickhoff, C. (2024). \u201cStable Anisotropic Regularization.\u201d In <i>The 12th International Conference on Learning Representations (ICLR)<\/i>."},{"key":"65","doi-asserted-by":"crossref","unstructured":"Rudman, W., Gillman, N., Rayne, T., and Eickhoff, C. (2022). \u201cIsoScore: Measuring the Uniformity of Embedding Space Utilization.\u201d In Muresan, S., Nakov, P., and Villavicencio, A. (Eds.), <i>Findings of the Association for Computational Linguistics: ACL 2022<\/i>, pp. 
3325\u20133339.","DOI":"10.18653\/v1\/2022.findings-acl.262"},{"key":"66","doi-asserted-by":"crossref","unstructured":"Shen, D., Wang, G., Wang, W., Min, M. R., Su, Q., Zhang, Y., Li, C., Henao, R., and Carin, L. (2018). \u201cBaseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms.\u201d In <i>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL)<\/i>, pp. 440\u2013450.","DOI":"10.18653\/v1\/P18-1041"},{"key":"67","unstructured":"Springer, J. M., Kotha, S., Fried, D., Neubig, G., and Raghunathan, A. (2024). \u201cRepetition Improves Language Model Embeddings.\u201d <i>arXiv preprint arXiv:2402.15449<\/i>."},{"key":"68","doi-asserted-by":"crossref","unstructured":"Su, H., Shi, W., Kasai, J., Wang, Y., Hu, Y., Ostendorf, M., Yih, W.-t., Smith, N. A., Zettlemoyer, L., and Yu, T. (2023). \u201cOne Embedder, Any Task: Instruction-Finetuned Text Embeddings.\u201d In <i>Findings of the Association for Computational Linguistics: ACL 2023<\/i>, pp. 1102\u20131121.","DOI":"10.18653\/v1\/2023.findings-acl.71"},{"key":"69","unstructured":"Su, J., Cao, J., Liu, W., and Ou, Y. (2021). \u201cWhitening Sentence Representations for Better Semantics and Faster Retrieval.\u201d <i>arXiv preprint arXiv:2103.15316<\/i>."},{"key":"70","doi-asserted-by":"crossref","unstructured":"Tenenbaum, J. B., de Silva, V., and Langford, J. C. (2000). \u201cA Global Geometric Framework for Nonlinear Dimensionality Reduction.\u201d <i>Science<\/i>, 290 (5500), pp. 2319\u20132323.","DOI":"10.1126\/science.290.5500.2319"},{"key":"71","unstructured":"Tsukagoshi, H. and Sasano, R. (2024). \u201cRuri: Japanese General Text Embeddings.\u201d <i>arXiv preprint arXiv:2409.07737<\/i>."},{"key":"72","doi-asserted-by":"crossref","unstructured":"Tsukagoshi, H. and Sasano, R. (2025). 
\u201cRedundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings.\u201d In <i>Findings of the Association for Computational Linguistics: ACL 2025<\/i> (to appear).","DOI":"10.18653\/v1\/2025.findings-acl.1330"},{"key":"73","unstructured":"Tsukagoshi, H. and Sasano, R. (2025). \u201cTask-dependent Differences in the Redundancy of Prompt-based Text Embeddings.\u201d In <i>Proceedings of the 31st Annual Meeting of the Association for Natural Language Processing<\/i>, pp. 1769\u20131774. [In Japanese: \u585a\u8d8a\u99ff\uff0c\u7b39\u91ce\u907c\u5e73 (2025). \u30d7\u30ed\u30f3\u30d7\u30c8\u306b\u57fa\u3065\u304f\u30c6\u30ad\u30b9\u30c8\u57cb\u3081\u8fbc\u307f\u306e\u30bf\u30b9\u30af\u306b\u3088\u308b\u5197\u9577\u6027\u306e\u9055\u3044. \u8a00\u8a9e\u51e6\u7406\u5b66\u4f1a \u7b2c 31 \u56de\u5e74\u6b21\u5927\u4f1a.]."},{"key":"74","doi-asserted-by":"crossref","unstructured":"Tsukagoshi, H., Sasano, R., and Takeda, K. (2021). \u201cDefSent: Sentence Embeddings using Definition Sentences.\u201d In <i>Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP)<\/i>, pp. 411\u2013418.","DOI":"10.18653\/v1\/2021.acl-short.52"},{"key":"75","unstructured":"Tulchinskii, E., Kuznetsov, K., Laida, K., Cherniavskii, D., Nikolenko, S., Burnaev, E., Barannikov, S., and Piontkovskaya, I. (2023). \u201cIntrinsic Dimension Estimation for Robust Detection of AI-Generated Texts.\u201d In <i>37th Conference on Neural Information Processing Systems (NeurIPS)<\/i>."},{"key":"76","unstructured":"van der Maaten, L. and Hinton, G. E. (2008). \u201cVisualizing Data using t-SNE.\u201d <i>Journal of Machine Learning Research<\/i>, 9, pp. 2579\u20132605."},{"key":"77","doi-asserted-by":"crossref","unstructured":"Wang, F. and Liu, H. (2021). \u201cUnderstanding the Behaviour of Contrastive Loss.\u201d In <i>Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)<\/i>, pp. 2495\u20132504.","DOI":"10.1109\/CVPR46437.2021.00252"},{"key":"78","doi-asserted-by":"crossref","unstructured":"Wang, H., Zhang, H., and Yu, D. 
(2023). \u201cOn the Dimensionality of Sentence Embeddings.\u201d In <i>Findings of the Association for Computational Linguistics: EMNLP 2023<\/i>, pp. 10344\u201310354.","DOI":"10.18653\/v1\/2023.findings-emnlp.694"},{"key":"79","unstructured":"Wang, L., Yang, N., Huang, X., Jiao, B., Yang, L., Jiang, D., Majumder, R., and Wei, F. (2022). \u201cText Embeddings by Weakly-Supervised Contrastive Pre-training.\u201d <i>arXiv preprint arXiv:2212.03533<\/i>."},{"key":"80","doi-asserted-by":"crossref","unstructured":"Wang, L., Yang, N., Huang, X., Yang, L., Majumder, R., and Wei, F. (2024a). \u201cImproving Text Embeddings with Large Language Models.\u201d In <i>Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL)<\/i>, pp. 11897\u201311916.","DOI":"10.18653\/v1\/2024.acl-long.642"},{"key":"81","unstructured":"Wang, L., Yang, N., Huang, X., Yang, L., Majumder, R., and Wei, F. (2024b). \u201cMultilingual E5 Text Embeddings: A Technical Report.\u201d <i>arXiv preprint arXiv:2402.05672<\/i>."},{"key":"82","doi-asserted-by":"crossref","unstructured":"Williams, A., Nangia, N., and Bowman, S. (2018). \u201cA Broad-Coverage Challenge Corpus for Sentence Understanding through Inference.\u201d In <i>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)<\/i>, pp. 1112\u20131122.","DOI":"10.18653\/v1\/N18-1101"},{"key":"83","doi-asserted-by":"crossref","unstructured":"Xiao, C., Long, Y., and Al Moubayed, N. (2023). \u201cOn Isotropy, Contextualization and Learning Dynamics of Contrastive-based Sentence Representation Learning.\u201d In <i>Findings of the Association for Computational Linguistics: ACL 2023<\/i>, pp. 12266\u201312283.","DOI":"10.18653\/v1\/2023.findings-acl.778"},{"key":"84","doi-asserted-by":"crossref","unstructured":"Xiao, S., Liu, Z., Zhang, P., and Muennighoff, N. (2024). 
\u201cC-Pack: Packaged Resources To Advance General Chinese Embedding.\u201d In <i>The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)<\/i>, pp. 641\u2013649.","DOI":"10.1145\/3626772.3657878"},{"key":"85","unstructured":"Yang, A., Yang, B., Hui, B., Zheng, B., Yu, B., Zhou, C., Li, C., Li, C., Liu, D., Huang, F., Dong, G., Wei, H., Lin, H., Tang, J., Wang, J., Yang, J., Tu, J., Zhang, J., Ma, J., Yang, J., Xu, J., Zhou, J., Bai, J., He, J., Lin, J., Dang, K., Lu, K., Chen, K., Yang, K., Li, M., Xue, M., Ni, N., Zhang, P., Wang, P., Peng, R., Men, R., Gao, R., Lin, R., Wang, S., Bai, S., Tan, S., Zhu, T., Li, T., Liu, T., Ge, W., Deng, X., Zhou, X., Ren, X., Zhang, X., Wei, X., Ren, X., Liu, X., Fan, Y., Yao, Y., Zhang, Y., Wan, Y., Chu, Y., Liu, Y., Cui, Z., Zhang, Z., Guo, Z., and Fan, Z. (2024). \u201cQwen2 Technical Report.\u201d <i>arXiv preprint arXiv:2407.10671<\/i>."},{"key":"86","doi-asserted-by":"crossref","unstructured":"Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W., Salakhutdinov, R., and Manning, C. D. (2018). \u201cHotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering.\u201d In <i>Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)<\/i>, pp. 2369\u20132380.","DOI":"10.18653\/v1\/D18-1259"},{"key":"87","doi-asserted-by":"crossref","unstructured":"Yano, C., Fukuchi, A., Fukasawa, S., Tachibana, H., and Watanabe, Y. (2024). \u201cMultilingual Sentence-T5: Scalable Sentence Encoders for Multilingual Applications.\u201d In <i>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)<\/i>, pp. 11849\u201311858.","DOI":"10.63317\/5jbqwe45kajm"},{"key":"88","doi-asserted-by":"crossref","unstructured":"Yokoi, S., Bao, H., Kurita, H., and Shimodaira, H. (2024). \u201cZipfian Whitening.\u201d In <i>Advances in Neural Information Processing Systems (NeurIPS)<\/i>, Vol. 
37, pp. 122259\u2013122291.","DOI":"10.52202\/079017-3885"},{"key":"89","unstructured":"Zhang, X., Thakur, N., Ogundepo, O., Kamalloo, E., Alfonso-Hermelo, D., Li, X., Liu, Q., Rezagholizadeh, M., and Lin, J. (2022). \u201cMaking a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages.\u201d <i>arXiv preprint arXiv:2210.09984<\/i>."},{"key":"90","doi-asserted-by":"crossref","unstructured":"Zhuo, W., Sun, Y., Wang, X., Zhu, L., and Yang, Y. (2023). \u201cWhitenedCSE: Whitening-based Contrastive Learning of Sentence Embeddings.\u201d In <i>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)<\/i>, pp. 12135\u201312148.","DOI":"10.18653\/v1\/2023.acl-long.677"},{"key":"91","doi-asserted-by":"crossref","unstructured":"Ueda, R. and Yokoi, S. (2024). \u201cMeasuring the Intrinsic Dimension of Language.\u201d In <i>Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing<\/i>. [In Japanese: \u4e0a\u7530\u4eae\uff0c\u6a2a\u4e95\u7965 (2024). \u8a00\u8a9e\u306e\u56fa\u6709\u6b21\u5143\u3092\u6e2c\u308b. \u8a00\u8a9e\u51e6\u7406\u5b66\u4f1a \u7b2c 30 \u56de\u5e74\u6b21\u5927\u4f1a, pp. 
1605\u20131609].","DOI":"10.5715\/jnlp.30.1128"}],"container-title":["Journal of Natural Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/jnlp\/33\/1\/33_76\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T03:54:06Z","timestamp":1774065246000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/jnlp\/33\/1\/33_76\/_article\/-char\/ja\/"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026]]},"references-count":91,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026]]}},"URL":"https:\/\/doi.org\/10.5715\/jnlp.33.76","relation":{},"ISSN":["1340-7619","2185-8314"],"issn-type":[{"value":"1340-7619","type":"print"},{"value":"2185-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026]]}}}