{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T06:43:25Z","timestamp":1773816205729,"version":"3.50.1"},"reference-count":35,"publisher":"Walter de Gruyter GmbH","issue":"1","license":[{"start":{"date-parts":[[2022,1,1]],"date-time":"2022-01-01T00:00:00Z","timestamp":1640995200000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,3,24]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Word2Vec is a prominent model for natural language processing tasks. Similar inspiration is found in the distributed embeddings (word vectors) of recent state-of-the-art deep neural networks. However, the wrong combination of hyperparameters can produce embeddings of poor quality. The objective of this work is to empirically show that an optimal combination of Word2Vec hyperparameters exists and to evaluate various combinations. We compare them with the publicly released, original Word2Vec embedding. Both intrinsic and extrinsic (downstream) evaluations are carried out, including named entity recognition and sentiment analysis. Our main contributions include showing that the best model is usually task-specific, high analogy scores do not necessarily correlate positively with <jats:italic>F<\/jats:italic>1 scores, and performance is not dependent on data size alone. If ethical considerations, such as saving time, energy, and the environment, are made, then relatively smaller corpora may do just as well or even better in some cases. Increasing the embedding dimension size beyond a point leads to poor quality or performance. 
In addition, using a relatively small corpus, we obtain better WordSim scores, a corresponding Spearman correlation, and better downstream performance (with significance tests) compared to the original model, which was trained on a 100 billion-word corpus.<\/jats:p>","DOI":"10.1515\/comp-2022-0236","type":"journal-article","created":{"date-parts":[[2022,3,31]],"date-time":"2022-03-31T09:02:37Z","timestamp":1648717357000},"page":"134-141","source":"Crossref","is-referenced-by-count":30,"title":["Word2Vec: Optimal hyperparameters and their impact on natural language processing downstream tasks"],"prefix":"10.1515","volume":"12","author":[{"given":"Tosin","family":"Adewumi","sequence":"first","affiliation":[{"name":"ML Group, EISLAB, Department of Computer Science, Electrical and Space Engineering, Lule\u00e5 University of Technology, 97187, Lule\u00e5, Sweden"}]},{"given":"Foteini","family":"Liwicki","sequence":"additional","affiliation":[{"name":"ML Group, EISLAB, Department of Computer Science, Electrical and Space Engineering, Lule\u00e5 University of Technology, 97187, Lule\u00e5, Sweden"}]},{"given":"Marcus","family":"Liwicki","sequence":"additional","affiliation":[{"name":"ML Group, EISLAB, Department of Computer Science, Electrical and Space Engineering, Lule\u00e5 University of Technology, 97187, Lule\u00e5, Sweden"}]}],"member":"374","published-online":{"date-parts":[[2022,3,24]]},"reference":[{"key":"2022081707553227386_j_comp-2022-0236_ref_001","unstructured":"T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, 2013, arXiv: http:\/\/arXiv.org\/abs\/arXiv:1301.3781."},{"key":"2022081707553227386_j_comp-2022-0236_ref_002","unstructured":"J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, 2018, arXiv: http:\/\/arXiv.org\/abs\/arXiv:1810.04805."},{"key":"2022081707553227386_j_comp-2022-0236_ref_003","unstructured":"Y. 
Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, et al., Roberta: a robustly optimized bert pretraining approach, 2019, arXiv: http:\/\/arXiv.org\/abs\/arXiv:1907.11692."},{"key":"2022081707553227386_j_comp-2022-0236_ref_004","unstructured":"C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, et al., Exploring the limits of transfer learning with a unified text-to-text transformer, 2019, arXiv: http:\/\/arXiv.org\/abs\/arXiv:1910.10683."},{"key":"2022081707553227386_j_comp-2022-0236_ref_005","unstructured":"D. Bahdanau, K. Cho, and Y. Bengio, \u201cNeural machine translation by jointly learning to align and translate,\u201d In: International Conference on Learning Representations, ICLR, 2015."},{"key":"2022081707553227386_j_comp-2022-0236_ref_006","doi-asserted-by":"crossref","unstructured":"M. L\u00e4ngkvist, L. Karlsson, and A. Loutfi, \u201cA review of unsupervised feature learning and deep learning for time-series modeling,\u201d Pattern Recogn. Lett., vol. 42, pp. 11\u201324, 2014.","DOI":"10.1016\/j.patrec.2014.01.008"},{"key":"2022081707553227386_j_comp-2022-0236_ref_007","unstructured":"B. Dhingra, H. Liu, R. Salakhutdinov, and W. W. Cohen, A comparative study of word embeddings for reading comprehension, 2017, arXiv: http:\/\/arXiv.org\/abs\/arXiv:1703.00993."},{"key":"2022081707553227386_j_comp-2022-0236_ref_008","doi-asserted-by":"crossref","unstructured":"M. Naili, A. H. Chaibi, and H. H. BenGhezala, \u201cComparative study of word embedding methods in topic segmentation,\u201d Proc. Comput. Sci., vol. 112, pp. 340\u2013349, 2017.","DOI":"10.1016\/j.procs.2017.08.009"},{"key":"2022081707553227386_j_comp-2022-0236_ref_009","doi-asserted-by":"crossref","unstructured":"Y. Wang, S. Liu, N. Afzal, M. Rastegar-Mojarad, L. Wang, F. Shen, et al., \u201cA comparison of word embeddings for the biomedical natural language processing,\u201d J. Biomed. Inform., vol. 87, pp. 
12\u201320, 2018.","DOI":"10.1016\/j.jbi.2018.09.008"},{"key":"2022081707553227386_j_comp-2022-0236_ref_010","doi-asserted-by":"crossref","unstructured":"O. Levy, Y. Goldberg, and I. Dagan, \u201cImproving distributional similarity with lessons learned from word embeddings,\u201d Trans. Assoc. Comput. Linguist., vol. 3, pp. 211\u2013225, 2015.","DOI":"10.1162\/tacl_a_00134"},{"key":"2022081707553227386_j_comp-2022-0236_ref_011","doi-asserted-by":"crossref","unstructured":"Y. Zhang, Q. Chen, Z. Yang, H. Lin, and Z. Lu, \u201cBiowordvec, improving biomedical word embeddings with subword information and mesh,\u201d Scientif. Data, vol. 6, no. 1, pp. 1\u20139, 2019.","DOI":"10.1038\/s41597-019-0055-0"},{"key":"2022081707553227386_j_comp-2022-0236_ref_012","doi-asserted-by":"crossref","unstructured":"B. Wang, A. Wang, F. Chen, Y. Wang, and C. C. Jay Kuo, \u201cEvaluating word embedding models: methods and experimental results,\u201d APSIPA Trans. Signal Inform. Process, vol. 8, 2019. 10.1017\/ATSIP.2019.12.","DOI":"10.1017\/ATSIP.2019.12"},{"key":"2022081707553227386_j_comp-2022-0236_ref_013","unstructured":"J. Turian, L. Ratinov, and Y. Bengio, \u201cWord representations: a simple and general method for semi-supervised learning,\u201d In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2010, pp. 384\u2013394."},{"key":"2022081707553227386_j_comp-2022-0236_ref_014","unstructured":"T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, \u201cDistributed representations of words and phrases and their compositionality,\u201d In: Advances in Neural Information Processing Systems, Curran Associates, Inc., 2013, pp. 3111\u20133119."},{"key":"2022081707553227386_j_comp-2022-0236_ref_015","unstructured":"R. \u0158eh\u016f\u0159ek and P. 
Sojka, \u201cSoftware framework for topic modelling with large corpora,\u201d In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA, Valletta, Malta, May 2010, pp. 45\u201350. http:\/\/is.muni.cz\/publication\/884893\/en."},{"key":"2022081707553227386_j_comp-2022-0236_ref_016","doi-asserted-by":"crossref","unstructured":"T. P. Adewumi, \u201cInner loop program construct: A faster way for program execution,\u201d Open Comput. Sci., vol. 8, no. 1, pp. 115\u2013122, 2018.","DOI":"10.1515\/comp-2018-0004"},{"key":"2022081707553227386_j_comp-2022-0236_ref_017","doi-asserted-by":"crossref","unstructured":"T. P. Adewumi and M. Liwicki, \u201cInner for-loop for speeding up blockchain mining,\u201d Open Comput. Sci., vol. 10, no. 1, pp. 42\u201347, 2019. 10.1515\/comp-2020-0004.","DOI":"10.1515\/comp-2020-0004"},{"key":"2022081707553227386_j_comp-2022-0236_ref_018","doi-asserted-by":"crossref","unstructured":"S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, \u201cIndexing by latent semantic analysis,\u201d J. Am. Soc. Inform. Sci., vol. 41, no. 6, pp. 391\u2013407, 1990.","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"2022081707553227386_j_comp-2022-0236_ref_019","unstructured":"Z. Huang, W. Xu, and K. Yu, Bidirectional lstm-crf models for sequence tagging, 2015, arXiv: http:\/\/arXiv.org\/abs\/arXiv:1508.01991."},{"key":"2022081707553227386_j_comp-2022-0236_ref_020","unstructured":"G. E. Hinton and S. Roweis, \u201cStochastic neighbor embedding,\u201d Adv. Neural Inform. Process. Sys., MIT Press, vol. 15, 2002."},{"key":"2022081707553227386_j_comp-2022-0236_ref_021","unstructured":"L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, et al., \u201cPlacing search in context: The concept revisited,\u201d ACM Trans. Inform. Sys., vol. 20, no. 1, pp. 
116\u2013131, 2002."},{"key":"2022081707553227386_j_comp-2022-0236_ref_022","doi-asserted-by":"crossref","unstructured":"J. Thomason, A. Padmakumar, J. Sinapov, N. Walker, Y. Jiang, H. Yedidsion, et al., \u201cJointly improving parsing and perception for natural language commands through human-robot dialog,\u201d J. Artif. Intell. Res., vol. 67, pp. 327\u2013374, 2020.","DOI":"10.1613\/jair.1.11485"},{"key":"2022081707553227386_j_comp-2022-0236_ref_023","doi-asserted-by":"crossref","unstructured":"J. Pennington, R. Socher, and C. D. Manning, \u201cGlove: Global vectors for word representation,\u201d In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, 2014, pp. 1532\u20131543. 10.3115\/v1\/D14-1162.","DOI":"10.3115\/v1\/D14-1162"},{"key":"2022081707553227386_j_comp-2022-0236_ref_024","doi-asserted-by":"crossref","unstructured":"A. Gatt and E. Krahmer, \u201cSurvey of the state of the art in natural language generation: Core tasks, applications and evaluation,\u201d J. Artif. Intell. Res., vol. 61, pp. 65\u2013170, 2018.","DOI":"10.1613\/jair.5477"},{"key":"2022081707553227386_j_comp-2022-0236_ref_025","unstructured":"T. P. Adewumi, F. Liwicki, and M. Liwicki, Corpora compared: The case of the Swedish Gigaword & Wikipedia Corpora, 2020, arXiv: http:\/\/arXiv.org\/abs\/arXiv:2011.03281."},{"key":"2022081707553227386_j_comp-2022-0236_ref_026","unstructured":"T. P. Adewumi, F. Liwicki, and M. Liwicki, The challenge of diacritics in Yoruba embeddings, 2020, arXiv: http:\/\/arXiv.org\/abs\/arXiv:2011.07605."},{"key":"2022081707553227386_j_comp-2022-0236_ref_027","unstructured":"T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai, \u201cMan is to computer programmer as woman is to homemaker? debiasing word embeddings,\u201d In: Advances in Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 
4349\u20134357."},{"key":"2022081707553227386_j_comp-2022-0236_ref_028","doi-asserted-by":"crossref","unstructured":"T. P. Adewumi, F. Liwicki, and M. Liwicki, \u201cConversational systems in machine learning from the point of view of the philosophy of science using alime chat and related studies,\u201d Philosophies, vol. 4, no. 3, p. 41.","DOI":"10.3390\/philosophies4030041"},{"key":"2022081707553227386_j_comp-2022-0236_ref_029","unstructured":"Wikipedia, Wiki news abstract, 2019."},{"key":"2022081707553227386_j_comp-2022-0236_ref_030","unstructured":"Wikipedia, Simple wiki articles, 2019."},{"key":"2022081707553227386_j_comp-2022-0236_ref_031","doi-asserted-by":"crossref","unstructured":"C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, et al., \u201cOne billion word benchmark for measuring progress in statistical language modeling,\u201d Technical report, Google, 2013.","DOI":"10.21437\/Interspeech.2014-564"},{"key":"2022081707553227386_j_comp-2022-0236_ref_032","unstructured":"A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, \u201cLearning word vectors for sentiment analysis,\u201d In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-volume 1, Association for Computational Linguistics, 2011, pp. 142\u2013150."},{"key":"2022081707553227386_j_comp-2022-0236_ref_033","doi-asserted-by":"crossref","unstructured":"J. Bos, V. Basile, K. Evang, N. J. Venhuizen, and J. Bjerva, \u201cThe Groningen meaning bank,\u201d In: Handbook of Linguistic Annotation, Springer, 2017, pp. 463\u2013496","DOI":"10.1007\/978-94-024-0881-2_18"},{"key":"2022081707553227386_j_comp-2022-0236_ref_034","doi-asserted-by":"crossref","unstructured":"E. Loper and S. Bird, Nltk: the natural language toolkit, 2002, arXiv: http:\/\/arXiv.org\/abs\/cs\/0205028.","DOI":"10.3115\/1118108.1118117"},{"key":"2022081707553227386_j_comp-2022-0236_ref_035","doi-asserted-by":"crossref","unstructured":"B. 
Chiu, A. Korhonen, and S. Pyysalo, \u201cIntrinsic evaluation of word vectors fails to predict extrinsic performance,\u201d In: Proceedings of the 1st Workshop on Evaluating Vector-space Representations for NLP, Association for Computational Linguistics, Berlin, Germany, 2016, pp. 1\u20136.","DOI":"10.18653\/v1\/W16-2501"}],"container-title":["Open Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2022-0236\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2022-0236\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,17]],"date-time":"2022-08-17T07:58:17Z","timestamp":1660723097000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2022-0236\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,1]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,3,16]]},"published-print":{"date-parts":[[2022,3,16]]}},"alternative-id":["10.1515\/comp-2022-0236"],"URL":"https:\/\/doi.org\/10.1515\/comp-2022-0236","relation":{},"ISSN":["2299-1093"],"issn-type":[{"value":"2299-1093","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,1]]}}}