{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T07:32:22Z","timestamp":1723015942483},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,8]]},"abstract":"<jats:p>Recently, variants of neural networks for computational linguistics have been proposed and successfully applied to neural language modeling and neural machine translation. These neural models can leverage knowledge from massive corpora but they are extremely slow as they predict candidate words from a large vocabulary during training and inference. As an alternative to gradient approximation and softmax with class decomposition, we explore the tree-based hierarchical softmax method and reform its architecture, making it compatible with modern GPUs and introducing a compact tree-based loss function. When combined with several word hierarchical clustering algorithms, improved performance is achieved in language modelling task with intrinsic evaluation criterions on PTB, WikiText-2 and WikiText-103 datasets.<\/jats:p>","DOI":"10.24963\/ijcai.2017\/271","type":"proceedings-article","created":{"date-parts":[[2017,7,28]],"date-time":"2017-07-28T09:14:07Z","timestamp":1501233247000},"page":"1951-1957","source":"Crossref","is-referenced-by-count":2,"title":["Exploration of Tree-based Hierarchical Softmax for Recurrent Language Models"],"prefix":"10.24963","author":[{"given":"Nan","family":"Jiang","sequence":"first","affiliation":[{"name":"State Key Laboratory of Software Development Environment, Beihang University, China"},{"name":"School of Computer Science and Engineering, Beihang University, China"}]},{"given":"Wenge","family":"Rong","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Software Development Environment, Beihang University, China"},{"name":"School of Computer Science and Engineering, Beihang University, China"}]},{"given":"Min","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Software Engineering, Chongqing University, China"}]},{"given":"Yikang","family":"Shen","sequence":"additional","affiliation":[{"name":"Montr\u00e9al Institute for Learning Algorithms, Universt\u00e9 de Montr\u00e9al, Canada"}]},{"given":"Zhang","family":"Xiong","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Software Development Environment, Beihang University, China"},{"name":"School of Computer Science and Engineering, Beihang University, China"}]}],"member":"10584","event":{"number":"26","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)","University of Technology Sydney (UTS)","Australian Computer Society (ACS)"],"acronym":"IJCAI-2017","name":"Twenty-Sixth International Joint Conference on Artificial Intelligence","start":{"date-parts":[[2017,8,19]]},"theme":"Artificial Intelligence","location":"Melbourne, Australia","end":{"date-parts":[[2017,8,26]]}},"container-title":["Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2017,7,28]],"date-time":"2017-07-28T11:53:07Z","timestamp":1501242787000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2017\/271"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2017,8]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2017\/271","relation":{},"subject":[],"published":{"date-parts":[[2017,8]]}}}