{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T06:08:58Z","timestamp":1778134138445,"version":"3.51.4"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,8]]},"abstract":"<jats:p>Reinforcement learning (RL), which has been successfully applied to sequence prediction, introduces <jats:italic>reward<\/jats:italic> as a sequence-level supervision signal to evaluate the quality of a generated sequence. Existing RL approaches use the ground-truth sequence to define the reward, which limits the application of RL techniques to labeled data. Since labeled data is usually scarce and\/or costly to collect, it is desirable to leverage large-scale unlabeled data. In this paper, we extend existing RL methods for sequence prediction to exploit unlabeled data. We propose to learn the reward function from labeled data and use the predicted reward as a <jats:italic>pseudo reward<\/jats:italic> for unlabeled data, so that we can learn from unlabeled data using the pseudo reward. To get a good pseudo reward on unlabeled data, we propose an RNN-based reward network with an attention mechanism, trained on a purposely biased data distribution.\n\nExperiments show that the pseudo reward can provide good supervision and guide the learning process on unlabeled data. We observe significant improvements on both neural machine translation and text summarization.<\/jats:p>","DOI":"10.24963\/ijcai.2017\/432","type":"proceedings-article","created":{"date-parts":[[2017,7,28]],"date-time":"2017-07-28T09:14:07Z","timestamp":1501233247000},"page":"3098-3104","source":"Crossref","is-referenced-by-count":5,"title":["Sequence Prediction with Unlabeled Data by Reward Function Learning"],"prefix":"10.24963","author":[{"given":"Lijun","family":"Wu","sequence":"first","affiliation":[{"name":"School of Data and Computer Science, Sun Yat-sen University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Li","family":"Zhao","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tao","family":"Qin","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianhuang","family":"Lai","sequence":"additional","affiliation":[{"name":"School of Data and Computer Science, Sun Yat-sen University"},{"name":"Guangdong Key Laboratory of Information Security Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tie-Yan","family":"Liu","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"name":"Twenty-Sixth International Joint Conference on Artificial Intelligence","theme":"Artificial Intelligence","location":"Melbourne, Australia","acronym":"IJCAI-2017","number":"26","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)","University of Technology Sydney (UTS)","Australian Computer Society (ACS)"],"start":{"date-parts":[[2017,8,19]]},"end":{"date-parts":[[2017,8,26]]}},"container-title":["Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2017,7,28]],"date-time":"2017-07-28T11:53:53Z","timestamp":1501242833000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2017\/432"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2017,8]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2017\/432","relation":{},"subject":[],"published":{"date-parts":[[2017,8]]}}}